
Is Your Product Ready for Christmas Morning?

  • Writer: Al Lindsay
  • Jul 29
  • 5 min read

How to Handle Hyperscale, from a Former Alexa VP


Christmas time…the year is coming to a close, things are slowing down, and almost everyone is on some sort of break. Not consumer electronics teams, though. They aren’t resting. When you are about to have a massive surge of new customers booting up your product all at the same time, you aren’t relaxing.


You are waiting with bated breath, hoping that everything you did to prepare for this moment will pay off.



Hyperscale in an Instant


Working at Alexa, we scaled fast. Really fast. We went from zero to a hundred million endpoints in a few short years and were selling devices left and right. This pace of growth lasted for years, and we adapted.


What always came as a shock, though, were the user surges at certain points of the year. Some, like Christmas, were expected. Others were not. We had to build ways to handle massive influxes of users to avoid disappointing customers with a crashed system.


Christmas Eve in the War Room


The first step to building a system that could handle these user surges was, of course, preparation. We built, tested, rebuilt, and tested again until we saw evidence that the network was robust enough to handle more users than we expected to get in peak moments, and then we froze the code.


In theory.


Leading up to Christmas (when hundreds of thousands, if not millions, of new Alexa devices would be fired up), we tried to have total code freezes so that we knew everything would work on Christmas morning. But, inevitably, some team would insist their business absolutely needed a last-minute change and would throw a wrench into our whole operation.


So, every Christmas Eve, a team of leaders, engineers, and operations experts stayed up all night, monitoring dashboards, logs, and alerts, and ensuring the Grinch didn’t steal anyone’s Christmas. This started with Christmas Eve in Germany (when presents are traditionally opened there) and didn’t end until after noon on the West Coast of America (more than 24 hours later). Everything needed to be in working order when new customers tried the product.


That same mindset—designing for surges—guided how we handled the unexpected, too.


We Crashed Our Own Device


While we were able to anticipate and prepare for certain peak events like Christmas morning, some surges came as a surprise. One time, our very own Super Bowl ad triggered what was effectively a self-directed denial-of-service event.


The commercial included someone saying “Alexa,” and that one word set off a chain reaction. Millions of devices heard it from the TVs in homes, and they all tried to respond at once. Our normal peak traffic had been around 1,000 threads per second, and the ad pushed that up to 17,000.


Luckily, because this mass “wake up” event wasn’t actually a result of users trying to use their devices, the outage went largely unnoticed and lasted less than a minute. Everyone was still watching the Super Bowl by the time it was resolved.


For users who actually were using Alexa at that moment, what saved us was the circuit breaker: a mechanism we had put in place to shed load intentionally. It let us prioritize delivering a solid experience to the majority of active users, even if it meant dropping some requests outright. A dropped device simply didn’t respond, so most users assumed Alexa hadn’t heard them. They tried again, and Alexa picked up their request the second time.


No harm, no foul.
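
To make the circuit breaker concrete, here is a minimal sketch of the load-shedding idea. The real Alexa implementation isn’t public, so the structure, the names, and the 1,000-request cap below are all illustrative:

```python
import threading

def process(request):
    """Stand-in for the real work of handling a voice request."""
    return f"handled {request}"

class CircuitBreaker:
    """Sheds excess requests once in-flight work exceeds a cap.

    Illustrative sketch only: the cap would normally come from load
    testing, and real breakers also track error rates over time.
    """

    def __init__(self, max_in_flight: int):
        self.max_in_flight = max_in_flight
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        with self._lock:
            if self._in_flight >= self.max_in_flight:
                return False  # shed: drop this request outright
            self._in_flight += 1
            return True

    def release(self) -> None:
        with self._lock:
            self._in_flight -= 1

breaker = CircuitBreaker(max_in_flight=1000)

def handle_request(request):
    if not breaker.try_acquire():
        return None  # no response; the user just asks again
    try:
        return process(request)
    finally:
        breaker.release()
```

The key design choice is that shedding is cheap and immediate: a dropped request costs almost nothing, which is exactly what you want when the alternative is every request getting slower.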


However, we knew it wasn’t sustainable to rely on a reactive system like the circuit breaker to solve surge problems caused by our own advertising, so we built a fix.

The first solution was to distort the “wake-up word” in future ads so the devices wouldn’t hear “Alexa” clearly enough to respond. That worked, but we improved on it further.


We started adding an audio watermark to our media. This was a sub-audible signal embedded in the ad that told the device: “ignore this wake-up call.” It wasn’t audible to the viewer and didn’t affect the audio quality of the ad, but Alexa knew not to respond.


This watermark process ran entirely on-device. No cloud round-trips. No latency. Just a local, fail-safe guardrail.
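
If you’re curious what that might look like, here is a toy sketch of the detection side. Amazon’s actual watermarking scheme isn’t public, so the fixed pseudo-random key, the 16 kHz rate, the mixing level, and the threshold below are all made up; real watermarks use robust spread-spectrum techniques shaped to stay inaudible.

```python
import numpy as np

# Hypothetical key: a fixed pseudo-random sequence baked into the device.
rng = np.random.default_rng(seed=42)
WATERMARK_KEY = rng.standard_normal(16_000)  # ~1 second at 16 kHz

def contains_watermark(audio: np.ndarray, threshold: float = 0.1) -> bool:
    """True if the known key correlates strongly with a window of audio.

    For this key length, unmarked audio scores around 0.03 at worst, so
    the (made-up) 0.1 threshold cleanly separates marked from unmarked.
    """
    k = len(WATERMARK_KEY)
    if len(audio) < k:
        return False
    # Normalized cross-correlation: slide the key across the audio and
    # take the best match; 1.0 would mean a window is a copy of the key.
    corr = np.correlate(audio, WATERMARK_KEY, mode="valid")
    window_energy = np.convolve(audio**2, np.ones(k), mode="valid")
    norm = np.linalg.norm(WATERMARK_KEY) * np.sqrt(np.maximum(window_energy, 1e-12))
    return float(np.max(np.abs(corr) / norm)) > threshold

# Demo: plain audio triggers normally; audio with the key mixed in is ignored.
speech = rng.standard_normal(48_000)          # stand-in for 3 s of audio
marked = speech.copy()
marked[8_000:24_000] += 0.2 * WATERMARK_KEY   # low-level mix of the key
print(contains_watermark(speech))  # False -> handle the wake word
print(contains_watermark(marked))  # True  -> ignore this wake-up call
```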


That worked until we weren’t the ones surging Alexa.


Jimmy Kimmel Cyberattacked Us



Jimmy Kimmel ran a segment where he deliberately triggered Alexa commands on-air, telling Alexa to order 500 pool noodles (clip starts at 6:17). Since it wasn’t our media, there was no watermark. And unlike the Super Bowl ad, he wasn’t just waking up the devices; he was trying to make them shop.


He succeeded in waking up a lot of devices, but he hit a wall with the shopping. Shopping through Alexa typically has a PIN and a variety of confirmations, so Kimmel’s on-air order to all of the Alexas in America that heard him didn’t really have the impact he was hoping for.


But this event prompted us to build another layer of protection.


We built a distributed detection system. If multiple audio streams came into our cloud that sounded identical—same cadence, same noise profile—we’d assume it was a broadcast and drop the request across devices. This was especially hard in a horizontally scaled system, where different requests hit different servers, but we got it working well enough to catch most cases.
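
A sketch of the shape of that idea: every server reduces the incoming audio to a fingerprint and bumps a counter in a shared store, so any one server can see when identical audio is arriving everywhere at once. This is not our actual implementation; Redis, the window size, and the threshold are stand-ins, and a real system would use a perceptual fingerprint that tolerates noise rather than an exact hash.

```python
import hashlib
import time

import redis  # a shared store, so every server sees the same counts

r = redis.Redis(host="localhost", port=6379)

WINDOW_SECONDS = 5           # made-up time bucket
BROADCAST_THRESHOLD = 100    # made-up count before we call it a broadcast

def fingerprint(audio_features: bytes) -> str:
    """Hash coarse audio features so near-identical streams collide.

    An exact hash is a stand-in here; production systems use perceptual
    fingerprints so the same broadcast survives room noise and codecs.
    """
    return hashlib.sha256(audio_features).hexdigest()[:16]

def is_broadcast(audio_features: bytes) -> bool:
    bucket = int(time.time()) // WINDOW_SECONDS
    key = f"bcast:{fingerprint(audio_features)}:{bucket}"
    count = r.incr(key)                # atomic across all servers
    r.expire(key, WINDOW_SECONDS * 2)  # let stale buckets disappear
    return count > BROADCAST_THRESHOLD

# On each request: if is_broadcast(features), drop it across devices.
```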


Constant Iteration


The point of these stories is that at hyperscale, there is no playbook. No one can really tell you how to build a system that simultaneously serves millions. You build something, you learn, you respond, and you adapt. You get smarter after each failure or close call and you work every fix into the baseline.


What any scaling company needs to understand is how to break, learn, and grow.

Though I can’t tell you exactly how to build for hyperscale, I can tell you how NOT to do it.


Don’t Use Band-Aids; Fail Gracefully


Scaling companies sometimes try to outrun their problems by buying more server capacity. This works for a little while, but it is incredibly inefficient. It is a band-aid, not a cure.


Since there is no way to fully avoid being overwhelmed, build failure into your system. Queue requests, defer actions, and give the customer a notification instead of a spinning wheel. Drop requests to keep the rest of the system working.
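
As a minimal sketch of that advice (the queue size and the messages are made up):

```python
import queue

# Bounded work queue: when it fills, tell the customer instead of
# letting the whole system grind to a halt behind a spinning wheel.
work_queue: "queue.Queue[dict]" = queue.Queue(maxsize=10_000)

def submit(request: dict) -> str:
    try:
        work_queue.put_nowait(request)  # defer the action for workers
        return "Got it. We'll finish this up shortly."
    except queue.Full:
        # Shed load gracefully: a clear message beats a broken experience.
        return "We're busier than usual. Please try again in a minute."
```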


Perfection is not an option, and the alternative to failing gracefully is worse: everyone gets a slow, broken experience. Mature systems break, but they don’t collapse.


Amplification


As a final piece of advice for scaling businesses, remember that everything amplifies in scaled code. Things become incredibly fragile. One loop, one recursive call, or one poorly timed refactor can have a huge impact. One millisecond of latency in a core service can add tens of millions in compute costs. We used to say compute was getting cheap, but at hyperscale, inefficiency is incredibly expensive.
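
Here is a back-of-envelope version of that claim. Every number below is a made-up assumption, but it shows how fast one millisecond compounds at fleet scale:

```python
# Back-of-envelope only; all of these numbers are assumptions.
internal_calls_per_second = 100_000_000  # user traffic times internal fan-out
extra_cpu_seconds_per_call = 0.001       # 1 ms of added work in a core service
cost_per_core_hour = 0.05                # rough cloud price per core-hour

extra_cores = internal_calls_per_second * extra_cpu_seconds_per_call  # 100,000
annual_cost = extra_cores * cost_per_core_hour * 24 * 365
print(f"~${annual_cost / 1e6:.0f}M per year")  # roughly $44M per year
```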


The Takeaway


You don’t need a billion users for this stuff to matter. If you’re shipping a new product or anticipating user surges from a commercial, a special event, or anything else, your “Christmas morning” is coming.


Things will surge at some point, and your system and your team need to be ready for it.


If you never get a surge of users, well, that’s a different kind of problem.

 
 
 