On December 7, 2021, an AWS internal-network event in us-east-1 took down much of the consumer internet for roughly seven hours. Disney+ stopped streaming. Slack stopped messaging. Coinbase stopped trading. Netflix stayed up because it had spent a decade engineering for exactly this. The point of an outage like that isn't that "AWS went down" — outages happen — it's that everyone went down together, and the average user had no idea those services shared a single Virginia data-center region.
This post is the dependency map. Skim the table, skip to "how to read an AWS outage" if there's one happening as you read, and bookmark /infra/aws — that's where we keep the live cascade view.
The us-east-1 problem ¶
AWS has 30+ regions. One of them — us-east-1, in Northern Virginia — carries an outsized share of everything hosted on AWS, for a few reasons that compound:
- It's the oldest region. Original launch in 2006. Companies that picked AWS early put their primary stacks there and never moved them, because moving petabytes of data is expensive and migrating active state is harder.
- It's one of the cheapest regions. Per-hour EC2 pricing in Virginia consistently undercuts most other regions by 5-15%; for a startup, that adds up to real money.
- It hosts the AWS control plane. Several global AWS services route through us-east-1 internally — IAM, Route 53 health checks, parts of the billing pipeline, the legacy STS endpoint. Even an app that runs in Frankfurt can fail if a Virginia control-plane component is having a bad day.
The third reason is the underrated one. The February 2017 S3 outage (also in us-east-1) took down sites that didn't think they used S3 — turned out their CI/CD pipelines or their image hosts did, and "we run in Ireland" wasn't enough.
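If you operate on AWS, the standard mitigation for the STS piece of this is to pin SDK calls to a regional STS endpoint instead of the legacy global one (sts.amazonaws.com, which has historically been served out of us-east-1). A minimal sketch with boto3, assuming configured credentials; the region choice is illustrative:

```python
import boto3

# Legacy global endpoint (the default in older SDK configs):
#   sts.amazonaws.com -> historically served out of us-east-1.
# Regional endpoint: independent of Virginia's control plane.
sts = boto3.client(
    "sts",
    region_name="eu-central-1",
    endpoint_url="https://sts.eu-central-1.amazonaws.com",
)

# Cheap sanity check that auth works against the regional endpoint.
print(sts.get_caller_identity()["Arn"])
```

Recent SDKs can do the same without hard-coding the URL, via the sts_regional_endpoints = regional config setting (or the AWS_STS_REGIONAL_ENDPOINTS environment variable).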
What runs on AWS — the consumer-side view ¶
Here's the slice that's relevant to "I just want to know if my Netflix is going to work tonight." We curate this list from public statements, official tech-stack pages, and AWS case studies — never from guesses. There are thousands of AWS-hosted services we don't track; the ones below are the ones that get search-volume during outages.
| Service | What's on AWS | Cascade impact |
|---|---|---|
| Netflix | Almost everything except CDN edges (Open Connect, their own boxes inside ISPs). Famously runs across multiple AWS regions; can survive single-region outages because they engineered for it. | Streaming usually keeps working — you may see a slow homepage or buffering, but not a full outage. |
| Disney+ / Hulu | Disney Streaming runs primarily on AWS. Standardizing on the BAMTech stack (acquired from MLB) consolidated its infra in us-east-1. | Goes down with us-east-1. December 2021 took both offline for hours. |
| Twitch | Owned by Amazon — runs entirely on AWS. Live ingest sits in regional clusters; chat in us-east-1. | Chat fails first when us-east-1 hiccups, often before video does. |
| Reddit | Web + API on AWS. Static assets via Fastly. Image hosting (i.redd.it) on AWS. | Goes down with AWS. The mobile app falls back to a "we're having trouble" screen. |
| Slack | AWS — Slack's status page mentions it directly. Multi-region but heavy in us-east-1. | Connections drop, message delivery degrades. Reconnects often surge once AWS recovers and produce a thundering-herd lag of their own. |
| Coinbase / Robinhood | Both on AWS. Coinbase's trading and matching engines run there; Robinhood's brokerage stack does too. | Trading halts during AWS incidents — the worst kind of outage to have during a market spike. |
| Notion | AWS. Heavy on RDS / Aurora. | Read-only mode kicks in first; if the database write tier is degraded, every keystroke fails. |
| Airbnb / DoorDash / Lyft | All AWS. Travel + delivery + rideshare routing run on EC2 + Lambda. | The visible failure is "I can't book / order / call a car"; the cause is one tier deeper. |
| Pinterest | Mostly AWS for serving + storage. Some workloads on GCP. | Image-loading degrades first; full app outage during severe AWS events. |
Live status of every service in this table sits at /infra/aws. If you landed here during a real AWS outage, that's the page you actually want — it sorts down/degraded services to the top so you can see the cascade in real time.
How to read an AWS outage ¶
The next AWS incident will happen. When it does, the same five questions answer most of "is my service affected?":
1. Is it us-east-1, or somewhere else?
Open health.aws.amazon.com. The AWS Health Dashboard lists incidents per service, per region. If the colored markers cluster in us-east-1, you're seeing the canonical cascade. If they're in Tokyo or São Paulo, the consumer-services impact is much narrower — most US/EU services don't run primary infra there.
2. Is the dashboard itself slow to load?
It runs on AWS. During severe events, the status page itself updates with delays — sometimes hours; the December 2021 outage notoriously delayed the dashboard's own updates by roughly 90 minutes, and it's a recurring pattern. AWS also publishes per-service, per-region RSS feeds from the same dashboard, and those tend to update slightly faster because they bypass the rendered page.
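If you'd rather poll than refresh, here's a small sketch that reads one of those feeds with nothing but the Python standard library. It assumes the long-standing status.aws.amazon.com/rss/&lt;service&gt;-&lt;region&gt;.rss URL pattern, which has survived dashboard renames so far but isn't guaranteed to stay put:

```python
import urllib.request
import xml.etree.ElementTree as ET

# One feed per service-region pair; swap in the pair you care about.
FEED = "https://status.aws.amazon.com/rss/ec2-us-east-1.rss"

with urllib.request.urlopen(FEED, timeout=10) as resp:
    tree = ET.parse(resp)

# Each <item> is one incident update, newest first. An empty feed
# means "no recent incidents", not "feed is broken".
for item in tree.iter("item"):
    print(item.findtext("pubDate", default="?"), "|",
          item.findtext("title", default="?"))
```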
3. Which AWS service is the proximate cause?
The cascades have fingerprints:
- EC2 / EBS: the broadest cascade. Anything compute-related fails. Recovery is slow because EC2 instances need to come back up in dependency order.
- Lambda: serverless functions return 5xx. APIs that route through API Gateway → Lambda fail; the underlying RDS database might be fine.
- S3 / DynamoDB: read-only and stale-content failure modes. Sites stay reachable but content disappears or won't update.
- IAM / STS: intra-AWS auth fails. Internal AWS services that need to talk to each other lose the ability to do so. Looks like everything is sad without an obvious cause.
- Route 53: DNS resolution for AWS-hosted domains breaks. The site's "down for everyone" but its IP is fine; the name → IP step is broken (the sketch after this list shows a quick way to confirm that).
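That last fingerprint is the easiest to verify yourself: resolve the name, and if that fails, try a previously cached IP directly. A minimal sketch; the host and the cached IP are hypothetical placeholders you'd record while things are healthy:

```python
import socket

HOST = "example.com"        # stand-in for the service you're checking
KNOWN_IP = "93.184.216.34"  # hypothetical IP cached from an earlier `dig`

try:
    infos = socket.getaddrinfo(HOST, 443, proto=socket.IPPROTO_TCP)
    print("DNS OK:", sorted({info[4][0] for info in infos}))
except socket.gaierror:
    # Name resolution failed; see whether the old IP still answers.
    try:
        with socket.create_connection((KNOWN_IP, 443), timeout=5):
            print("IP reachable: the name -> IP step is the broken one")
    except OSError:
        print("IP also unreachable: the outage goes deeper than DNS")
```

If the cached IP connects while the name won't resolve, you're looking at the DNS layer, not the servers behind it.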
4. Are all of those listed services down for me?
If yes, it's a region-wide event and you can stop testing your own connection. If only some, the issue is partial — different services use different AWS subsystems and some of those are still healthy. Our /infra/aws page sorts down-first, so the cascade map is right there.
5. What can I do?
If you're an end user: nothing. AWS outages don't have a client-side fix. Flushing your DNS cache won't help; switching to mobile data won't help. Wait for AWS to recover, and ignore the Twitter/X chorus saying "Reddit/Slack/etc. is down"; those services are downstream of the actual problem, and tweeting about it won't speed up the fix.
If you're an operator running on AWS: your runbook should already include "if us-east-1 is degraded, fail over to us-west-2 / eu-west-1." If it doesn't, the December 2021 retro is mandatory reading.
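For completeness, this is the client-side shape of that failover. It's a sketch only, with hypothetical endpoint names; in production the failover decision usually lives in Route 53 health checks or a global load balancer, not in application code:

```python
import urllib.request
import urllib.error

# Hypothetical per-region health endpoints, in priority order.
REGIONAL_ENDPOINTS = [
    "https://api-us-east-1.example.com/health",
    "https://api-us-west-2.example.com/health",
    "https://api-eu-west-1.example.com/health",
]

def first_healthy(endpoints: list[str]) -> str | None:
    """Return the first endpoint answering 200, else None."""
    for url in endpoints:
        try:
            with urllib.request.urlopen(url, timeout=3) as resp:
                if resp.status == 200:
                    return url
        except (urllib.error.URLError, OSError):
            continue  # degraded or unreachable; try the next region
    return None

active = first_healthy(REGIONAL_ENDPOINTS)
print(active or "all regions unhealthy: page someone")
```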
What's not on AWS (a partial list) ¶
Knowing what isn't on AWS is sometimes more useful than knowing what is. None of the following share AWS's blast radius:
- GitHub. Its own data centers, with workloads moving to Microsoft Azure since the acquisition. GitHub had its own famous outage on October 21, 2018 (a database failover gone wrong); it had nothing to do with AWS.
- Spotify. Mostly Google Cloud (GCP). Migrated off AWS over 2016-2018. Spotify outages and AWS outages are independent.
- Discord. GCP for compute, Cloudflare for the edge. Famously not on AWS.
- YouTube + Gmail + everything Google. Obviously Google's own infra (Borg / Spanner / Colossus).
- X (Twitter). Historically its own data centers, plus AWS and Google Cloud deals; the 2022 cost cuts pulled more workloads back on-prem. Not concentrated in any single cloud.
- Apple services (iCloud, App Store). Apple owns its data centers + uses GCP + uses AWS for some workloads. Distributed enough that no single cloud takes them all down.
Why this matters for your status checking ¶
Most "is X down?" tools answer the wrong question. They tell you that Reddit specifically is down, and the user is left to figure out whether to wait, complain to Reddit, complain to their ISP, or restart their phone. The right question — when several big services are affected at once — is "is this a single upstream provider, and if so, which one?"
That's why /infra exists on isitdown.io. We tag every catalog service with the cloud / CDN it publicly runs on, so when a cascade is in progress you can see it as one event instead of 13 separate ones. We tag conservatively, from public statements only, because being wrong during a real outage is worse than being incomplete.
FAQ ¶
Is AWS more or less reliable than competitors?
It depends on what you measure. AWS has more major region-wide outages per year than GCP — partly because it has more regions and a bigger blast radius per region, and partly because us-east-1 is genuinely overloaded with control-plane responsibility. But AWS publishes detailed post-mortems and has the longest track record of any cloud, which makes its failures more visible.
Can I check if a specific site uses AWS without insider info?
Sometimes. `dig +short site.com` plus an IP-range lookup against AWS's published ip-ranges.json will catch direct EC2 and ELB usage. Cloudflare-fronted sites hide their origin, so the answer often reads "we don't know" — and that's correct, not a tool failure.
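Here's that check end to end as a Python sketch. ip-ranges.json is AWS's real, documented file at ip-ranges.amazonaws.com; everything else is stdlib plumbing:

```python
import ipaddress
import json
import socket
import urllib.request

RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"

def aws_prefixes():
    """Download AWS's published IPv4 prefixes with region/service tags."""
    # (IPv4 only; "ipv6_prefixes" is a separate key in the same file.)
    with urllib.request.urlopen(RANGES_URL, timeout=10) as resp:
        data = json.load(resp)
    return [(ipaddress.ip_network(p["ip_prefix"]), p["region"], p["service"])
            for p in data["prefixes"]]

def check(hostname):
    prefixes = aws_prefixes()
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    for ip_str in sorted({info[4][0] for info in infos}):
        ip = ipaddress.ip_address(ip_str)
        hits = [(region, svc) for net, region, svc in prefixes if ip in net]
        print(ip_str, "->", hits or "not in AWS ranges (or hidden behind a CDN)")

check("reddit.com")
```

Run it against reddit.com and you'll most likely see "not in AWS ranges", because Fastly fronts the site. That's the caveat above working as described, not a bug in the check.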
Why don't you list every AWS-hosted service?
Because we'd be wrong about most of them. Public companies disclose their cloud provider in earnings calls, status pages, and conference talks; private startups usually don't. The 13 services we currently tag (Netflix, Reddit, Slack, Disney+, etc.) are all from public sources. We'd rather show 13 we're certain of than 200 half-guessed.