Cloud Outages
November 21, 2025
October 2025 saw two large cloud outages. On the 20th, an AWS outage in US-East-1 took down a large portion of the internet, and recovery took almost 24 hours. Then on the 29th, an Azure outage had a similar impact.
Microsoft and Amazon have both released their post-mortems. In both cases it was a DNS issue, which has become something of a meme in the tech world.
At the time of the outages, and even for a while after the post-mortems were published, a lot of blame was being laid at the door of AI.
Then, this week, Cloudflare suffered an outage lasting several hours, again taking out a large portion of the internet. The great thing about Cloudflare is that they always publish very detailed post-mortems about what caused an outage - but again, that didn’t stop people from speculating that AI was the cause.
At this point it’s probably worth saying that I’m not yet convinced by AI as a coding tool. I’m not a total skeptic. I’ve seen first-hand on some smaller projects that it can be very useful and a great speed boost.
But in my day-to-day legacy codebase it has often struggled to help - likely due to a lack of training data on the domains and frameworks I am using. Right now it seems very dependent on language and domain.
Despite that, I don’t think AI is the cause of these outages - and some people seem to agree.
…
I am not convinced it’s ai, I think it’s just accelerating this trend
…
So if it’s not AI, what do I think it is? Well, the one thing I haven’t seen called out much in all of this is the consolidation of the internet onto a few providers.
If we look at the providers above, Cloudflare is estimated to provide services to around 25% of all websites. Exact figures aren’t available for Azure or AWS, and would probably be very difficult to calculate, but both are hugely popular and supply major companies all over the globe.
The problem, to me at least, is that this reliance on a small number of providers means a single outage affects a huge portion of the web. If we go back 10-15 years, though, the web landscape was very different.
There were huge numbers of independent hosting providers, and plenty of people self-hosting their websites and services. If one small provider had an outage, or the web server in your bedroom went down, far less was affected. The blast radius of an outage was so much smaller.
In the not too distant past, starting a business meant making big decisions about hosting and infrastructure. Now, though, the convenience of pay-as-you-go pricing without much initial investment means more companies simply buy into the large cloud providers.
In some cases it’s critical for investment. For a growing business, being hosted on one of the main providers can be a necessity to secure funding.
If we look beyond hosting, we can all remember the relatively recent CrowdStrike outage that took down millions of systems globally. Why did it have such widespread impact? Because when you get asked about security in an audit, you say “we use CrowdStrike”, a box gets ticked and the auditor moves on.
We have traded independence for convenience at every possible opportunity - and it may continue to get worse. Some providers that seem to offer independence are actually just wrappers around the bigger services.
So where do we go from here?
Do we need to start evaluating our cloud hosting options? People like DHH are advocates for moving away from the larger providers and running their own infrastructure. Basecamp estimate that moving back to on-prem will save them millions over the next few years.
For a small business or a startup, though, that might not be viable because of the upfront time and money needed to get set up. The cloud is perfect for these scenarios, but perhaps it’s worth planning an exit strategy for further down the line?
Alternatively, perhaps more businesses should be considering a multi-cloud setup? Fairly recently, an Australian pension provider had their Google Cloud account deleted accidentally. That would have been catastrophic if not for the backups stored with another provider.
I suspect that was a difficult sell to the business and shareholders, though. In my corporate experience, convincing people to use multiple cloud providers “just in case” feels like it would be practically impossible.
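To make the "backups with another provider" idea a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: `InMemoryProvider` is a hypothetical stand-in for a storage service (not any vendor's SDK), and `backup_everywhere` and `restore_from_any` are names I made up. It only shows the shape of the idea: write the same backup to more than one provider, and restore from whichever one still has it.

```python
from dataclasses import dataclass, field


class ProviderUnavailable(Exception):
    """Raised when a (hypothetical) provider cannot serve requests."""


@dataclass
class InMemoryProvider:
    """Stand-in for a cloud storage provider; not a real SDK."""
    name: str
    available: bool = True
    objects: dict = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> None:
        if not self.available:
            raise ProviderUnavailable(self.name)
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        if not self.available:
            raise ProviderUnavailable(self.name)
        return self.objects[key]  # KeyError if the backup is gone


def backup_everywhere(providers, key, data):
    """Write the same backup to every provider; return the names that succeeded."""
    stored = []
    for provider in providers:
        try:
            provider.put(key, data)
            stored.append(provider.name)
        except ProviderUnavailable:
            continue  # one provider being down should not block the others
    return stored


def restore_from_any(providers, key):
    """Return (provider name, data) from the first provider that still has the backup."""
    for provider in providers:
        try:
            return provider.name, provider.get(key)
        except (ProviderUnavailable, KeyError):
            continue  # provider is down, or the data was deleted there
    raise RuntimeError(f"no provider could restore {key!r}")


if __name__ == "__main__":
    primary = InMemoryProvider("primary-cloud")
    secondary = InMemoryProvider("secondary-cloud")
    backup_everywhere([primary, secondary], "db-snapshot", b"customer data")

    # Simulate the primary account being deleted by accident.
    primary.objects.clear()
    print(restore_from_any([primary, secondary], "db-snapshot"))
```

A real setup would need much more than this (credentials, encryption, verification that restores actually work), which is part of why it is a hard sell.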
One thing seems almost certain to me. Until we start to prioritise resilience over cost and convenience, these outages will become increasingly common.