Global Internet Disrupted for Over Half a Day Due to Major Amazon Cloud Outage
The world's internet was disrupted for more than half a day due to a major outage of Amazon's cloud services. Amazon Web Services (AWS), the leading cloud platform globally, announced that it had resolved the lengthy outage that affected numerous online applications on Monday, revealing a vulnerability in the global computing infrastructure, which heavily relies on American tech giants.
From banks to online games (Fortnite, Roblox), streaming platforms (Disney+, Prime Video), and everyday applications (Airbnb, Zoom, Snapchat), a significant portion of online services faced disruptions or even complete outages after a localized failure occurred shortly after 7:00 AM GMT in AWS's oldest data centers, near Washington.
Just before 10:00 PM GMT, after about fifteen hours of crisis management, the owner of this backbone of online computing announced that the outage had been completely resolved and that it expected "full recovery within the next two hours" as it worked through the backlog.
This failure, which led to blocked payments, interrupted deliveries, and other hindrances to professional and personal activities, illustrated the world's dependence on the infrastructure of American tech giants.
AWS, a subsidiary of Amazon, is the world's largest provider of cloud computing, offering shared data centers, private servers, and artificial intelligence (AI) tools to businesses. It accounts for nearly a third of the global market in this rapidly growing sector, driven by the swift rise of AI, outpacing its American competitors, Microsoft Azure and Google Cloud, which together share another third, according to Synergy Research Group.
This outage "highlights the challenges associated with dependence" on foreign service providers like Amazon, Microsoft, and Alphabet (Google), which cater to a significant portion of customers worldwide, stated Junade Ali, a cybersecurity expert at the Institution of Engineering and Technology (IET) in the UK.
It raises "serious questions" about whether it is wise for companies to "outsource all or part of their essential infrastructure to a small group of third-party suppliers to save on hosting costs," noted British financial analyst Michael Hewson.
"This excessive dependency on a single provider now threatens more than just service availability; it jeopardizes brand reputation and customer trust," emphasized Gadjo Sevilla, an analyst at Emarketer, highlighting the need for AWS clients to develop redundancy strategies, which come with additional financial and energy costs.
Several hours after the incident began, AWS indicated that the "likely cause" of the outage was an issue with DNS, the domain name system that directs internet requests to the correct destination.
However, after a brief calm, difficulties resumed in the following hours, affecting new victims in the United States around 3:00 PM GMT, including the game Battlefield, Delta Air Lines, and the popular online payment service Venmo.
AWS then announced it had identified the more serious origin of the incident: "the root cause is an internal subsystem responsible for monitoring the proper functioning of the network load balancers."
In other words, the failure, whose explanation is still unknown, concerns not only the navigation system but also the system's control tower.
To prevent an outage from affecting the entire network, AWS has divided the world into about forty regions, each with three distinct and isolated structures that can compensate for the failure of any one of them.
However, Monday's incident demonstrated that a number of fundamental requests continue to pass through the data centers of the US-East-1 region, AWS's oldest (2006) and most important center, located in northern Virginia.
In July 2024, another IT outage, linked to an update of cybersecurity group CrowdStrike's software on Windows, paralyzed airports, hospitals, and many other organizations, causing massive chaos worldwide.
According to Microsoft, that outage, caused by software rather than an infrastructure issue, affected approximately 8.5 million devices, with users facing "blue screens of death" that made rebooting impossible.
AFP