The Internet started the week the way many of us wanted: by refusing to go to work. An outage at Amazon Web Services left large swaths of the Internet unavailable Monday. Sites and services including Snapchat, Fortnite, Venmo, PlayStation Network and, predictably, Amazon were unavailable starting early in the day.
The outage began shortly after midnight PT, and it took Amazon about three and a half hours to fully recover. Social networks and streaming services were among the more than 2,000 companies affected, and vital services such as online banking were also knocked offline.
As of 12:15 p.m. PT, Amazon said it continues to see improvements across all AWS services. The company said that customers who use AWS Lambda, a compute service that runs code without the need to manage a server, “may encounter intermittent function errors making network requests to other services or systems while we work to address residual network connectivity issues.”
The company said it will issue another update at 1 p.m. PT.
Outage timeline
The issues seemed to be largely resolved as the US East Coast came online, but problem reports increased dramatically again after 8 a.m. PT as the workday began on the West Coast. It’s possible that this happened because West Coasters were simply adding to the reports, or that the problems became worse as more people tried to access the system.
AWS, the cloud service provider owned by Amazon, supports large parts of the Internet. So when it went down, it took with it many of the services we know and love. As with the Fastly and CrowdStrike outages over the past few years, the AWS outage shows how much of the Internet depends on the same infrastructure, and how quickly our access to the sites and services we rely on can be revoked if something goes wrong.
Dependence on a small number of large companies to make the web work is like putting all our eggs in one small basket. When it works, it’s great, but all it takes is one small thing to go wrong for the Internet to go down within minutes.
How widespread was the AWS outage?
Just after midnight PT on October 20, AWS first reported an issue on its service status page, saying it was “investigating increased error rates and latency for multiple AWS services in the US-East-1 region.” At around 2 a.m. PT, it said it had identified the probable root cause of the problem. Within half an hour, it had begun implementing mitigations that resulted in significant signs of improvement.
“The underlying DNS issue has been fully mitigated, and most AWS service operations are now succeeding normally,” AWS said at 3:35 a.m. PT.
Amazon did not respond to a request for further comment other than to point us back to the AWS Health dashboard.
But as of 8:43 a.m. PT, many services were still affected, and the AWS status page showed the severity as “Degraded.” In a post at the time, AWS wrote: “We are reducing requests for new EC2 instance launches to aid in recovery and are actively working on mitigation.”
The AWS outage in the US first peaked before dawn Monday, then subsided and increased again around noon.
Around the same time that AWS says it first began noticing the error rate rising, outage-tracking site DownDetector saw reports begin to increase across a number of online services, including banks, airlines and phone carriers. As AWS resolved the issue, reports for some of these services declined, while others had still not returned to normal. (DownDetector is owned by Ziff Davis, the parent company of CNET.)
At around 4 a.m. PT, Reddit was still down, while services including Ring, Verizon and YouTube were still seeing a large number of reported problems. Reddit finally came back online around 4:30 a.m. PT, according to its status page, which was verified by CNET.
In total, DownDetector saw more than 9.8 million reports, of which 2.7 million were from the US, more than 1.1 million from the UK and the rest largely spread across Australia, Japan, the Netherlands, Germany and France. DownDetector said more than 2,000 companies were affected in total, while about 280 companies were still experiencing problems around 10 a.m. PT.
“This kind of disruption, where a basic Internet service disrupts a large number of online services, happens only a few times a year,” Daniel Ramirez, DownDetector product director at Ookla, told CNET. “They’re probably becoming a little more frequent as companies are encouraged to rely solely on cloud services and have their data architectures designed to get the most out of a particular cloud platform.”
What caused the AWS outage?
AWS did not immediately share full details about what caused the Internet outage this morning. Then, at 8:43 a.m. PT, it offered this brief explanation: “The root cause is an underlying internal subsystem that is responsible for monitoring the health of our network load balancers.”
Earlier in the day it had attributed the outage to a “DNS issue”. DNS stands for Domain Name System and refers to the service that translates human-readable Internet addresses (for example, CNET.com) into machine-readable IP addresses that connect browsers to websites.
According to DownDetector, outages were reported at several sites and services on Monday morning.
When a DNS error occurs, the translation process cannot occur, disrupting the connection. DNS errors are common Internet disruptions, but usually occur on a smaller scale, affecting individual sites or services. Because AWS usage is so widespread, a DNS error can have equally wide-ranging consequences.
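As a rough illustration of the lookup described above, the following Python sketch uses the standard library's socket.getaddrinfo to translate a hostname into IP addresses, and returns an empty list when the translation fails. The function name resolve and its resolver parameter are assumptions made for this example; they are not taken from AWS or from any tool mentioned in this article.

```python
import socket


def resolve(hostname, resolver=socket.getaddrinfo):
    """Return the sorted, unique IP addresses for hostname.

    Returns an empty list if the DNS lookup fails. The resolver
    parameter is a hypothetical hook so the lookup can be swapped
    out (for example, in tests); by default it is the real
    socket.getaddrinfo from the standard library.
    """
    try:
        results = resolver(hostname, None)
    except socket.gaierror:
        # The name could not be translated into an address, so a
        # browser or app would have nothing to connect to.
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # the IP address is the first element of sockaddr.
    return sorted({info[4][0] for info in results})
```

Calling resolve("cnet.com") on a working connection would return whatever addresses the local resolver reports at that moment. During a DNS failure, the same call would return an empty list even though the site's servers might be running normally.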
According to Amazon, the issue was geographically contained to its US-East-1 region, which refers to an area in Northern Virginia where many of its data centers are based. It is an important location for Amazon as well as many other internet companies, and it supports services spanning the US and Europe.
“The lesson here is flexibility,” said Luke Kehoe, industry analyst at Ookla. “Many organizations still concentrate critical workloads in a single cloud region. Distributing critical apps and data across multiple regions and availability zones can significantly reduce the blast radius of a future incident.”
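To make the multi-region idea concrete, here is a minimal Python sketch of client-side failover: it tries a request against each region in order and returns the first successful answer. The function call_with_failover, the request callback and the use of ConnectionError to signal a regional failure are all assumptions for illustration, and the second region name in the usage note is just an example. Real deployments typically also need data replication and health checks, which this sketch leaves out.

```python
def call_with_failover(regions, request):
    """Try request(region) for each region in order.

    Returns (region, result) for the first region that answers.
    Raises RuntimeError if every region fails. Both the function
    and the request callback are hypothetical, not an AWS API.
    """
    failed = []
    for region in regions:
        try:
            return region, request(region)
        except ConnectionError:
            # Treat a connection failure as "this region is down"
            # and move on to the next one in the list.
            failed.append(region)
    raise RuntimeError(f"all regions failed: {failed}")
```

With a list such as ["us-east-1", "us-west-2"], a failure in the first region would cause the call to fall through to the second rather than fail outright, which is the kind of reduced impact Kehoe describes.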
Was the AWS outage caused by a cyber attack?
DNS issues can be caused by malicious actors, but there is no evidence at this stage to suggest that this is the case with the AWS outage.
However, technical faults can pave the way for hackers to seek out and exploit vulnerabilities when companies’ backs are turned and their defenses are down, according to Marijus Briedis, CTO at NordVPN.
“This is a cyber security as well as a technical issue,” he said in a statement. “True online security is not just about keeping hackers out, but also about ensuring you can stay connected and protected when systems fail.”
In the coming hours, people should be wary of scammers hoping to take advantage of people’s awareness of the outage, Briedis said. You should be extra careful of phishing attacks and emails that ask you to change your password to protect your account.