Cloud Infrastructures are Having a Bad Week

 


Today’s disruptions across Microsoft Azure and Amazon Web Services (AWS) were significant, but they’re not signs of cloud computing’s demise. Instead, they underscore the risks of centralization and the importance of designing systems that can withstand provider-level failures.

What happened today?

• Microsoft Azure outage: Azure Front Door suffered a major disruption caused by a configuration change, affecting Microsoft services such as Outlook, Xbox, and Microsoft 365, as well as third-party sites such as Starbucks and Alaska Airlines. The Azure website gives a little more detail about the disruption:

"Azure Front Door - Connectivity issues - Observing recovery

Starting at approximately 16:00 UTC on 29 October 2025, customers and Microsoft services leveraging Azure Front Door (AFD) may have experienced latencies, timeouts, and errors. We have confirmed that an inadvertent configuration change was the trigger event for this issue.

Affected Azure services may have included, but were not limited to:

App Service, Azure Active Directory B2C, Azure Communication Services, Azure Databricks, Azure Healthcare APIs, Azure Maps, Azure Portal, Azure SQL Database, Azure Virtual Desktop, Container Registry, Media Services, Microsoft Defender External Attack Surface Management, Microsoft Entra ID (Mobility Management Policy Service, Identity & Access Management, and User Management UX), Microsoft Purview, Microsoft Sentinel (Threat Intelligence), and Video Indexer."

• AWS confusion: While AWS appeared to be affected, Amazon clarified that its services were operating normally and that outage reports were likely inaccurate or unrelated to AWS itself. However, AWS's own website reports a separate, now-resolved incident:

"[RESOLVED] Increased Error Rates and Latencies

Oct 28 10:57 PM PDT Between 9:00 AM and 10:43 PM PDT, we experienced increased latencies for EC2 instance launches within the use1-az2 Availability Zone (AZ) in the US-EAST-1 Region. In addition we experienced increased ECS task and pod launch failure rates. This impacted services that depend on ECS, including Fargate, EMR Serverless, MWAA, CodeBuild, EKS, Glue, AppRunner, DataSync, MWAA and AWS Batch. As part of the recovery effort, we temporarily throttled ECS operations in three impacted cells. At 10:43 PM, we had fully mitigated the issue. The issue has been resolved and the service is operating normally."

Why this matters

• Single points of failure: Many businesses rely heavily on one cloud provider. When Azure or AWS stumbles, ripple effects hit banking, retail, healthcare, and entertainment sectors.

• Growing complexity: As cloud services become more interdependent (e.g., CDNs, identity platforms, container orchestration), a fault in one layer can cascade across many others.

• Public trust and business continuity: Frequent outages erode confidence and can lead to regulatory scrutiny, especially in sectors like finance and healthcare.

What’s next for cloud computing?

Cloud isn’t “on the way out”; it’s evolving in several directions:

• Multi-cloud and hybrid strategies: Organizations are increasingly adopting multi-cloud setups (e.g., Azure + AWS + GCP) and hybrid architectures to reduce dependency on any one provider.

• Edge computing: Processing data closer to users (at the edge) can reduce latency and limit the impact of an outage in a central cloud region.

• Resilience engineering: Failover routing, traffic shaping, and chaos testing (like Netflix’s Chaos Monkey) are becoming standard practices.

• Regulatory pressure: Governments may push for cloud contingency plans, especially for critical infrastructure.
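To make the failover and multi-cloud points above a little more concrete, here is a minimal Python sketch of client-side failover between providers. It assumes the same service is reachable through two providers, stood in for by the hypothetical functions `primary_azure` and `secondary_aws`; the helper `call_with_failover` is also hypothetical. In practice this is usually handled at the DNS or load-balancer layer with health checks rather than in application code.

```python
from typing import Callable, List, Sequence, Tuple


class AllProvidersFailed(Exception):
    """Raised when every provider in the failover list has failed."""


def call_with_failover(
    providers: Sequence[Tuple[str, Callable[[], str]]],
) -> Tuple[str, str]:
    """Try each (name, call) pair in priority order and return (name, result) of the first success."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call()
        except Exception as exc:  # broad catch keeps the sketch short; real code would catch timeouts, 5xx, etc.
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins for the same service hosted on two different providers.
def primary_azure() -> str:
    raise TimeoutError("front door timeout")  # simulate the primary provider being down


def secondary_aws() -> str:
    return "ok from secondary"


if __name__ == "__main__":
    provider, result = call_with_failover([("azure", primary_azure), ("aws", secondary_aws)])
    print(provider, result)  # prints: aws ok from secondary
```

The ordering of the list encodes priority: the secondary provider is only used when the primary raises an error, and if every provider fails, the collected errors are reported together so the caller can see why each one failed.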

Time will tell. I think a hybrid approach is best for most environments.
