OpenAI has released an incident report detailing a significant outage that affected multiple services, including ChatGPT, on December 26, 2024. The disruption, which began at 10:40 AM and lasted until early evening, was attributed to a cloud provider data center failure that impacted the company’s databases.
According to the report, while most services were restored by 3:11 PM, ChatGPT required additional time for full recovery, achieving 100% functionality by 6:20 PM. The outage affected several key OpenAI services, including Sora video creation and various APIs for agents, realtime speech, batch processing, and DALL-E.
The company explained that although its databases are mirrored across regions, recovery was complicated by the need for manual intervention from the cloud provider to redirect operations to a backup data center. The scale of that operation was cited as the primary reason for the extended downtime.
“In the coming weeks, we will embark on a major infrastructure initiative to ensure our systems are resilient to an extended outage in any region of any of our cloud providers by adding a layer of indirection under our control in between our applications and our cloud databases,” OpenAI stated in the report. “This will allow significantly faster failover.”
The impact of the outage was widespread, with user reports emerging from both Europe and North America. According to Google Trends data, this incident may have been the largest of its kind, generating more search queries than any previous OpenAI service disruption.
OpenAI is developing a failover system, which automatically switches to backup systems when a failure occurs, to prevent similar incidents in the future. The company has announced infrastructure changes intended to improve its response to future cloud database failures.
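To make the idea of automatic failover concrete, the following is a minimal Python sketch of an indirection layer that sits between an application and its databases. It is an illustration only and does not reflect OpenAI's actual design, which the report does not describe in detail. All names here (`FailoverRouter`, `Backend`, `DatabaseUnavailable`, and the two stand-in query functions) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


class DatabaseUnavailable(Exception):
    """Raised by a backend when its region cannot serve requests."""


@dataclass
class Backend:
    name: str
    query: Callable[[str], str]


@dataclass
class FailoverRouter:
    """Hypothetical indirection layer between an application and its databases."""
    backends: List[Backend]  # ordered by preference, primary first
    active: int = 0          # index of the backend currently in use

    def execute(self, sql: str) -> str:
        # Try the active backend first, then each remaining one in order.
        count = len(self.backends)
        for offset in range(count):
            index = (self.active + offset) % count
            try:
                result = self.backends[index].query(sql)
            except DatabaseUnavailable:
                continue  # this backend is down, move on to the next one
            self.active = index  # keep using the backend that answered
            return result
        raise DatabaseUnavailable("all backends are unavailable")


# Stand-ins for real database clients (assumptions for the example).
def failing_primary(sql: str) -> str:
    raise DatabaseUnavailable("primary region is down")


def healthy_replica(sql: str) -> str:
    return f"replica handled: {sql}"


router = FailoverRouter([
    Backend("primary", failing_primary),
    Backend("replica", healthy_replica),
])
answer = router.execute("SELECT 1")
```

Because the application talks only to the router, the switch to the replica needs no manual redirection by an outside party. A production system would also need health checks, replication-lag handling, and a way to fail back, none of which this sketch attempts.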
The incident demonstrates the growing reliance on AI services and the potential impact of technical infrastructure failures on global users. OpenAI’s response includes both immediate solutions and long-term infrastructure improvements to enhance service reliability.