Incidents | Cerebrium

Incidents reported on status page for Cerebrium
https://status.cerebrium.ai/

Scheduled Maintenance: Crusoe us-east-1 region unavailable
https://status.cerebrium.ai/incident/924585
Tue, 16 Jun 2026 10:00:00 -0000
https://status.cerebrium.ai/incident/924585#8446f1858f340618ffa9470163c92b674d7f208934e3b574354799ca54c66165
We are performing urgent maintenance in our Crusoe us-east-1 region tomorrow, from 10:00 to 16:00 UTC. Apps pinned to this region and provider will be affected for the duration of the window. To avoid interruption, please deploy to another Cerebrium region or provider before 10:00 UTC tomorrow. Apps running on our new global infrastructure are unaffected. They will fail over automatically to other regions/providers; no action is required. Thank you for your continued support and patience.

Builds are broken
https://status.cerebrium.ai/incident/917975
Mon, 08 Jun 2026 13:49:00 -0000
https://status.cerebrium.ai/incident/917975#0d37513e15e40b0cecbb0c3a7d7c562a980c49fbbd50b35ae6db78639df3360e
Build service is fully restored.

Builds are broken
https://status.cerebrium.ai/incident/917975
Mon, 08 Jun 2026 13:43:00 -0000
https://status.cerebrium.ai/incident/917975#c3e032c254535c77f650c49aa5d299f35f671109a445fc87c80dcbcc037d3b7e
Most builds are working.

Builds are broken
https://status.cerebrium.ai/incident/917975
Mon, 08 Jun 2026 13:15:00 -0000
https://status.cerebrium.ai/incident/917975#96832d538ae5c1444e56b3779beb381e68f4509ea8601cfde3f13bc309b84656
US and EU builds are broken.

Build Service Maintenance recovered
https://status.cerebrium.ai/
Sat, 30 May 2026 09:51:30 +0000
https://status.cerebrium.ai/#4e17f928a29178bc332ad0ee1e4c0e37ac50c3662548c4ebf926925691f68fbd

Build Service recovered
https://status.cerebrium.ai/
Sat, 30 May 2026 09:51:20 +0000
https://status.cerebrium.ai/#8bcfd0fdffbd9bf12a3864596838cbe9b08500300fa459b65f640ca87ae659ca

Build Service Maintenance went down
https://status.cerebrium.ai/
Sat, 30 May 2026 09:31:29 +0000
https://status.cerebrium.ai/#4e17f928a29178bc332ad0ee1e4c0e37ac50c3662548c4ebf926925691f68fbd

Build Service went down
https://status.cerebrium.ai/
Sat, 30 May 2026 09:31:25 +0000
https://status.cerebrium.ai/#8bcfd0fdffbd9bf12a3864596838cbe9b08500300fa459b65f640ca87ae659ca

Build Service Maintenance recovered
https://status.cerebrium.ai/
Fri, 29 May 2026 20:06:27 +0000
https://status.cerebrium.ai/#20c87ab55b1e168a34c166ec062b64eeec1f40a26101158211938bc1090edcd3

Build Service recovered
https://status.cerebrium.ai/
Fri, 29 May 2026 20:06:23 +0000
https://status.cerebrium.ai/#15ab964692277a47a91908d39aaa1f20134c7c146bba0928053326f98988fb2e

Build Service went down
https://status.cerebrium.ai/
Fri, 29 May 2026 19:56:35 +0000
https://status.cerebrium.ai/#15ab964692277a47a91908d39aaa1f20134c7c146bba0928053326f98988fb2e

Build Service Maintenance went down
https://status.cerebrium.ai/
Fri, 29 May 2026 19:56:27 +0000
https://status.cerebrium.ai/#20c87ab55b1e168a34c166ec062b64eeec1f40a26101158211938bc1090edcd3

Build Service Maintenance recovered
https://status.cerebrium.ai/
Fri, 29 May 2026 19:51:28 +0000
https://status.cerebrium.ai/#0a30b7a63699fb8bfb30c52e6f0ca3a1a10a124f7c40ab184d35040413648e3e

Build Service recovered
https://status.cerebrium.ai/
Fri, 29 May 2026 19:51:22 +0000
https://status.cerebrium.ai/#19232c44fea491681ea623d17961f816edf6e38dda9fe26bc7db194475d76b32

Build Service Maintenance went down
https://status.cerebrium.ai/
Fri, 29 May 2026 19:41:30 +0000
https://status.cerebrium.ai/#0a30b7a63699fb8bfb30c52e6f0ca3a1a10a124f7c40ab184d35040413648e3e

Build Service went down
https://status.cerebrium.ai/
Fri, 29 May 2026 19:41:23 +0000
https://status.cerebrium.ai/#19232c44fea491681ea623d17961f816edf6e38dda9fe26bc7db194475d76b32

Inference Degraded in US East 1
https://status.cerebrium.ai/incident/907571
Thu, 28 May 2026 22:39:00 -0000
https://status.cerebrium.ai/incident/907571#5c020fc321b11bced34ab2fb1eee4e604f6947c610fa0abeafb3a0015dd7f394
All services are restored. We are continuing to monitor.

Inference Degraded in US East 1
https://status.cerebrium.ai/incident/907571
Thu, 28 May 2026 22:35:00 -0000
https://status.cerebrium.ai/incident/907571#12d08d3c448d8bc05e820bf93644358e7c7cf6fa815dc5bbd3937a1f23f038a0
The issue has been mitigated, and inference should be working as normal.

Inference Degraded in US East 1
https://status.cerebrium.ai/incident/907571
Thu, 28 May 2026 22:10:00 -0000
https://status.cerebrium.ai/incident/907571#67b2fe6066908bfe0e3c52eecfff95e1130436426664798a62dbbfc3b395b82f

Inference Degraded in US East 1
https://status.cerebrium.ai/incident/907571
Thu, 28 May 2026 21:35:00 -0000
https://status.cerebrium.ai/incident/907571#91211c5dee1b788bbd39b5b80473f21ce422ce3c6daa4ef990c340dbe4468ea1
Requests are slow and some are failing.

US EAST 1 recovered
https://status.cerebrium.ai/
Thu, 21 May 2026 17:22:46 +0000
https://status.cerebrium.ai/#b62d809d4db8217b4dc1af7d5d07e482bcf6346ec46ba6d22124d103647c22d5

US EAST 1 went down
https://status.cerebrium.ai/
Thu, 21 May 2026 17:02:44 +0000
https://status.cerebrium.ai/#b62d809d4db8217b4dc1af7d5d07e482bcf6346ec46ba6d22124d103647c22d5

Unable to schedule workloads in crusoe us-east-1a
https://status.cerebrium.ai/incident/900559
Wed, 20 May 2026 19:02:00 -0000
https://status.cerebrium.ai/incident/900559#3dff7911d99d64b4b78b2f5dc0f20774ad55950861341926425d7fe713843936
Upstream provider is back online and all services have been restored.

Unable to schedule workloads in crusoe us-east-1a
https://status.cerebrium.ai/incident/900559
Wed, 20 May 2026 09:42:00 -0000
https://status.cerebrium.ai/incident/900559#4c1190bc6c91e966bc6ec2789e80910d81dbb0ca696e42827cec5b953ab69ad9
Crusoe is making progress on mitigation in us-east-1. A number of VMs have returned to service, though a subset of compute hosts is still impacted and the incident isn't fully resolved yet. Engineering on the upstream side remains actively engaged. We'll continue posting updates as recovery progresses.

Unable to schedule workloads in crusoe us-east-1a
https://status.cerebrium.ai/incident/900559
Wed, 20 May 2026 08:27:00 -0000
https://status.cerebrium.ai/incident/900559#ac812de5c081c2a0e0129132ce60f3a912f45791096cf9228c9c7241ae1d7293
Crusoe has a mitigation plan in place and is currently testing it. Early results are positive, with a few more tests to run before applying it across all impacted hosts. We'll post another update once the mitigation is rolled out or if anything changes. Workloads in Crusoe us-east-1 remain affected in the meantime.

Unable to schedule workloads in crusoe us-east-1a
https://status.cerebrium.ai/incident/900559
Wed, 20 May 2026 03:30:00 -0000
https://status.cerebrium.ai/incident/900559#9bfccd6a7f139d01a8ce3ca57b590ccf039c7f3dc561b203e883562fd4c5ff1a
We're seeing a full outage of our Crusoe us-east-1 region affecting all workloads deployed there. The upstream provider is experiencing an infrastructure failure and we're working with them on resolution.
Impact: All apps deployed to Crusoe us-east-1 are unavailable. Requests to affected deployments will fail.
Workaround: If your app is configured for multi-region or has a fallback region, traffic should route automatically. Customers running solely in Crusoe us-east-1 can redeploy to another region (e.g. AWS us-east-1, AWS us-east-2) in the meantime.
We'll post updates here as we have them. Apologies for the disruption.

Build Service recovered
https://status.cerebrium.ai/
Mon, 18 May 2026 21:25:31 +0000
https://status.cerebrium.ai/#c3c0e28804bfe2f85d1b6ffdc5fa8145d44c5f82394cbaa2e3b78682674f32e8

Build Service went down
https://status.cerebrium.ai/
Mon, 18 May 2026 21:04:38 +0000
https://status.cerebrium.ai/#c3c0e28804bfe2f85d1b6ffdc5fa8145d44c5f82394cbaa2e3b78682674f32e8

Unable to schedule workloads on Crusoe (both regions)
https://status.cerebrium.ai/incident/895827
Thu, 14 May 2026 13:01:00 -0000
https://status.cerebrium.ai/incident/895827#8c0b7508f727bc7304a218e924de8becfee7c1a3e0c765db9526de7c917fa699
Workloads in both Crusoe regions are running normally again. After rolling back the change, we manually cleared the broken state in our infrastructure and brought capacity back online. A post-mortem will follow. Thanks for your patience.

Unable to schedule workloads on Crusoe (both regions)
https://status.cerebrium.ai/incident/895827
Thu, 14 May 2026 11:49:00 -0000
https://status.cerebrium.ai/incident/895827#171167a905febbe4b8b353b846a7c72b605c33bf81a8dc5e8584576e02c8d3f9
We've identified and mitigated the underlying issue and are now working to bring services back online in both Crusoe regions. Workloads are not yet schedulable while recovery is underway. Next update in 30 minutes or when resolved.

Unable to schedule workloads on Crusoe (both regions)
https://status.cerebrium.ai/incident/895827
Thu, 14 May 2026 11:07:00 -0000
https://status.cerebrium.ai/incident/895827#6138c3f1bedb7fbbaf6f1409304a8682c9fa60ed882333aa5a727be7ff3c4ef9
We've identified an issue affecting both new deployments and running workloads in our Crusoe regions. A recent internal release is preventing workloads from running correctly. We're rolling back the change now and expect things to recover shortly. Next update in 30 minutes or when resolved.

New builds broken in US East
https://status.cerebrium.ai/incident/895373
Wed, 13 May 2026 20:45:00 -0000
https://status.cerebrium.ai/incident/895373#9fef88b618a0d2045940e8bde6729c140b953d0e5cbd5cbe65ecd4df0f9afde6
Builds are currently not working in US East.

Build Service recovered
https://status.cerebrium.ai/
Wed, 13 May 2026 19:04:38 +0000
https://status.cerebrium.ai/#1aeb56598bd327e6511f641a1d01db91dec48f6ee64eb680456a8f22cedb4917

Build Service went down
https://status.cerebrium.ai/
Wed, 13 May 2026 18:37:30 +0000
https://status.cerebrium.ai/#1aeb56598bd327e6511f641a1d01db91dec48f6ee64eb680456a8f22cedb4917

Issue routing inference calls to newly deployed applications
https://status.cerebrium.ai/incident/884315
Thu, 30 Apr 2026 01:30:00 -0000
https://status.cerebrium.ai/incident/884315#a2ed6f442611bea58d4a5819e72b08f82553fd568b12ac9cf1aee0f6fefad645
Newly deployed apps are responding with 404 despite being deployed.

Build Service recovered
https://status.cerebrium.ai/
Thu, 23 Apr 2026 21:04:36 +0000
https://status.cerebrium.ai/#c46b2fe66e15b89d4c0b45ab2eab8e23ac49fd6a0c43490fbcb43f35b5a8e96b

Build Service went down
https://status.cerebrium.ai/
Thu, 23 Apr 2026 19:58:33 +0000
https://status.cerebrium.ai/#c46b2fe66e15b89d4c0b45ab2eab8e23ac49fd6a0c43490fbcb43f35b5a8e96b

Build Service recovered
https://status.cerebrium.ai/
Thu, 16 Apr 2026 01:50:32 +0000
https://status.cerebrium.ai/#9387fbf6e6eb3889d4f2d84e81a5405d6422ebf2dba08f04858f6c6784520760

Build Service went down
https://status.cerebrium.ai/
Thu, 16 Apr 2026 01:07:23 +0000
https://status.cerebrium.ai/#9387fbf6e6eb3889d4f2d84e81a5405d6422ebf2dba08f04858f6c6784520760

Registry US EAST 1 recovered
https://status.cerebrium.ai/
Wed, 15 Apr 2026 22:22:51 +0000
https://status.cerebrium.ai/#64e0c33e8256b7fe3eaaf18a7531583709290cbecd56ee6762bdbfc6ce776e93

US EAST 1 recovered
https://status.cerebrium.ai/
Wed, 15 Apr 2026 22:22:45 +0000
https://status.cerebrium.ai/#2af11cc47285d06d408d77168d849fcd890cf63b8a4c2a88588ef078561f0937

Registry US EAST 1 went down
https://status.cerebrium.ai/
Wed, 15 Apr 2026 22:06:52 +0000
https://status.cerebrium.ai/#64e0c33e8256b7fe3eaaf18a7531583709290cbecd56ee6762bdbfc6ce776e93

US EAST 1 went down
https://status.cerebrium.ai/
Wed, 15 Apr 2026 22:06:41 +0000
https://status.cerebrium.ai/#2af11cc47285d06d408d77168d849fcd890cf63b8a4c2a88588ef078561f0937

Build Service recovered
https://status.cerebrium.ai/
Tue, 14 Apr 2026 23:13:31 +0000
https://status.cerebrium.ai/#8be01f67d9c9eea3018ddf299e65174283c5fb5c988401e1993ceba583f7cfc6

Build Service went down
https://status.cerebrium.ai/
Tue, 14 Apr 2026 22:42:26 +0000
https://status.cerebrium.ai/#8be01f67d9c9eea3018ddf299e65174283c5fb5c988401e1993ceba583f7cfc6

Registry US EAST 1 recovered
https://status.cerebrium.ai/
Sun, 05 Apr 2026 22:28:58 +0000
https://status.cerebrium.ai/#0e942516d029940b42333ad4c7e3a8fe840efe9f262515762d78efadaa592408

US EAST 1 recovered
https://status.cerebrium.ai/
Sun, 05 Apr 2026 22:28:44 +0000
https://status.cerebrium.ai/#a554f512d9038c3d23c6711d6a8040522aaaabff8ac3c46e6bc48ec9bbbe49c4

Registry US EAST 1 went down
https://status.cerebrium.ai/
Sun, 05 Apr 2026 22:08:45 +0000
https://status.cerebrium.ai/#0e942516d029940b42333ad4c7e3a8fe840efe9f262515762d78efadaa592408

US EAST 1 went down
https://status.cerebrium.ai/
Sun, 05 Apr 2026 22:08:35 +0000
https://status.cerebrium.ai/#a554f512d9038c3d23c6711d6a8040522aaaabff8ac3c46e6bc48ec9bbbe49c4

EU WEST 2 recovered
https://status.cerebrium.ai/
Sun, 05 Apr 2026 19:51:20 +0000
https://status.cerebrium.ai/#f7d320dbc4432f9b5993a12d76f02f9d86cf9802ac1e5c95a5e49a4353be94a9

Registry EU WEST 2 recovered
https://status.cerebrium.ai/
Sun, 05 Apr 2026 19:50:24 +0000
https://status.cerebrium.ai/#14832e3915d9f705d45375f2e4f7215b9bdcda6e4c4ec22b8e178ad54da006ee

EU WEST 2 went down
https://status.cerebrium.ai/
Sun, 05 Apr 2026 19:47:24 +0000
https://status.cerebrium.ai/#f7d320dbc4432f9b5993a12d76f02f9d86cf9802ac1e5c95a5e49a4353be94a9

Registry EU WEST 2 went down
https://status.cerebrium.ai/
Sun, 05 Apr 2026 19:47:24 +0000
https://status.cerebrium.ai/#14832e3915d9f705d45375f2e4f7215b9bdcda6e4c4ec22b8e178ad54da006ee

Cerebrium Landing Page recovered
https://status.cerebrium.ai/
Fri, 03 Apr 2026 10:55:17 +0000
https://status.cerebrium.ai/#f1d4200dbc2e3472adb7297163c5f93e40a6ebf4fa71242bf491e937101204d5

Cerebrium Landing Page went down
https://status.cerebrium.ai/
Fri, 03 Apr 2026 10:53:15 +0000
https://status.cerebrium.ai/#f1d4200dbc2e3472adb7297163c5f93e40a6ebf4fa71242bf491e937101204d5

Filesystem & Infrastructure Scaling Improvements
https://status.cerebrium.ai/incident/851226
Sun, 22 Mar 2026 15:00:37 -0000
https://status.cerebrium.ai/incident/851226#aab4b4cb4e0860bd64fb1e2fb28d20af46dc1a28ebeb3ea99db95721a251d763
Maintenance completed.

Filesystem & Infrastructure Scaling Improvements
https://status.cerebrium.ai/incident/851226
Sun, 22 Mar 2026 14:00:37 -0000
https://status.cerebrium.ai/incident/851226#6f28296c463fa72e42d355ff8c2e520f3c380aa4c8e13684c2766a1954a50a2d
We are carrying out planned infrastructure work on 22 March (09:00 EST, for 1 hour) to improve workload scaling and filesystem performance across our clusters. During this window you should expect intermittent downtime that may affect active runs, along with increased latency on deployments and inference requests. Not all regions will be affected simultaneously.

US-east-1 is down
https://status.cerebrium.ai/incident/843719
Sat, 07 Mar 2026 19:20:00 -0000
https://status.cerebrium.ai/incident/843719#2e1739e0fd9e25dc8dba63407c6472b0c8ffe9acabd0df76baf3ff04b3fa9a1d
Our AWS us-east-1 region is down. We have identified the issue and the team is working on resolving it.
It has been resolved.

Increase in 502 errors
https://status.cerebrium.ai/incident/819123
Thu, 05 Feb 2026 08:48:00 -0000
https://status.cerebrium.ai/incident/819123#1a6978205cac36d53e7bc13591a56c5a4dc24233b09559ec76f84b222312d822
Some customers are experiencing an increase in 502 errors in US-EAST-1 due to a contention issue on the platform. The team is currently investigating and will report back as soon as there is more information. We sincerely apologise for this issue and are working to get it resolved as quickly as possible.

CLI authentication failing
https://status.cerebrium.ai/incident/818383
Wed, 04 Feb 2026 10:38:00 -0000
https://status.cerebrium.ai/incident/818383#e00a3f355dc663548b7d3e02bf6b1f26062fb86953948827592deadcb0b42a3f
The issue has been resolved. An incorrectly configured DNS record prevented users from signing in using the CLI.

Increase in request queuing on AWS workloads
https://status.cerebrium.ai/incident/802912
Mon, 12 Jan 2026 09:00:00 -0000
https://status.cerebrium.ai/incident/802912#7ab384d4aba5d5dcf8222a210d787a36433e8c2ff3171bf5f85060e21b8cd863
We're currently experiencing degraded performance on workloads being scheduled to the AWS provider. This issue currently only affects GPU-based workloads. It is intermittent and may not be affecting all apps. The team is currently investigating and we will provide an update as we uncover any new information.

Problem starting new workloads. Existing apps are unaffected.
https://status.cerebrium.ai/incident/783164
Tue, 09 Dec 2025 19:08:00 -0000
https://status.cerebrium.ai/incident/783164#3a148ec3d4aca0a7568662b6de72a63d3307b7004630d8db4f39a2d78be6ec4c
The issue has been resolved.

Problem starting new workloads. Existing apps are unaffected.
https://status.cerebrium.ai/incident/783164
Tue, 09 Dec 2025 18:44:00 -0000
https://status.cerebrium.ai/incident/783164#830358f004199aa5af28e313f89f76798f7c9008f45ffd0d748217510683a6ce
New apps are unable to start at present.

Elevated Errors in US-East-1
https://status.cerebrium.ai/incident/778505
Tue, 02 Dec 2025 23:54:00 -0000
https://status.cerebrium.ai/incident/778505#9a6a4a594b4a98c27a6518f481d7a24a1c5d001b1b7369a32cd3ff823a3829aa
Our platform is currently struggling to schedule new containers on incoming requests. Our team is working on identifying the error and resolving it ASAP.
Resolved: The issue was caused by a failure in a managed component from one of our infrastructure providers, which temporarily prevented us from scheduling new capacity. We’ve worked with the provider to restore functionality and are now implementing additional safeguards to ensure this does not recur.

Updating various cluster components
https://status.cerebrium.ai/incident/765784
Sun, 16 Nov 2025 15:35:00 -0000
https://status.cerebrium.ai/incident/765784#44c162f4b64153670bac6f17c25bfa4e676dc9f436b6a01c2f3a84cc52e0defd
We are performing a series of infrastructure optimizations to improve performance and reliability. While we don’t expect customer traffic to be impacted, there may be brief periods of elevated latency or volatility during the upgrade window. Our team is closely monitoring the rollout and will update this page with any relevant changes.

Emergency node maintenance in US-East-1
https://status.cerebrium.ai/incident/757186
Tue, 04 Nov 2025 04:00:34 -0000
https://status.cerebrium.ai/incident/757186#d47ae91f32582e55a5a2dcc9e6bc40e24a2191052cb85532b3e4de37ecdcefe7
A critical error in the mechanism GPU devices use to attach to containers is affecting several workloads on the platform, causing NVML to show "Device not found" when calling nvidia-smi or attempting to use the GPU (see https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/troubleshooting.html#containers-losing-access-to-gpus-with-error-failed-to-initialize-nvml-unknown-error). This maintenance will update all GPU nodes to use CDI (the Container Device Interface) and apply a few container runtime upgrades.

Elevated upstream errors (us-east-1)
https://status.cerebrium.ai/incident/746816
Mon, 20 Oct 2025 23:36:00 -0000
https://status.cerebrium.ai/incident/746816#def6d05d3ec66619875cc72f480e59b5e4fc16b651f34fefe86783f281a574ad
Resolved.

Elevated upstream errors (us-east-1)
https://status.cerebrium.ai/incident/746816
Mon, 20 Oct 2025 19:17:00 -0000
https://status.cerebrium.ai/incident/746816#c2796bb6bcf9aeb2ead994d0816196a44f82c71df6e0c874a4db8826faec0b59
We continue to observe recovery across all AWS services, and instance launches are succeeding across multiple Availability Zones in the US-EAST-1 Region.

Elevated upstream errors (us-east-1)
https://status.cerebrium.ai/incident/746816
Mon, 20 Oct 2025 18:24:00 -0000
https://status.cerebrium.ai/incident/746816#08caa7295d5150a3985a744fff62124415b3fe892f24c413ded26bc73486cbcb
AWS's mitigations to resolve launch failures for new EC2 instances continue to progress and we are seeing increased launches of new EC2 instances.

Elevated upstream errors (us-east-1)
https://status.cerebrium.ai/incident/746816
Mon, 20 Oct 2025 17:48:00 -0000
https://status.cerebrium.ai/incident/746816#63d5cc546455aa540f4c11553e1ee571569501a82cc40bf8190db9cf776ad430
AWS has resolved the launch failures and is rolling out the changes to all AZs, at which point we expect launch errors and network connectivity issues to subside.

Elevated upstream errors (us-east-1)
https://status.cerebrium.ai/incident/746816
Mon, 20 Oct 2025 17:04:00 -0000
https://status.cerebrium.ai/incident/746816#0344617f774b7af62a0b35ad079fe58cd65549059c279b494e519764a530a924
AWS is in the process of validating a fix for EC2 launches and will deploy to the first AZ as soon as they are confident they can do so safely.

Elevated upstream errors (us-east-1)
https://status.cerebrium.ai/incident/746816
Mon, 20 Oct 2025 15:47:00 -0000
https://status.cerebrium.ai/incident/746816#869f54ebbeff72d23a7f83e3ee9b40b543149323c691df5acef5f39efa5e3be7
AWS has narrowed down the source of the network connectivity issues that have impacted their services. They are throttling requests for new EC2 instance launches to aid recovery and are actively working on mitigations.

Elevated upstream errors (us-east-1)
https://status.cerebrium.ai/incident/746816
Mon, 20 Oct 2025 14:01:00 -0000
https://status.cerebrium.ai/incident/746816#55398d34ca052b66a663b8a6fafb6229c9d65baf791efe2ae6318d5cc992ecff
AWS has applied fixes but is still experiencing problems launching instances in us-east-1. Builds and endpoint calls remain broken. We'll keep you posted.

Elevated upstream errors (us-east-1)
https://status.cerebrium.ai/incident/746816
Mon, 20 Oct 2025 13:28:00 -0000
https://status.cerebrium.ai/incident/746816#67599f05df32c991402005470e8eaf57294cf54e0e8c0e1a09a50c5bef88da37
The AWS outage is ongoing. Builds are currently broken due to an outage with EC2. We're waiting on AWS to resolve the issue and will keep you updated.

Elevated upstream errors (us-east-1)
https://status.cerebrium.ai/incident/746816
Mon, 20 Oct 2025 11:10:00 -0000
https://status.cerebrium.ai/incident/746816#a49214b9a48be3601aad264a1fdf6dc91ff8867170cd7b4c97618fc61a65bc16
All services have now been restored fully. We will continue to monitor for any anomalies. Thank you for your patience and we apologise for the inconvenience.

Elevated upstream errors (us-east-1)
https://status.cerebrium.ai/incident/746816
Mon, 20 Oct 2025 09:43:00 -0000
https://status.cerebrium.ai/incident/746816#15bc346de987b0c270ff70ae21f1a5339045ee3e609949ccc47943dbc02a18d0
Most services have now recovered. You may still experience issues building apps on Cerebrium while AWS continues to resolve the remaining problems. We'll update you once everything is back to normal.

Elevated upstream errors (us-east-1)
https://status.cerebrium.ai/incident/746816
Mon, 20 Oct 2025 09:31:00 -0000
https://status.cerebrium.ai/incident/746816#f2755af8f9d9beb9133349e36c2cb6dd9b14b1d56cc67d2fc1b92ca5cee1077f
AWS has applied a fix and some services are starting to recover. You may still see some errors or slower response times as things fully stabilize. If something fails, please try again. We'll keep you posted as more services are restored.

Elevated upstream errors (us-east-1)
https://status.cerebrium.ai/incident/746816
Mon, 20 Oct 2025 09:01:00 -0000
https://status.cerebrium.ai/incident/746816#7f5682cdb78d1f389b7f350a2e1e75fd236a69267522cbc2fbf643b51989e0ad
AWS has identified the root cause as a DNS resolution issue affecting DynamoDB and other services in US-EAST-1. They're working on multiple recovery paths to accelerate the fix. Cerebrium services remain impacted during this time. If you encounter errors, please continue to retry your requests. AWS will provide their next update by 2:45 AM.

Elevated upstream errors (us-east-1)
https://status.cerebrium.ai/incident/746816
Mon, 20 Oct 2025 08:29:00 -0000
https://status.cerebrium.ai/incident/746816#871cd301c7bebebef8f179e43876babbef14a6d7fbd37f53a467067d6240c74e
The AWS team has narrowed down the critically affected services. However, these services are core to the Cerebrium platform, and your dashboards, builds, and endpoint calls are still affected. We are continuing to investigate and will provide more updates within the next 45 minutes.

Elevated upstream errors (us-east-1)
https://status.cerebrium.ai/incident/746816
Mon, 20 Oct 2025 07:38:00 -0000
https://status.cerebrium.ai/incident/746816#5a509bc68dcfde22169faca0750514fa7e5c34b578ec1b50a44df545757ed329
We are seeing elevated error rates caused by upstream AWS errors across the majority of our services in the us-east-1 region. We will share an update as soon as possible.

Degraded Inference API in US-EAST-1
https://status.cerebrium.ai/incident/740083
Wed, 08 Oct 2025 18:28:00 -0000
https://status.cerebrium.ai/incident/740083#89d8d4e8dd689746d3c782842aa817ffbafe52467e5adfe5607a9365aceac920
The Inference API is currently experiencing degraded performance in US-EAST-1. Our team is working on a fix ASAP.

Inference API
https://status.cerebrium.ai/incident/737024
Fri, 03 Oct 2025 13:13:00 -0000
https://status.cerebrium.ai/incident/737024#e1c9c6e4c4fdf5e170832cdabbc8311af2f5a5ebda3688b6218f48be2e12c17e
The Inference API is currently experiencing a high rate of 502 failures. Roughly 45% of all requests are affected. Our team is investigating the cause as a matter of high urgency.

Container Count is down
https://status.cerebrium.ai/incident/726877
Thu, 18 Sep 2025 23:05:00 -0000
https://status.cerebrium.ai/incident/726877#02b232bc3e6fa758f3a2ce6d5b1043c6c6f3e30c2573ee0fa7cd895a0e38bb5f
A third-party provider is down, affecting the container count on the dashboard.