Kalcifer@sh.itjust.works

The Matrix.org homeserver ^[2] is experiencing a “major outage” ^[1.1]; the server is currently offline ^[1.3]. As of 2025-09-03T00:14Z, the timeline of events is as follows:

At 2025-09-02T17:39Z, status.matrix.org reported an issue with the matrix.org database ^[1.3].
At 2025-09-02T19:01:27Z, the Matrix.org Foundation reported that the matrix.org secondary database lost its filesystem at 2025-09-02T11:17Z due to a RAID failure, and that the primary database filesystem was subsequently lost at 2025-09-02T17:26Z. ^[5]
At 2025-09-02T21:39:25Z, the Matrix.org Foundation reported that they were unable to restore the primary database filesystem to a state they were confident in, so they are restoring a 55TB database snapshot from the previous night. The restoration is expected to take more than 10 hours to recover the data, then more than 4 hours to restore it, then more than 3 hours to catch up on missing traffic. ^[6]
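
Summing the quoted lower bounds gives a rough floor of more than 17 hours for the restoration. The sketch below is only illustrative: it assumes the three phases run back to back and that the clock starts when the second post was published. Neither assumption is stated in the source, and the earlier post ^[5] suggests a backup restore may already have been under way. The variable names are my own.

```python
from datetime import datetime, timedelta, timezone

# Publication time of the Foundation's second post (from the timeline above).
announced = datetime(2025, 9, 2, 21, 39, 25, tzinfo=timezone.utc)

# Quoted lower bounds per phase, in hours (">10h", ">4h", ">3h").
phase_lower_bounds_h = {"recover": 10, "restore": 4, "catch_up": 3}

# Assumption: phases are sequential, so the lower bounds simply add up.
total = timedelta(hours=sum(phase_lower_bounds_h.values()))

# Assumption: the clock starts at the time of the post.
earliest_recovery = announced + total

print(total)                          # 17:00:00
print(earliest_recovery.isoformat())  # 2025-09-03T14:39:25+00:00
```

Because every figure in the post is a “greater than” bound, the computed time is at best an earliest plausible recovery time, not a prediction.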

References

1. Type: Website. Publisher: “Matrix.org”. Accessed: 2025-09-03T00:18Z. URI: https://status.matrix.org/.
   - This is the “system status and incidents” website for The Matrix.org Homeserver ^[3.1].
   1. Type: Image.
   2. Type: Image.
   3. Type: Image.
2. Type: Meta.
   - As seen on status.matrix.org, the site which tracks the system status and incidents of The Matrix.org Homeserver ^[3.1], the section “Matrix.org” contains the context: “The Matrix deployment on matrix.org” ^[1.2]. Within that section, Synapse, which is reported to have the outage ^[1.1], is a Matrix homeserver ^[4].
3. Type: Article. Title: “The Matrix.org Homeserver”. Publisher: “The Matrix.org Foundation”. Accessed: 2025-09-03T00:30Z. URI: https://matrix.org/homeserver/about/.
   1. Type: Text. Location: ¶3.

      System status and incidents
4. Type: Website. Title: “Servers”. Publisher: “The Matrix.org Foundation”. Accessed: 2025-09-03T00:38Z. URI: https://matrix.org/ecosystem/servers/.
   - Type: Image.
5. Type: Post>Text. Author: “The Matrix.org Foundation” (“@matrix@mastodon.matrix.org”). Publisher: “mastodon.social”. Published: 2025-09-02T19:01:27.000Z. Accessed: 2025-09-03T00:46Z. URI: https://mastodon.matrix.org/@matrix/115136245785561439.

So: the matrix.org database secondary lost its FS due to a RAID failure earlier today (11:17 UTC). Then, we lost the primary at 17:26. We’re trying to restore the primary DB FS (which could be fastish), while also doing a point-in-time backup restore from last night (which takes >10h). We believe the incremental DB traffic since last night is intact however. Apologies for the downtime; folks on their own homeserver are of course not impacted.
- “FS” is presumed to mean “filesystem” from
  
  […] the matrix.org database secondary lost its FS due to a RAID failure earlier today (11:17 UTC). Then, we lost the primary at 17:26. We’re trying to restore the primary DB FS […] ^[5]
  
  combined with
  
  […] we haven’t been able to restore the DB primary filesystem to a state we’re confident in running as a primary […] ^[6]
- “earlier today (11:17 UTC)” is assumed to mean 2025-09-02T11:17Z, as this post was published on 2025-09-02T19:01:27.000Z.
- Since no other qualifying information is given, and since it follows 11:17 in the sequence of events, “17:26” is presumed to mean 2025-09-02T17:26Z.
- […] Then, we lost the primary at 17:26 […]
  
  This is presumed to be referring to the primary database filesystem given the context:
  
  […] the matrix.org database secondary lost its FS due to a RAID failure earlier today (11:17 UTC). Then, we lost the primary at 17:26 […]
- “DB” is presumed to mean “database” given the following context:
  
  […] the matrix.org database secondary lost its FS due to a RAID failure earlier today (11:17 UTC). Then, we lost the primary at 17:26. We’re trying to restore the primary DB FS […]
6. Type: Post>Text. Author: “The Matrix.org Foundation” (“@matrix@mastodon.matrix.org”). Publisher: “mastodon.social”. Published: 2025-09-02T21:39:25.000Z. Accessed: 2025-09-03T00:50Z. URI: https://mastodon.matrix.org/@matrix/115136866878237078.

Sorry, but it’s bad news: we haven’t been able to restore the DB primary filesystem to a state we’re confident in running as a primary (especially given our experiences with slow-burning postgres db corruption). So we’re having to do a full 55TB DB snapshot restore from last night, which will take >10h to recover the data, and then >4h to actually restore, and then >3h to catch up on missing traffic. Huge apologies for the outage. Again, folks using their own homeservers are not impacted.
- “DB” is presumed to mean “database” ^[5].

Cross-posts:

https://sh.itjust.works/post/45298831

The Matrix.org Synapse homeserver is currently experiencing a "major outage"