Operations grimoire/Incidents/2025-11-25-MariaDB

From Nasqueron Agora
Revision as of 11:19, 25 November 2025 by Dereckson

MariaDB extended downtime for InnoDB tables.

Incident timeline

2025-11-25
  • 10:22 - Deployment of D3890 - ZFS volumes change for MariaDB to fix T2074
  • 10:22 - MariaDB is restarted
  • 10:22 - Services using MyISAM tables are OK, but services using InnoDB, such as the wikis, are down
  • 10:24 - Quick investigation shows an engine error for InnoDB tables
  • 10:25 - Unmount and remount ZFS volumes for InnoDB data and logs
  • 10:25 - Restart MariaDB server, databases are served correctly again

Timestamps are in UTC and are estimates, except for the MariaDB restart at 10:22, which is accurate (taken from Salt).

Analysis

When we noticed wikis were down, we browsed the following logs:

  • /var/log/www/nasqueron.org/wikis-error.log (not relevant, only confirms PHP error 500)
  • /var/log/mediawiki/error.log (relevant)

We had also previously confirmed directly that MediaWiki could connect to the database with sudo -u mw agora shell.

MediaWiki reported that the objectcache table doesn't exist in the current engine.

We used the MySQL client to check the databases and tables with SHOW DATABASES, \u nasqueron_wiki and SHOW CREATE TABLE objectcache.
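
The same client checks can be scripted so they are quick to repeat for another database or table. This is a minimal sketch, not the procedure used during the incident: the helper name innodb_probe_sql is hypothetical, it does no identifier quoting, and USE stands in for the client's \u shortcut.

```shell
#!/bin/sh
# Hypothetical helper: print the SQL statements used during the investigation
# for a given database ($1) and table ($2).
# Assumption: both arguments are plain identifiers that need no quoting.
innodb_probe_sql() {
    db="$1"
    table="$2"
    printf 'SHOW DATABASES;\n'
    printf 'USE %s;\n' "$db"
    printf 'SHOW CREATE TABLE %s;\n' "$table"
}
```

For example, innodb_probe_sql nasqueron_wiki objectcache | mysql would send the three statements to the server, assuming the client can authenticate without further options.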

Fix

$ /usr/local/etc/rc.d/mysql-server stop
$ umount -f arcology/mysql-innodb-data
$ umount -f arcology/mysql-innodb-logs
$ zfs mount arcology/mysql-innodb-data
$ zfs mount arcology/mysql-innodb-logs
$ /usr/local/etc/rc.d/mysql-server start

Actionables

  • Always unmount and remount ZFS volumes when they are touched, even if only the parent dataset has been updated. Don't trust ZFS automount blindly.
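
As a complement to this actionable, a pre-start check could flag datasets that ZFS reports as unmounted. The sketch below is an illustration only: the function name unmounted_datasets is hypothetical, and it catches only datasets whose mounted property is not "yes". Whether that was the failure mode in this incident is not established here, so it does not replace the explicit remount.

```shell
#!/bin/sh
# Hypothetical helper: read tab-separated "name<TAB>mounted" lines on stdin,
# as printed by `zfs list -H -o name,mounted`, and print the name of every
# dataset whose mounted property is anything other than "yes".
unmounted_datasets() {
    awk -F '\t' '$2 != "yes" { print $1 }'
}
```

It could be used as zfs list -H -o name,mounted arcology/mysql-innodb-data arcology/mysql-innodb-logs | unmounted_datasets, where any output means a dataset needs attention before MariaDB is started.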