Operations grimoire/Checklist router post-restart
From Nasqueron Agora
Latest revision as of 20:36, 26 October 2024
A router-001 restart severs GRE tunnels and their existing TCP connections.
As such, this checklist notes the known issues that arise when the private network is disrupted.
To partially prevent this, blue/green deployments or immutable artifacts should be encouraged. For the next non-urgent upgrade, it's recommended to create router-002 and move the GRE connections to it. This checklist should still be followed.
As most services are currently on hyper-001, they don't need the router to communicate, so a lot of things will still be working.
IRC eggdrops
They require a GRE tunnel between the server with the viperserv role and the router to connect to both complector (Vault) and a db-B server (MariaDB).
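To confirm that the tunnel is back before touching the bots, a quick TCP reachability check from the viperserv server can help. The following is a minimal sketch only: the hostnames are hypothetical placeholders, and the ports are the upstream defaults for Vault (8200) and MariaDB (3306), which may not match the actual deployment.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Hypothetical endpoints: replace with the real private addresses and ports.
ENDPOINTS = {
    "complector (Vault)": ("complector.example.internal", 8200),
    "db-B (MariaDB)": ("db-b.example.internal", 3306),
}


def report(endpoints: dict) -> dict:
    """Map each endpoint name to True (reachable) or False (unreachable)."""
    return {
        name: tcp_reachable(host, port)
        for name, (host, port) in endpoints.items()
    }
```

Calling report(ENDPOINTS) returns one boolean per service. A False value suggests the GRE tunnel is not re-established yet, in which case reconnecting the eggdrops would be premature.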
On Libera:
- Dæghrefn: MySQL is normally reconnected at periodic intervals, but you can trigger this manually with .tcl sqlrehash
- Wearg: MySQL does NOT reconnect, and the failure will probably cause the timer to exit
  - .tcl sqlrehash to reconnect to MySQL
  - .tcl broker::on_tick to reconnect to the broker and start the timer to get messages
  - .tcl utimers to ensure the timer is good; you should see a ::broker::on_tick timer
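The per-bot steps above can also be kept as data, so the order of the commands stays explicit when the checklist is pasted into a ticket or a runbook. This is an illustrative sketch only: the commands come from the list above, the data layout and function name are hypothetical, and it assumes the last three commands all apply to Wearg.

```python
# Partyline commands per bot, in the order they should be run.
# Assumption: sqlrehash, broker::on_tick and utimers all belong to Wearg.
RECOVERY_STEPS = {
    "Dæghrefn": [
        (".tcl sqlrehash", "reconnect to MySQL without waiting for the periodic reconnect"),
    ],
    "Wearg": [
        (".tcl sqlrehash", "reconnect to MySQL"),
        (".tcl broker::on_tick", "reconnect to the broker and restart the message timer"),
        (".tcl utimers", "check that a broker::on_tick timer is listed"),
    ],
}


def render_checklist(steps: dict) -> str:
    """Render the per-bot commands as a numbered plain-text checklist."""
    lines = []
    for bot, commands in steps.items():
        lines.append(f"{bot}:")
        for number, (command, purpose) in enumerate(commands, start=1):
            lines.append(f"  {number}. {command}  # {purpose}")
    return "\n".join(lines)
```

Printing render_checklist(RECOVERY_STEPS) gives one numbered block per bot. The commands themselves still have to be typed on each eggdrop's partyline.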
 
T1862 offers to detect MySQL issues and reconnect.