Operations grimoire/Incidents/2017-03-01-Eglide

From Nasqueron Agora
Revision as of 12:53, 6 March 2017 by Dereckson

📕📁📜 Old technical information :: content warning

⌛ This Nasqueron Operations Grimoire page hasn't been updated for a long time.

☣ As our infrastructure evolves quickly, there is a good chance this information is outdated or now inaccurate. Be careful and consider updating it.

➡️ To check whether the information is still up to date, you can review the history of the relevant role in our Operations repository.

Tracked at https://devcentral.nasqueron.org/T1162.

Incident timeline

  • 03:18:56 amj's WeeChat client times out on Freenode
  • 03:19:05 Odderon times out too
  • 04:22:47 tomjerr asks if it's down
  • 19:52:41 Sandlayth rebooted the server

After the incident, it was noticed that Odderon had not reconnected automatically:

  • 21:08:05 Odderon joins channel

Analysis

The root cause of the outage isn't known: the logs don't contain any relevant information.

A simple reboot was enough to resume service, but it took more than 16 hours to reach Sandlayth, the only person with the credentials to do it.
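To make the delays explicit, the durations can be recomputed from the timeline timestamps. This is a minimal sketch: it assumes all events happened on the same day and in the same timezone (the page does not state one), and it treats the reboot time as the moment service resumed. The variable names are illustrative only.

```python
from datetime import datetime

# Timestamps copied from the incident timeline.
# Assumption: same day, same (unstated) timezone.
FMT = "%H:%M:%S"
first_timeout = datetime.strptime("03:18:56", FMT)
reboot = datetime.strptime("19:52:41", FMT)
odderon_back = datetime.strptime("21:08:05", FMT)

# Time between the first reported timeout and the reboot.
time_to_reboot = reboot - first_timeout

# Extra time before Odderon rejoined the channel after the reboot.
odderon_gap = odderon_back - reboot

print(time_to_reboot)  # 16:33:45
print(odderon_gap)     # 1:15:24
```

Under these assumptions, the service was unavailable for about 16.5 hours, and Odderon stayed offline a further hour and a quarter because it did not start on its own.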

Actionables

  • Get Sandlayth contact information on file
  • Ensure Scaleway account is accessible
  • [DONE] Enable Odderon service (D934)