Operations grimoire/Incidents/2023-02-03-ESXi

From Nasqueron Agora

📕📁📜 Old technical information :: content warning

⌛ This Nasqueron Operations Grimoire page hasn't been updated for a long time.

☣ As our infrastructure evolves quickly, there is a good chance this information is outdated or now inaccurate. Be careful and consider updating it.

➡️ To check whether the information is still up to date, you can review the history of the relevant role in our Operations repository.

Not tracked on DevCentral as docker-001 is currently down.

Incident timeline

2023-02-03
  • 16:28:03 Dorian and Dereckson were working on pad.nasqueron.org when the connection dropped
  • 16:29:06 Hypervisor Dreadnought is confirmed working, attack against VMware ESXi detected, server switched to rescue mode
2023-02-04
  • 00:15:40 hyper-001 provisioned
  • 12:58:36 migration from Dreadnought to hyper-001 done for router-001
  • 16:05:01 migration from Dreadnought to hyper-001 done for complector

Analysis

External sources:

First findings:

  • The ZFS disks of the FreeBSD machines are intact (db-A-001, complector, router-001)
  • The LVM/xfs disks of the Linux machines show I/O errors (Equatower, docker-001)
  • https://enes.dev/ states the ...-flat.vmdk shouldn't be encrypted
  • A backup of the .vmx configuration file is available as .vmx~ for machines that were running during the attack
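
As an illustration of the last finding, here is a minimal Python sketch that lists the .vmx~ backups under a directory, each paired with the .vmx path it would restore. The function name find_vmx_backups and the idea of scanning a mounted copy of the datastore are assumptions for this example, not the procedure used during the incident. The sketch only reports candidates and copies nothing.

```python
from pathlib import Path


def find_vmx_backups(datastore_root: Path) -> list[tuple[Path, Path]]:
    """Return (backup, target) pairs for every *.vmx~ file under datastore_root.

    Hypothetical helper: datastore_root is assumed to be a locally mounted
    copy of the datastore, e.g. as seen from rescue mode.
    """
    pairs = []
    for backup in sorted(datastore_root.rglob("*.vmx~")):
        # Strip the trailing "~" to get the name of the original .vmx file.
        target = backup.with_name(backup.name[:-1])
        pairs.append((backup, target))
    return pairs
```

Each pair can then be reviewed by hand before the backup is copied over the .vmx file.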

Actionables

  • Monitor the latest available VMware ESXi patch against our current version
  • Improve backup strategy
    • ZFS storage server
    • Glacier backup
    • Physical backup on site
  • [DONE] Install NetBox to document the network as we provision the hypervisor
  • Plan HA for critical services, at least for DevCentral
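
The patch monitoring actionable could start from a check like the following Python sketch. It assumes the installed version is available as a text line containing a "build-<number>" token (the shape printed by the vmware -v command on an ESXi host), and that the latest known build number is supplied by the operator. The names parse_build and needs_patch are hypothetical, and the build numbers in the usage note are made up.

```python
import re

# Matches the "build-<digits>" token in an ESXi version line.
BUILD_RE = re.compile(r"build-(\d+)")


def parse_build(version_line: str) -> int:
    """Extract the numeric build from a version line, or raise ValueError."""
    match = BUILD_RE.search(version_line)
    if match is None:
        raise ValueError(f"no build number found in: {version_line!r}")
    return int(match.group(1))


def needs_patch(installed_line: str, latest_build: int) -> bool:
    """Return True when the installed build is older than the latest known one.

    Assumption: a higher build number means a newer release.
    """
    return parse_build(installed_line) < latest_build
```

For example, needs_patch("VMware ESXi 7.0.3 build-100", 200) is true with these made-up numbers, which a monitoring check could turn into an alert.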