Operations grimoire/Restart a Docker engine

📕📁📜 Old technical information :: content warning

⌛ This Nasqueron Operations Grimoire page hasn't been updated for a long time.

☣ As our infrastructure evolves quickly, there is a good chance this information is outdated or now inaccurate. Be careful and consider updating it.

➡️ To verify whether this information is still up to date, check the history of the relevant role in our Operations repository.

Restart

A lot of vital components are managed by Docker.

Ideally, we should have redundancy to protect against container loss.

Restart Docker engine
Restart containers, either manually or with the docker-containers systemd unit
Run production tests
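
The first two steps above can be sketched as a small shell function. This is only a sketch under assumptions: the engine's systemd unit is assumed to be named docker, the docker-containers unit is assumed to be brought up with systemctl start, and RUN is a hypothetical hook (not part of our tooling) that lets you print the commands instead of running them. The final docker ps is an added sanity check, not a replacement for the production tests.

```shell
#!/bin/sh
# RUN is a hypothetical dry-run hook: set RUN=echo to print commands only.
RUN="${RUN:-}"

restart_engine() {
    # 1. Restart the Docker engine (assumes the systemd unit is named "docker").
    $RUN systemctl restart docker
    # 2. Bring containers back through the docker-containers unit
    #    (assumption: "start" is the right verb for that unit).
    $RUN systemctl start docker-containers
    # 3. List running containers, to compare against what is expected.
    #    Production tests still have to be run separately.
    $RUN docker ps --format '{{.Names}}: {{.Status}}'
}
```

Setting RUN=echo before calling restart_engine prints the three commands without touching the engine, which is a cheap way to review the sequence before running it for real.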

Services needing manual tweaking after container restart

Phabricator instances

DevCentral needs:

bin/phd status
bin/phd stop <the PID of PhabricatorBot>
sv restart phd
chpst -u app bin/phd launch PhabricatorBot /opt/phabricator/conf/xessife.json

Other instances only need sv restart phd.

Broker

Wearg needs to be manually reconnected to the broker:

.tcl mq disconnect
.tcl utimers and kill remaining timers if needed
.tcl mq broker::connect
.tcl broker::on_tick

This requires owner access to Wearg (ping Dereckson).

Troubleshooting

When MySQL isn't reachable

If the IP of the MySQL container (acquisitariat) has changed, you need to tweak /etc/hosts in every dependent container (Phabricator instances, cachet, pad, login) and ensure there is a <correct IP> mysql line.
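
A minimal sketch of that /etc/hosts tweak, assuming it is run inside the dependent container (or against a copy of its hosts file). The function name set_mysql_host is hypothetical. It rewrites the file by overwriting its content rather than renaming a new file over it, because /etc/hosts inside a container is usually a bind mount and rename-based edits such as sed -i tend to fail there.

```shell
#!/bin/sh
# set_mysql_host IP [HOSTS_FILE]
# Hypothetical helper: make sure HOSTS_FILE has exactly one "<IP> mysql" line.
set_mysql_host() {
    ip="$1"
    hosts="${2:-/etc/hosts}"
    tmp="$(mktemp)"
    # Keep comments and every entry that does not list "mysql" as a hostname.
    awk '$1 ~ /^#/ { print; next }
         { keep = 1
           for (i = 2; i <= NF; i++) if ($i == "mysql") keep = 0
           if (keep) print }' "$hosts" > "$tmp"
    # Append the entry pointing at the new MySQL container IP.
    printf '%s\tmysql\n' "$ip" >> "$tmp"
    # Overwrite in place instead of renaming, to cope with a bind-mounted file.
    cat "$tmp" > "$hosts"
    rm -f "$tmp"
}
```

The current IP can usually be read on the Docker host with docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' acquisitariat, then passed as the first argument, for example set_mysql_host 172.17.0.5 (the address here is a placeholder).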

Which containers need MySQL, and what are the symptoms when it is not reachable?

Container	MySQL priority	Symptom when not running
silly_bardeen	Needed for some CI tasks	Jenkins jobs test-auth-grove-* will fail: PHPUnit test issue: \Tests\Models\UsersTest::testTryGetFromExternalSource PDOException: SQLSTATE[HY000] [2002] php_network_getaddresses: getaddrinfo failed: Name or service not known
cachet	High	App doesn't work
devcentral	High	App doesn't work
etherpad	High	Container doesn't start
wolfphab	High	App doesn't work
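
Before chasing the symptoms above, it can help to check what the mysql name resolves to inside each container. The following is a sketch only: check_mysql_resolution is a hypothetical helper, DOCKER is a hypothetical override for the docker binary, and it assumes getent is available inside the containers, which may not hold for every image.

```shell
#!/bin/sh
# DOCKER is a hypothetical override, mainly useful for dry runs and testing.
DOCKER="${DOCKER:-docker}"

# check_mysql_resolution CONTAINER...
# Print what "mysql" resolves to inside each container, if anything.
check_mysql_resolution() {
    for c in "$@"; do
        # Assumes getent exists in the container image.
        if addr="$($DOCKER exec "$c" getent hosts mysql 2>/dev/null)"; then
            # getent prints "<IP>   <name>"; keep only the IP.
            printf '%s: mysql -> %s\n' "$c" "${addr%% *}"
        else
            printf '%s: mysql does NOT resolve\n' "$c"
        fi
    done
}
```

For the containers in the table this would be called as check_mysql_resolution silly_bardeen cachet devcentral etherpad wolfphab. A container that is not running is reported the same way as one where the name does not resolve.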

When a container doesn't want to restart

First, check what happens with docker logs <container name>.

If that doesn't work, go to the services section of the grimoire and reprovision it.

Commit a backup Docker image based on the current content with docker commit <name> <name>-bak. This allows later investigation.
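
The log check and the backup commit can be combined in a short sketch like the one below. The function name investigate_container and the DOCKER override are hypothetical, and the 50-line tail is an arbitrary choice. Reprovisioning itself is not covered, since it depends on the service.

```shell
#!/bin/sh
# DOCKER is a hypothetical override; set DOCKER=echo to print commands only.
DOCKER="${DOCKER:-docker}"

# investigate_container NAME
investigate_container() {
    name="$1"
    # Show the most recent log lines to see why the container fails to start.
    $DOCKER logs --tail 50 "$name"
    # Snapshot the container's current content as an image named <name>-bak.
    $DOCKER commit "$name" "$name-bak"
}
```

Taking the snapshot before the container is removed or reprovisioned keeps its current state available for inspection.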
