Operations grimoire/Restart a Docker engine
Revision as of 23:04, 22 July 2016
Restart
A lot of vital components are managed by Docker.
Ideally, we should have redundancy to avoid container loss.
- Restart Docker engine
- Restart containers, either manually or with the docker-containers systemd unit
- Run production tests
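The sequence above could be scripted roughly as follows. This is a minimal sketch, not the project's actual tooling: it assumes the engine runs as a systemd unit named docker and that starting the docker-containers unit is enough to bring the containers back up. The function name restart_docker_stack is hypothetical. The production tests step is left as a comment, since this page doesn't say how they are launched.

```shell
# Hypothetical helper: restart the engine, then the containers, then list them.
restart_docker_stack() {
    # Assumption: the Docker engine is managed by a systemd unit called "docker".
    systemctl restart docker || return 1

    # Unit named on this page; assumed to start every production container.
    systemctl start docker-containers || return 1

    # Show what is running so missing containers are easy to spot.
    docker ps --format '{{.Names}}: {{.Status}}'

    # Next step (not scripted here): run the production tests.
}
```

Call restart_docker_stack as root on the Docker host, then compare the listing against the containers you expect before moving on to the production tests.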
Services needing manual tweaking after container restart
Phabricator instances
DevCentral needs:
- bin/phd status
- bin/phd stop <the PID of PhabricatorBot>
- sv restart phd
- chpst -u app bin/phd launch PhabricatorBot /opt/phabricator/conf/xessife.json
Other instances need a sv restart phd.
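For the instances that only need the daemons restarted, a small loop can save some typing. The sketch below assumes that sv restart phd is meant to run inside each Phabricator container and that docker exec is an acceptable way to get there. The helper name restart_phd is hypothetical, and the container names are left for you to supply.

```shell
# Hypothetical helper: run "sv restart phd" inside each container given as argument.
restart_phd() {
    for container in "$@"; do
        echo "Restarting phd in $container"
        # Assumption: sv (runit) is available inside the container.
        docker exec "$container" sv restart phd \
            || echo "phd restart failed in $container" >&2
    done
}

# Example (placeholders, replace with the real container names):
#   restart_phd <first Phabricator container> <second Phabricator container>
```

DevCentral is deliberately not covered by this helper, since it needs the extra PhabricatorBot steps listed above.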
Broker
Wearg needs to be manually reconnected to the broker:
- .tcl mq disconnect
- .tcl utimers and kill remaining timers if needed
- .tcl mq broker::connect
- .tcl broker::on_tick
That requires owner access to Wearg (ping Dereckson).
Troubleshooting
When MySQL isn't reachable
If the MySQL container (acquisitariat) IP changed, you need to tweak /etc/hosts in every dependent container (Phabricator instances, cachet, pad, login) and ensure there is a <correct IP> mysql line.
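One way to make that /etc/hosts change repeatable is a small function that drops any stale mysql entry and appends the new one. This is a sketch under assumptions: set_mysql_host is a hypothetical name, and the docker inspect template in the comment only returns an address when acquisitariat sits on the default bridge network. The file is rewritten in place rather than replaced, because Docker bind-mounts /etc/hosts into containers and tools that swap the file (such as sed -i) tend to fail there.

```shell
# Hypothetical helper: set_mysql_host <ip> [hosts_file]
# Ensures exactly one "<ip> mysql" line exists in the hosts file.
set_mysql_host() {
    ip="$1"
    hosts_file="${2:-/etc/hosts}"
    tmp="$(mktemp)" || return 1

    # Keep every line except non-comment entries that list "mysql" as a hostname.
    awk '{
        keep = 1
        if ($1 !~ /^#/)
            for (i = 2; i <= NF; i++)
                if ($i == "mysql") keep = 0
        if (keep) print
    }' "$hosts_file" > "$tmp"

    # Append the entry pointing at the current MySQL container IP.
    printf '%s mysql\n' "$ip" >> "$tmp"

    # Overwrite the content in place so the original file (and its mount) is kept.
    cat "$tmp" > "$hosts_file"
    rm -f "$tmp"
}

# On the Docker host, the current IP can usually be read with:
#   docker inspect -f '{{.NetworkSettings.IPAddress}}' acquisitariat
```

Run set_mysql_host with that IP inside each dependent container. The second argument exists mainly so the function can be tried against a scratch copy of the file first.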
Which containers need MySQL, and what are the symptoms when it isn't reachable?
Container | MySQL priority | Symptom when not running |
---|---|---|
silly_bardeen | Needed for some CI tasks | Jenkins jobs test-auth-grove-* will fail: PHPUnit test issue: \Tests\Models\UsersTest::testTryGetFromExternalSource PDOException: SQLSTATE[HY000] [2002] php_network_getaddresses: getaddrinfo failed: Name or service not known |
cachet | High | App doesn't work |
devcentral | High | App doesn't work |
etherpad | High | Container doesn't start |
wolfphab | High | App doesn't work |
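Since the Jenkins symptom in the table is a name resolution failure, a quick check is to ask each container whether it can resolve mysql. The sketch below assumes getent is available inside the containers, and check_mysql_resolution is a hypothetical name. A container that is not running (which the table lists as the etherpad symptom) is reported the same way as one that cannot resolve the name.

```shell
# Hypothetical helper: check_mysql_resolution <container>...
# Returns 0 if every container resolves "mysql", 1 otherwise.
check_mysql_resolution() {
    failed=0
    for container in "$@"; do
        # Assumption: getent exists in the container image.
        if docker exec "$container" getent hosts mysql >/dev/null 2>&1; then
            echo "$container: ok"
        else
            echo "$container: cannot resolve mysql (or container not running)"
            failed=1
        fi
    done
    return "$failed"
}

# Example, using container names from the table:
#   check_mysql_resolution cachet devcentral etherpad wolfphab
```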
When a container doesn't want to restart
Try first to see what happens with docker logs <container name>.
If that doesn't work, go to the services section of the grimoire and reprovision it.
Commit a backup Docker image based on the current content with docker commit <name> <name>-bak. That will allow investigation.
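The two docker commands from this section can be wrapped in one helper. This is a sketch only: snapshot_and_show_logs is a hypothetical name, and limiting the output to the last 100 log lines is an arbitrary choice. It takes the backup image before printing anything, so the container state is preserved even if you reprovision right after reading the logs.

```shell
# Hypothetical helper: snapshot_and_show_logs <container name>
snapshot_and_show_logs() {
    name="$1"

    # Preserve the current container content as an image named <name>-bak.
    docker commit "$name" "$name-bak" || return 1

    # Print the most recent log lines to see why the container won't restart.
    docker logs --tail 100 "$name"
}

# Example:
#   snapshot_and_show_logs etherpad
```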