Operations grimoire/Restart a Docker engine
Latest revision as of 21:40, 12 May 2023
This page contains historical notes from 2016 and only partially reflects the current state of our Docker engines. Some parts are still current in 2023, for example the broker/Wearg procedure.
Restart
Many vital components are managed by Docker.
Ideally, we should have redundancy so that losing a container doesn't mean losing a service.
- Restart the Docker engine
- Restart the containers, either manually or with the docker-containers systemd unit
- Run production tests
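The procedure above could be scripted roughly as follows. This is a minimal sketch, not our actual tooling: it assumes a systemd host where the engine unit is named `docker` (an assumption) and relies on the docker-containers unit mentioned above. The production tests stay a manual step. With `DRY_RUN=1`, the commands are printed instead of executed.

```shell
#!/bin/sh
# Sketch of the restart procedure. Assumptions: systemd host, engine unit
# named "docker", containers started by the docker-containers unit.

# Print the command instead of executing it when DRY_RUN=1
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

restart_docker_stack() {
    # 1. Restart the Docker engine
    run systemctl restart docker
    # 2. Restart the containers through the systemd unit
    #    (or start them one by one with: docker start <name>)
    run systemctl restart docker-containers
    # 3. Production tests are not automated here
    echo "Engine and containers restarted: now run the production tests."
}
```

Running `DRY_RUN=1 restart_docker_stack` first shows what would happen; the real run needs root.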
Services needing manual tweaking after container restart
Phabricator instances
If DevCentral complains that SSH hosting or the daemons aren't available:
- sv status sshd-hosting (if it is down, sv start sshd-hosting)
- sv restart phd
Other instances only need `sv restart phd`.
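A sketch of the Phabricator steps run from the Docker host. It assumes runit's `sv` is available inside the containers and that `docker exec` can reach them; `fix_devcentral` and `restart_phd` are hypothetical helper names, and the container names devcentral and wolfphab are taken from the MySQL table on this page, so they may not match every setup.

```shell
#!/bin/sh
# Sketch of the Phabricator fixes, run from the Docker host. Assumptions:
# runit's sv is available inside the containers; container names are examples.

# Print the command instead of executing it when DRY_RUN=1
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# DevCentral: check SSH hosting, start it if needed, restart the daemons
fix_devcentral() {
    container=${1:-devcentral}
    run docker exec "$container" sv status sshd-hosting
    # sv start leaves an already running service alone
    run docker exec "$container" sv start sshd-hosting
    run docker exec "$container" sv restart phd
}

# Other instances: restart the daemons only
restart_phd() {
    for container in "$@"; do
        run docker exec "$container" sv restart phd
    done
}
```

For example: `fix_devcentral`, then `restart_phd wolfphab`.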
Broker
Wearg needs to be manually reconnected to the broker:
- .tcl mq disconnect
- .tcl utimers, then kill any remaining timers if needed
- .tcl mq broker::connect
- .tcl broker::on_tick
This requires owner access to Wearg (ping Dereckson).
Troubleshooting
When MySQL isn't reachable
If the IP of the MySQL container (acquisitariat) changed, edit /etc/hosts in every dependent container (Phabricator instances, cachet, pad, login) and make sure it contains a `<correct IP> mysql` line.
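A sketch of that /etc/hosts fix, assuming the MySQL container is named acquisitariat and is attached to a single Docker network. `mysql_ip`, `rewrite_mysql_host` and `update_hosts_in` are hypothetical helper names. The file is rewritten with `cat >` rather than `sed -i`: Docker bind-mounts /etc/hosts into containers, and `sed -i`, which replaces the file instead of writing into it, typically fails on such a mount.

```shell
#!/bin/sh
# Sketch of the /etc/hosts fix. Assumptions: the MySQL container is named
# acquisitariat and sits on a single Docker network; the dependent
# containers have sh and cat.

# Current IP of the MySQL container, as reported by the engine
mysql_ip() {
    docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' acquisitariat
}

# Filter a hosts file (stdin to stdout): drop every non-comment line
# mapping the name "mysql", then append "<ip> mysql"
rewrite_mysql_host() {
    awk -v ip="$1" '
        $1 ~ /^#/ { print; next }
        {
            keep = 1
            for (i = 2; i <= NF; i++) if ($i == "mysql") keep = 0
            if (keep) print
        }
        END { print ip " mysql" }'
}

# Rewrite /etc/hosts in each container given after the IP
update_hosts_in() {
    ip=$1
    shift
    for container in "$@"; do
        # Read the whole file before writing, so it is never truncated mid-read
        old=$(docker exec "$container" cat /etc/hosts) || return 1
        # Write in place with cat >: /etc/hosts is bind-mounted by Docker
        printf '%s\n' "$old" | rewrite_mysql_host "$ip" |
            docker exec -i "$container" sh -c 'cat > /etc/hosts'
    done
}
```

Usage could look like `update_hosts_in "$(mysql_ip)" devcentral wolfphab cachet etherpad`; the login container's name isn't given on this page, so add it by hand.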
Which containers need MySQL, and what happens when it isn't reachable?
Container | MySQL priority | Symptom when MySQL isn't reachable |
---|---|---|
silly_bardeen | Needed for some CI tasks | Jenkins jobs test-auth-grove-* fail (details below the table) |
cachet | High | App doesn't work |
devcentral | High | App doesn't work |
etherpad | High | Container doesn't start |
wolfphab | High | App doesn't work |

For silly_bardeen, the failing PHPUnit test reports:

```
PHPUnit test issue: \Tests\Models\UsersTest::testTryGetFromExternalSource
PDOException: SQLSTATE[HY000] [2002] php_network_getaddresses: getaddrinfo failed: Name or service not known
```

To test if everything works:

```
$ su app
$ cd ~/workspace/test-auth-grove-php
$ phpunit --no-coverage
OK (37 tests, 58 assertions)
```
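Since most of these symptoms come down to a container not reaching mysql by name, a quick resolution check from each container could narrow things down. This is a sketch assuming `getent` is present in every image; `check_mysql_resolution` is a hypothetical helper, and its default list is the containers from the table.

```shell
#!/bin/sh
# Sketch: check that "mysql" resolves inside each container. Assumption:
# getent is available in every image. Defaults to the containers in the table.

check_mysql_resolution() {
    [ "$#" -gt 0 ] || set -- silly_bardeen cachet devcentral etherpad wolfphab
    rc=0
    for container in "$@"; do
        if addr=$(docker exec "$container" getent hosts mysql); then
            echo "OK   $container: $addr"
        else
            echo "FAIL $container: mysql does not resolve"
            rc=1
        fi
    done
    return "$rc"
}
```

It returns non-zero as soon as one container fails, so it can also serve as a quick post-restart check.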
When a container doesn't want to restart
First, check what happens with docker logs <container name>.
If that doesn't help, commit a backup Docker image of the current container with docker commit <name> <name>-bak, so the broken state can still be investigated, then go to the services section of the grimoire and reprovision the container.
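The two commands above, wrapped in a hypothetical `inspect_and_backup` helper as a sketch. Note that `docker commit` only captures the container filesystem, not data stored in mounted volumes, so the backup image may be incomplete for containers keeping their state in volumes.

```shell
#!/bin/sh
# Sketch: look at a failing container, then keep a copy of it before
# reprovisioning. DRY_RUN=1 prints the commands instead of running them.

# Print the command instead of executing it when DRY_RUN=1
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

inspect_and_backup() {
    name=$1
    # Last lines of output usually show why the container stops
    run docker logs --tail 100 "$name"
    # Save the container filesystem as the image <name>-bak for investigation
    run docker commit "$name" "$name-bak"
}
```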
Error response from daemon: Unknown runtime specified oci
The container is configured with a runtime the engine no longer provides. The current runtime is `docker-runc`, but for a short time it was `oci`.
You can change the runtime in the container configuration by following the instructions from this post: https://umbra.xyz/fix-docker-unknown-runtime-specified-oci-error/
Only do that for containers you don't want to respin (e.g. devcentral); the others can safely be removed with docker rm and respun instead.
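A sketch of how that change might look, under loudly stated assumptions: the engine uses the default data root /var/lib/docker, runs under systemd, and the fix amounts to rewriting `"Runtime":"oci"` in the container's hostconfig.json while the engine is stopped. Check the post before relying on it. `container_runtime`, `set_runtime_in_hostconfig` and `fix_oci_runtime` are hypothetical helpers.

```shell
#!/bin/sh
# Sketch of the runtime fix. Assumptions: default data root /var/lib/docker,
# systemd, and a fix that rewrites "Runtime":"oci" in the container's
# hostconfig.json while the engine is stopped. Check the post first.

# Print the runtime a container is configured with (oci, docker-runc, ...)
container_runtime() {
    docker inspect -f '{{.HostConfig.Runtime}}' "$1"
}

# Replace the oci runtime in a hostconfig.json file, keeping a .bak copy
set_runtime_in_hostconfig() {
    file=$1
    runtime=${2:-docker-runc}
    cp "$file" "$file.bak" || return 1
    sed 's/"Runtime":"oci"/"Runtime":"'"$runtime"'"/' "$file.bak" > "$file"
}

# Whole procedure for one container; the engine must be stopped, otherwise
# it could write its in-memory configuration back over the edited file
fix_oci_runtime() {
    id=$(docker inspect -f '{{.Id}}' "$1") || return 1
    systemctl stop docker
    set_runtime_in_hostconfig "/var/lib/docker/containers/$id/hostconfig.json"
    systemctl start docker
}
```

Stopping the engine stops every container, so this is best done during the restart procedure at the top of this page.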