Operations grimoire/Restart a Docker engine

From Nasqueron Agora

Revision as of 03:03, 14 February 2017

Restart

A lot of vital components are managed by Docker.

Ideally, we should have redundancy to avoid losing containers.

  1. Restart Docker engine
  2. Restart containers, either manually or with the docker-containers systemd unit
  3. Run production tests
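The first two steps above can be sketched as a small shell function. This is only an illustration, not the procedure actually scripted on the host: the function name is hypothetical, and it assumes that every container left in the exited state after the engine restart should be started again, which may not hold for containers that are stopped on purpose. Production tests (step 3) are not covered.

```shell
# Hypothetical helper: restart the Docker engine, then start every
# container left in the "exited" state (assumption: all of them should run).
restart_engine_and_containers() {
    systemctl restart docker || return 1

    # List exited containers by name and start them one by one.
    docker ps -a --filter status=exited --format '{{.Names}}' |
    while read -r name; do
        echo "Starting $name"
        docker start "$name" >/dev/null </dev/null
    done
}
```

Starting the docker-containers systemd unit mentioned in step 2 is the alternative to the manual loop.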

Services needing manual tweaking after container restart

Phabricator instances

DevCentral needs:

  1. bin/phd status
  2. bin/phd stop <the PID of PhabricatorBot>
  3. sv restart phd
  4. chpst -u app bin/phd launch PhabricatorBot /opt/phabricator/conf/xessife.json

Other instances need a sv restart phd.
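For DevCentral, steps 2 to 4 could be wrapped in a function like the one below. It is a sketch under assumptions: the names run and relaunch_phabricatorbot are hypothetical, the PID still has to be read by hand from the output of bin/phd status, and the function is expected to be called from the Phabricator directory inside the container. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
# Echo the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1: PID of PhabricatorBot, as shown by `bin/phd status`.
relaunch_phabricatorbot() {
    if [ -z "$1" ]; then
        echo "usage: relaunch_phabricatorbot <pid>" >&2
        return 2
    fi
    run bin/phd stop "$1" &&
    run sv restart phd &&
    run chpst -u app bin/phd launch PhabricatorBot /opt/phabricator/conf/xessife.json
}
```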

Broker

Wearg needs to be manually reconnected to the broker:

  1. .tcl mq disconnect
  2. .tcl utimers and kill remaining timers if needed
  3. .tcl mq broker::connect
  4. .tcl broker::on_tick

That requires owner access to Wearg (ping Dereckson).

Troubleshooting

When MySQL isn't reachable

If the IP of the MySQL container (acquisitariat) changed, you need to edit /etc/hosts in every dependent container (Phabricator instances, cachet, pad, login) and ensure it contains a <correct IP> mysql line.
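The hosts file edit can be sketched as follows. The helper name set_mysql_host is hypothetical. It removes any existing entry for the mysql hostname and appends a fresh one. The file is rewritten in place with cat rather than replaced with mv, because /etc/hosts inside a container is a bind mount and cannot be renamed over.

```shell
# Hypothetical helper: point the "mysql" entry of a hosts file at a new IP.
# $1: IP address of the MySQL container, $2: path to the hosts file.
set_mysql_host() {
    ip="$1"
    hosts="$2"
    tmp="$(mktemp)" || return 1

    # Keep every line except non-comment lines naming the host "mysql".
    awk '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == "mysql") next }
         { print }' "$hosts" > "$tmp"
    printf '%s mysql\n' "$ip" >> "$tmp"

    # Overwrite the content in place; the file itself must not be replaced.
    cat "$tmp" > "$hosts"
    rm -f "$tmp"
}
```

Assuming the containers sit on the default bridge network, the new address can be read with docker inspect -f '{{.NetworkSettings.IPAddress}}' acquisitariat, and the helper would then be run inside each dependent container against /etc/hosts.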

Which containers need MySQL, and what are the symptoms when it isn't reachable?

{|
! Container !! MySQL priority !! Symptom when not running
|-
| silly_bardeen || Needed for some CI tasks || Jenkins jobs '''test-auth-grove-*''' will fail:

PHPUnit test issue: \Tests\Models\UsersTest::testTryGetFromExternalSource<br />
PDOException: SQLSTATE[HY000] [2002] php_network_getaddresses: getaddrinfo failed: Name or service not known

To check that everything works:
<pre>
$ su app
$ cd ~/workspace/test-auth-grove-php
$ phpunit --no-coverage
OK (37 tests, 58 assertions)
</pre>
|-
| cachet || High || App doesn't work
|-
| devcentral || High || App doesn't work
|-
| etherpad || High || Container doesn't start
|-
| wolfphab || High || App doesn't work
|}

When a container doesn't want to restart

Try first to see what happens with docker logs <container name>.

If that doesn't work, go to the services section of the grimoire and reprovision it.

Commit a backup Docker image based on the current content with docker commit <name> <name>-bak. That will allow later investigation.
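The log check and the backup commit above can be combined in a short helper. This is a sketch: the function name is hypothetical, and limiting the log output to the last 50 lines is an arbitrary choice.

```shell
# Hypothetical helper: show recent logs of a container, then snapshot its
# current filesystem as the image <name>-bak for later investigation.
backup_container() {
    name="$1"
    docker logs --tail 50 "$name" 2>&1
    docker commit "$name" "$name-bak"
}
```

For example, backup_container pad would create an image named pad-bak before the pad container is reprovisioned.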

Error response from daemon: Unknown runtime specified oci

The container doesn't use the current runtime.

The runtime is currently docker-runc, but was briefly oci.

You can change the runtime in the container configuration by following the instructions from this post: https://umbra.xyz/fix-docker-unknown-runtime-specified-oci-error/

Do that only for containers you don't want to respin (e.g. devcentral). Others you can safely rm and respin instead.
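To find which containers are affected, one option is to compare the runtime recorded in each container configuration. The sketch below assumes the runtime is exposed as HostConfig.Runtime in the docker inspect output. The function name is hypothetical.

```shell
# Hypothetical helper: print the names of all containers (running or not)
# whose configured runtime matches $1, e.g. "oci".
containers_with_runtime() {
    wanted="$1"
    docker ps -a --format '{{.Names}}' |
    while read -r name; do
        runtime="$(docker inspect -f '{{.HostConfig.Runtime}}' "$name" </dev/null)"
        if [ "$runtime" = "$wanted" ]; then
            echo "$name"
        fi
    done
}
```

Running containers_with_runtime oci would list the containers that need either a configuration fix or a respin.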