Operations grimoire/Restart a Docker engine

From Nasqueron Agora

Revision as of 23:04, 22 July 2016

Restart

A lot of vital components are managed by Docker.

Ideally, we should have redundancy so that losing a container doesn't take a service down.

  1. Restart Docker engine
  2. Restart containers, either manually or with the docker-containers systemd unit
  3. Run production tests
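
The steps above can be sketched as a small shell helper. This is only a sketch under assumptions: the engine's systemd unit is assumed to be named docker, restart_engine and run are hypothetical names, and the production tests are left out because this page doesn't say how they are launched.

```shell
#!/bin/sh
# run CMD...: execute the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# restart_engine: hypothetical helper covering steps 1 and 2.
restart_engine() {
    # 1. Restart the Docker engine (assumption: the unit is named "docker").
    run systemctl restart docker
    # 2. Restart the containers through the docker-containers unit.
    run systemctl restart docker-containers
    # List container names and states to check before running the production tests.
    run docker ps --format '{{.Names}}: {{.Status}}'
}
```

Calling restart_engine with DRY_RUN=1 set prints the commands without touching the engine, which is a cheap way to review them first.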

Services needing manual tweaking after container restart

Phabricator instances

DevCentral needs:

  1. bin/phd status
  2. bin/phd stop <the PID of PhabricatorBot>
  3. sv restart phd
  4. chpst -u app bin/phd launch PhabricatorBot /opt/phabricator/conf/xessife.json

Other instances only need a sv restart phd.

Broker

Wearg needs to be manually reconnected to the broker:

  1. .tcl mq disconnect
  2. .tcl utimers and kill remaining timers if needed
  3. .tcl mq broker::connect
  4. .tcl broker::on_tick

That requires owner access to Wearg (ping Dereckson).

Troubleshooting

When MySQL isn't reachable

If the IP of the MySQL container (acquisitariat) changed, you need to edit /etc/hosts in every dependent container (Phabricator instances, cachet, pad, login) and ensure there is a <correct IP> mysql line.
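
A minimal sketch of that /etc/hosts fix, meant to be run inside each dependent container. The function name update_mysql_host is hypothetical. The file is rewritten through a redirection rather than sed -i, on the assumption that Docker bind-mounts /etc/hosts into the container, in which case replacing the file by rename tends to fail.

```shell
#!/bin/sh
# update_mysql_host FILE IP: make FILE contain exactly one "<IP> mysql" line.
# On the Docker host, the current IP can be read with something like:
#   docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' acquisitariat
update_mysql_host() {
    hosts_file="$1"
    mysql_ip="$2"
    # Keep comments as-is; drop any entry that lists "mysql" among its hostnames.
    kept=$(awk '
        /^[[:space:]]*#/ { print; next }
        {
            keep = 1
            for (i = 2; i <= NF; i++) if ($i == "mysql") keep = 0
            if (keep) print
        }
    ' "$hosts_file")
    # Write back into the same file (no rename), then append the fresh entry.
    {
        [ -n "$kept" ] && printf '%s\n' "$kept"
        printf '%s mysql\n' "$mysql_ip"
    } > "$hosts_file"
}
```

Inside a container, this would be called as update_mysql_host /etc/hosts <correct IP>. Running it twice with the same IP leaves a single mysql line.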

What containers need MySQL and symptoms when not reachable?

{| class="wikitable sortable"
|-
! Container !! MySQL priority !! Symptom when not running
|-
| silly_bardeen || Needed for some CI tasks || Jenkins jobs '''test-auth-grove-*''' will fail:<br />PHPUnit test issue: \Tests\Models\UsersTest::testTryGetFromExternalSource<br />PDOException: SQLSTATE[HY000] [2002] php_network_getaddresses: getaddrinfo failed: Name or service not known
|-
| cachet || High || App doesn't work
|-
| devcentral || High || App doesn't work
|-
| etherpad || High || Container doesn't start
|-
| wolfphab || High || App doesn't work
|}

When a container doesn't want to restart

First, check what happened with docker logs <container name>.

If that doesn't work, go to the services section of the grimoire and reprovision it.

Commit a backup Docker image based on the current content with docker commit <name> <name>-bak. That will allow later investigation.
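
The two Docker commands from this section can be grouped in a small helper. This is a sketch: inspect_and_backup and run are hypothetical names, and the 100-line tail is an arbitrary choice to keep the log output short.

```shell
#!/bin/sh
# run CMD...: execute the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# inspect_and_backup NAME: show recent logs, then snapshot the container.
inspect_and_backup() {
    name="$1"
    # Last log lines of the container that refuses to restart.
    run docker logs --tail 100 "$name"
    # Save the current container content as the image <name>-bak for investigation.
    run docker commit "$name" "$name-bak"
}
```

For example, inspect_and_backup cachet would show the logs of cachet and create a cachet-bak image.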