Operations grimoire/DevCentral: Difference between revisions

From Nasqueron Agora
 
Latest revision as of 17:29, 5 September 2024

DevCentral is the name of our Phabricator instance.

Security

CI access

Our Jenkins infrastructure uses two Jenkins servers:

  • Jenkins CI, a more open instance for continuous integration
  • Jenkins CD, which deploys code from the main branch; only trusted users can run jobs on it

A Jenkinsfile script can request any node tag it wants, so since the Jenkins 2.0 migration, marking nodes as secure or non-secure is no longer enough. Hence the creation of Jenkins CD, isolated from Jenkins CI.

Access is still secured the way it was when we only had one Jenkins instance:

  • We maintain a group Trusted users, whose members are trusted not to send malicious code to CI
  • Herald rules that trigger jobs should check the condition "Author's projects include all of Trusted users"
  • A new build plan should only be visible to trusted users, because it exposes a Jenkins token that can trigger a new job

We should consider opening this up further. For example, is Jenkins CI safe enough to expose the links that trigger new jobs to anonymous users? If so, can we protect it against DoS?

Upgrade

First, upgrade the source code: pull master for arcanist, then rebase the production branch against origin/master for phabricator:

$ cd /opt/arcanist
$ git pull
$ cd /opt/phabricator
$ git fetch
$ git rebase origin/master

If the rebase reports a conflict, resolve it on the production branch. Don't panic: everything is compiled through OPcache, so you can calmly take your time to solve it without impacting the website.

Now stop the services, apply database migrations and restart services:

$ bin/phd status
$ bin/phd stop
$ chpst -u app bin/storage upgrade -f

Done.                                                   
Completed applying all schema adjustments.
$ sv restart php-fpm
ok: run: php-fpm: (pid 8297) 0s
$ sv restart phd
ok: run: phd: (pid 8415) 0s

Check https://devcentral.nasqueron.org/ for installation issues.

If you aren't an administrator, ask an admin to check (a yellow balloon popup appears with instructions when there is something to do).

Troubleshoot

SSH servers aren't running

We need two instances of sshd (neither uses the default configuration file):

  • hosting: /usr/sbin/sshd -f /etc/ssh/sshd_phabricator_hosting_config (container port 22, world port 5022)
  • staging: /usr/sbin/sshd -f /etc/ssh/sshd_staging_config (port 7022)
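A quick way to see which of the two instances is missing is to scan a process listing for the two command lines above. The following is only a sketch: the function name `missing_sshd_instances` is hypothetical, and it assumes the listing shows the full `sshd -f <config>` command line.

```shell
#!/bin/sh
# Sketch: report which of the two expected sshd instances are missing.
# Reads a process listing on stdin, one command line per line.
# The function name is hypothetical; the config paths come from this page.
missing_sshd_instances() {
    listing=$(cat)
    for config in /etc/ssh/sshd_phabricator_hosting_config /etc/ssh/sshd_staging_config; do
        # -F: match a fixed string, -q: quiet, only the exit status matters.
        if ! printf '%s\n' "$listing" | grep -Fq -- "sshd -f $config"; then
            printf 'missing: /usr/sbin/sshd -f %s\n' "$config"
        fi
    done
}
```

Inside the container, it could be fed with `ps -eo args | missing_sshd_instances`; no output means both instances were found.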

devcentral.nasqueron.org port 5022: Connection refused

$ git push
ssh: connect to host devcentral.nasqueron.org port 5022: Connection refused
fatal: Could not read from remote repository.

That requires two things:

  • an SSH server listening on port 22 of the devcentral Docker container, to serve repositories (not a staging area): http://pad.wolfplex.be/p/DevCentral
  • an iptables rule to forward the port: iptables -t nat -I PREROUTING -i ens192 -p TCP -d 51.255.124.10/32 --dport 5022 -j DNAT --to-destination 172.17.0.23:22
  • if the container IP changed, look for a stale entry with iptables -t nat -L PREROUTING (port 5022 is listed as "mice"):
    • To remove the old rule: iptables -t nat -D PREROUTING -i ens192 -p TCP -d 51.255.124.10/32 --dport 5022 -j DNAT --to-destination 172.17.0.139:22
    • To add the new rule: iptables -t nat -I PREROUTING -i ens192 -p TCP -d 51.255.124.10/32 --dport 5022 -j DNAT --to-destination 172.17.0.23:22
    • To check the rules: iptables -t nat -L PREROUTING | grep mice

2018-11-10: current IP has been moved to 172.17.0.23
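
When the container IP changes, the delete and insert commands only differ by the action flag and the destination IP. The sketch below prints both commands so they can be reviewed before being run as root. The function names `dnat_rule` and `move_forward` are hypothetical, and the interface, public IP and ports are the values quoted in this section, which may be outdated.

```shell
#!/bin/sh
# Sketch: print (not run) the iptables commands that move the port 5022
# forward from an old container IP to a new one.
# Function names are hypothetical; interface, public IP and ports are the
# values quoted on this page.
dnat_rule() {
    # $1: -I (insert) or -D (delete); $2: container IP
    printf 'iptables -t nat %s PREROUTING -i ens192 -p TCP -d 51.255.124.10/32 --dport 5022 -j DNAT --to-destination %s:22\n' "$1" "$2"
}

move_forward() {
    # $1: old container IP; $2: new container IP
    dnat_rule -D "$1"
    dnat_rule -I "$2"
}

# Example with the two IPs mentioned above.
move_forward 172.17.0.139 172.17.0.23
```

If the container is attached to the default Docker bridge, its current IP can usually be read with `docker inspect -f '{{.NetworkSettings.IPAddress}}' devcentral`.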

Database flood

Some tables can be heavily flooded:

  • devcentral_daemon.daemon_logevent: 7M rows
  • devcentral_conduit.conduit_methodcalllog: 23M rows

Just truncate them.
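
As a sketch of what truncating them means in practice, the snippet below only prints the SQL statements for the two tables listed above; nothing is executed. The function name `truncate_statements` is hypothetical, and the output is meant to be reviewed and then pasted into a MySQL session, for example the one opened by bin/storage shell.

```shell
#!/bin/sh
# Sketch: print the TRUNCATE statements for the flooded log tables.
# Nothing is executed here; review the output before pasting it into a
# MySQL session. The function name is hypothetical.
truncate_statements() {
    for table in "$@"; do
        printf 'TRUNCATE TABLE %s;\n' "$table"
    done
}

truncate_statements devcentral_daemon.daemon_logevent devcentral_conduit.conduit_methodcalllog
```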

A recent SSH key change doesn't allow VCS access

When a user's SSH key is updated in settings, the authentication script can still use a cached version of the key information.

To troubleshoot that, ask the user to run `ssh -p 5022 vcs@devcentral.nasqueron.org`. If permission is denied although the keys are the same locally and on DevCentral, check inside the container:

$ docker exec -it devcentral bash
$ scripts/ssh/ssh-auth.php | grep <username>

If the key isn't there, purge the general cache:

$ bin/cache purge --caches general

There is no need to restart phd or php-fpm, and doing so wouldn't change anything: this is a user data cache, not OPcache.

Login support

To list the accounts and the provider each one uses to authenticate (e.g. Google, GitHub), you can execute this query:

   $ bin/storage shell
   \u devcentral_user
   SELECT u.userName, e.accountType FROM user u, user_externalaccount e WHERE e.userPHID = u.phid ORDER BY u.userName ASC;