Operations grimoire/DevCentral: Difference between revisions

From Nasqueron Agora
(→‎Upgrade: libphutil -> arcanist)
Revision as of 11:30, 20 February 2022

DevCentral is the name of our Phabricator instance.

Security

CI access

Our Jenkins infrastructure uses two Jenkins servers:

  • Jenkins CI, a more open instance for continuous integration purposes
  • Jenkins CD, which deploys code from the main branch and where only trusted users can run jobs

A Jenkinsfile script can request any node tag it wants, so since the Jenkins 2.0 migration, separating secure and non-secure nodes is no longer enough. Hence the creation of Jenkins CD, isolated from Jenkins CI.

Access is still secured the same way as when we only had one Jenkins instance:

  • We maintain a Trusted users group, whose members are trusted not to send malicious code to CI
  • Herald rules that trigger a job should check the condition Author's projects include all of Trusted users
  • A new build plan should only be visible to trusted users, since it exposes a Jenkins token that can trigger a new job
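
To illustrate why the build plan must stay private, here is a minimal sketch of how a Jenkins remote trigger URL is typically assembled when the "Trigger builds remotely" option is used. The host name, job name and token below are hypothetical, and whether a request to this URL is accepted without authentication depends on the Jenkins security configuration.

```shell
# Print the remote trigger URL for a job.
# $1: Jenkins base URL, $2: job name, $3: trigger token (all hypothetical values)
trigger_url() {
    # ${1%/} drops a trailing slash from the base URL, if any
    printf '%s/job/%s/build?token=%s\n' "${1%/}" "$2" "$3"
}

# Example (hypothetical host, job and token):
#   trigger_url https://jenkins.example.org some-job s3cret
```

Since the token is part of the URL, anyone who can read the build plan can replay the request.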

We should consider opening this up further. For example, is Jenkins CI safe enough to expose the links that trigger new jobs to anonymous users? If so, can we protect it against DoS?

Upgrade

First, upgrade the source code: pull master for arcanist, and rebase the production branch onto origin/master for phabricator:

$ cd /opt/arcanist
$ git pull
$ cd /opt/phabricator
$ git fetch
$ git rebase origin/master

If the rebase stops on a conflict, resolve it on the production branch. Don't panic: the running code is served from OPcache, so you can calmly resolve the conflict without impacting the website.
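
To tell whether a rebase is still pending before going further, a small check like the following can help. It is a sketch that assumes a regular clone where the Git directory is `.git` inside the repository. Git keeps the state of an unfinished rebase in `rebase-merge` or `rebase-apply` under that directory. The function name is hypothetical.

```shell
# Succeed (exit 0) if a rebase is still in progress in the given repository.
# $1: repository path, defaults to /opt/phabricator
rebase_in_progress() {
    repo="${1:-/opt/phabricator}"
    # One of these directories exists for as long as the rebase is unfinished
    [ -d "$repo/.git/rebase-merge" ] || [ -d "$repo/.git/rebase-apply" ]
}

# Example:
#   if rebase_in_progress; then git -C /opt/phabricator status; fi
```

After fixing the conflicting files, `git add` them and run `git rebase --continue`, or give up with `git rebase --abort`.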

Now stop the services, apply database migrations and restart services:

$ bin/phd status
$ bin/phd stop
$ chpst -u app bin/storage upgrade -f

Done.
Completed applying all schema adjustments.
$ sv restart php-fpm
ok: run: php-fpm: (pid 8297) 0s
$ sv restart phd
ok: run: phd: (pid 8415) 0s

Check https://devcentral.nasqueron.org/ for installation issues.

If you aren't an administrator, ask an admin to check: a yellow balloon popup appears with instructions when there is something to do.
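
The upgrade steps above could be gathered in one function, for example as below. This is only a sketch: the `run` helper, the `DRY_RUN` variable and the function name are assumptions, not part of the existing tooling. It stops at the first failing step, so a rebase conflict leaves the services untouched.

```shell
# run: hypothetical helper. With DRY_RUN=1 it prints the command instead of
# executing it, so the whole sequence can be reviewed first.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

upgrade_devcentral() {
    # 1. Update the source code (same commands as above, written with git -C)
    run git -C /opt/arcanist pull || return
    run git -C /opt/phabricator fetch || return
    run git -C /opt/phabricator rebase origin/master || return

    # 2. Stop the daemons, apply the database migrations, restart the services
    run cd /opt/phabricator || return
    run bin/phd status || true   # informational, its exit status is ignored here
    run bin/phd stop || return
    run chpst -u app bin/storage upgrade -f || return
    run sv restart php-fpm || return
    run sv restart phd
}

# Example: review the commands without running anything
#   DRY_RUN=1; upgrade_devcentral
```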

Troubleshoot

SSH servers aren't running

We need two instances of sshd (neither uses the default configuration file):

  • hosting: /usr/sbin/sshd -f /etc/ssh/sshd_phabricator_hosting_config (container port 22, world port 5022)
  • staging: /usr/sbin/sshd -f /etc/ssh/sshd_staging_config (port 7022)
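
To see quickly which of the two instances is missing, a process listing can be filtered on the configuration file names. The sketch below reads the listing on standard input, so it can be fed from wherever the instances are expected to run. The function name is hypothetical.

```shell
# Print each expected sshd configuration file for which no sshd process
# is found in the process listing read on stdin.
missing_sshd_instances() {
    listing=$(cat)
    for config in /etc/ssh/sshd_phabricator_hosting_config /etc/ssh/sshd_staging_config; do
        # -F: match the text literally, -q: only report through the exit status
        printf '%s\n' "$listing" | grep -qF -- "sshd -f $config" || echo "$config"
    done
}

# Example:
#   ps -eo args | missing_sshd_instances
```

An empty output means both instances were found.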

devcentral.nasqueron.org port 5022: Connection refused

$ git push
ssh: connect to host devcentral.nasqueron.org port 5022: Connection refused
fatal: Could not read from remote repository.

Serving repositories on port 5022 requires two things:

  • an SSH server listening on port 22 of the devcentral Docker container, to serve repositories (not the staging area): http://pad.wolfplex.be/p/DevCentral
  • an iptables rule to forward the port: iptables -t nat -I PREROUTING -i ens192 -p TCP -d 51.255.124.10/32 --dport 5022 -j DNAT --to-destination 172.17.0.23:22
  • if the container IP changed, look for an old entry with iptables -t nat -L PREROUTING (port 5022 is displayed as "mice"):
    • To remove the old rule: iptables -t nat -D PREROUTING -i ens192 -p TCP -d 51.255.124.10/32 --dport 5022 -j DNAT --to-destination 172.17.0.139:22
    • To add the new rule: iptables -t nat -I PREROUTING -i ens192 -p TCP -d 51.255.124.10/32 --dport 5022 -j DNAT --to-destination 172.17.0.23:22
    • To check the rules: iptables -t nat -L PREROUTING | grep mice
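
Since the old and new rules only differ by the action and the container IP, they can be generated rather than typed. The following is a sketch with hypothetical function names: `dnat_rule` only prints the command so it can be reviewed first, and `container_ip` assumes the container is named devcentral.

```shell
# Print (without running it) the iptables command for the port 5022 DNAT rule.
# $1: -I to add the rule or -D to delete it, $2: container IP
dnat_rule() {
    printf 'iptables -t nat %s PREROUTING -i ens192 -p TCP -d 51.255.124.10/32 --dport 5022 -j DNAT --to-destination %s:22\n' "$1" "$2"
}

# Print the current IP of the container, as reported by Docker.
# $1: container name, defaults to devcentral
container_ip() {
    docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' "${1:-devcentral}"
}

# Example:
#   dnat_rule -D 172.17.0.139          # rule to remove
#   dnat_rule -I "$(container_ip)"     # rule to add
```

When listing rules, adding `-n` to `iptables -t nat -L PREROUTING` prints numeric addresses and ports, so the rule shows 5022 instead of the service name.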

2018-11-10: current IP has been moved to 172.17.0.23

Database flood

Some log tables can grow very large:

  • devcentral_daemon.daemon_logevent: 7M rows
  • devcentral_conduit.conduit_methodcalllog: 23M rows

Just truncate them.
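
As a sketch, the statements can be generated and piped to the MySQL client after review. The function names are hypothetical, and the way to connect to the database server (host, credentials) is left out as it depends on the setup.

```shell
# Print the SQL statements that empty the two flooded log tables.
truncate_sql() {
    for table in devcentral_daemon.daemon_logevent devcentral_conduit.conduit_methodcalllog; do
        printf 'TRUNCATE TABLE %s;\n' "$table"
    done
}

# Print a query giving the approximate row count of both tables.
row_estimate_sql() {
    echo "SELECT table_schema, table_name, table_rows FROM information_schema.tables WHERE table_name IN ('daemon_logevent', 'conduit_methodcalllog');"
}

# Example (connection options omitted):
#   row_estimate_sql | mysql
#   truncate_sql | mysql
```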

A recent SSH key change doesn't allow VCS access

When a user's SSH key is updated in settings, the authentication script may keep serving a cached version of the key information.

To troubleshoot that, ask the user to run `ssh -p 5022 vcs@devcentral.nasqueron.org`. If permission is denied although the keys are the same locally and on DevCentral, check in the container:

$ docker exec -it devcentral bash
$ scripts/ssh-auth | grep <username>

If the key isn't there, purge the cache:

$ bin/cache purge --all

If we knew which cache is the right one to purge (it could be general or user), it would be better to purge only that one, to avoid destroying everything and slowing down the site.

There is no need to restart phd or php-fpm, and doing so won't change anything: this is a user data cache, not OPcache.
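
Rather than looking for the username, the check can compare the key material itself. The sketch below assumes the user sent their public key file, in the usual "type blob comment" format, and that it is available in the container. The function name and the file path in the example are hypothetical.

```shell
# Succeed (exit 0) if the key from a public key file appears in the
# ssh-auth output read on stdin.
# $1: path to the public key file
key_is_served() {
    # The second field of a public key file is the base64 key material
    blob=$(awk 'NR == 1 { print $2 }' "$1")
    [ -n "$blob" ] || return 2
    # -F: the blob contains + and /, so match it literally
    grep -qF -- "$blob"
}

# Example, from the container:
#   scripts/ssh-auth | key_is_served /tmp/user_key.pub || echo "key not served, purge the cache"
```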