Operations grimoire/TLS certificates

From Nasqueron Agora
Revision as of 22:25, 20 October 2024 by Dereckson (talk | contribs)

TLS certificates should be used for every service we provide, so we can encrypt the communications.

Let's Encrypt

Certbot commands

acme-v02 migration. If Certbot complains that acme-v01.api. isn't available, add --server https://acme-v02.api.letsencrypt.org/directory.

T1505 alignment. We no longer run Certbot as a Docker container on the paas-docker role. As a result, all paths are now unified. Symlinks have been added at the old /srv paths to ease the transition.

Generate a certificate

certbot certonly -a webroot --webroot-path=/var/letsencrypt-auto --deploy-hook "service nginx reload" -d foo.nasqueron.org

certbot certonly -a webroot --webroot-path=/var/letsencrypt-auto --deploy-hook "systemctl reload nginx" -d foo.nasqueron.org

Generate a certificate for several sites

-d foo.nasqueron.org -d bar.nasqueron.org

If a certificate for foo already exists, Certbot will offer to extend it with the new alternative name, which is probably a good idea.
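The repeated -d flags can be built from a list of names. This is a minimal sketch, assuming the webroot path and deploy hook shown above; the helper name certbot_cmd is hypothetical, and it only prints the command so it can be reviewed before being run:

```shell
#!/bin/sh
# certbot_cmd is a hypothetical helper: it prints (but does not run) a
# certbot command covering every domain passed as an argument.
certbot_cmd() {
    cmd='certbot certonly -a webroot --webroot-path=/var/letsencrypt-auto --deploy-hook "systemctl reload nginx"'
    for domain in "$@"; do
        # One -d flag per name; all names end up on the same certificate.
        cmd="$cmd -d $domain"
    done
    printf '%s\n' "$cmd"
}

# Usage: certbot_cmd foo.nasqueron.org bar.nasqueron.org
```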

Generate a certificate through DNS

DNS can be used to generate certificates for domains. For example, the Openfire XMPP certificate is generated like this:

   certbot certonly --server https://acme-v02.api.letsencrypt.org/directory --manual --manual-auth-hook /etc/letsencrypt/acme-dns-auth --preferred-challenges dns --debug-challenges -d xmpp.nasqueron.org -d nasqueron.org -d conference.nasqueron.org

Since December 2023, you can use this command on FreeBSD servers where the T1505 Let's Encrypt standard installation has been deployed:

   certbot certonly --manual --manual-auth-hook /usr/local/etc/letsencrypt/acme-dns-auth --preferred-challenges dns --debug-challenges -d subdomain.nasqueron.org

This uses a specialized DNS server deployed on our Docker PaaS to serve dynamic TXT records under .acme.nasqueron.org; see rOPS: roles/paas-docker/containers/acme_dns.sls.

Support files were previously deployed only to the paas-docker role, through rOPS: roles/paas-docker/letsencrypt/files.

Renew all certificates

certbot renew
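By default, certbot renew only renews certificates that are close to expiry. To see whether a given certificate is due, a sketch along these lines may help. It assumes openssl is installed; the helper name cert_expires_within and the 30-day threshold in the usage line are illustrative choices, not part of our tooling:

```shell
#!/bin/sh
# cert_expires_within FILE DAYS
# Hypothetical helper: succeeds (exit 0) when the certificate in FILE
# expires within DAYS days. A missing or unreadable file is also
# reported as expiring, since openssl fails in that case too.
cert_expires_within() {
    seconds=$(( $2 * 86400 ))
    # "openssl x509 -checkend N" exits non-zero when the certificate
    # expires within the next N seconds.
    if openssl x509 -checkend "$seconds" -noout -in "$1" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}

# Usage:
#   cert_expires_within /etc/letsencrypt/live/foo.nasqueron.org/fullchain.pem 30 \
#       && echo "renewal is due"
```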

Installation on nginx

Allow Let's Encrypt validation
   include includes/letsencrypt;

This uses the nginx configuration from rOPS: roles/webserver-core/nginx/files/includes/letsencrypt.

Serve TLS certificate
   include includes/tls;
   ssl_certificate /srv/letsencrypt/etc/live/xmpp.nasqueron.org/fullchain.pem;
   ssl_certificate_key /srv/letsencrypt/etc/live/xmpp.nasqueron.org/privkey.pem;

This configures a compromise between security and compatibility, based on the Mozilla Intermediate SSL configuration[1]. The current configuration is served by rOPS: roles/webserver-core/nginx/files/includes/tls.

If you prefer to restrict the resource to TLS 1.3, and accept blocking legacy clients, you can instead use the following, available with D3251:

   include includes/tls-modern-only;
   ssl_certificate /srv/letsencrypt/etc/live/xmpp.nasqueron.org/fullchain.pem;
   ssl_certificate_key /srv/letsencrypt/etc/live/xmpp.nasqueron.org/privkey.pem;

Edit renewal hook

In /etc/letsencrypt/renewal or /usr/local/etc/letsencrypt/renewal you can edit this section to customize the command to run after renewal:

   [renewalparams]
   renew_hook = systemctl reload nginx

That can be applied in batch when nothing is set:

   gsed -i "s/\[\[webroot_map]]/renew_hook = systemctl reload nginx\n\n[[webroot_map]]/g" *.conf
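The batch edit is only safe on files where no hook is set yet. A minimal sketch of a guarded variant follows. The function name add_renew_hook is hypothetical, and it uses awk rather than gsed so it does not depend on GNU sed; files without a [[webroot_map]] section are left unchanged:

```shell
#!/bin/sh
# add_renew_hook DIR HOOK
# Hypothetical helper: for every *.conf in DIR that has no renew_hook
# line yet, insert "renew_hook = HOOK" just before [[webroot_map]].
add_renew_hook() {
    dir=$1
    hook=$2
    for conf in "$dir"/*.conf; do
        [ -f "$conf" ] || continue
        # Skip files where a hook is already configured.
        if grep -q '^renew_hook' "$conf"; then
            continue
        fi
        tmp=$(mktemp) || return 1
        awk -v hook="$hook" '
            $0 == "[[webroot_map]]" { print "renew_hook = " hook; print "" }
            { print }
        ' "$conf" > "$tmp" && cat "$tmp" > "$conf"
        rm -f "$tmp"
    done
}

# Usage: add_renew_hook /etc/letsencrypt/renewal "systemctl reload nginx"
```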

Install certbot on CentOS Stream 10

As of 2024-08-04, CentOS Stream 10 uses Python 3.12. Matching packages can be fetched from the Fedora 40 repository.

Installation on Dwellers was done like this:

   mkdir /tmp/certbot && cd /tmp/certbot
   wget https://dl.fedoraproject.org/pub/fedora/linux/releases/40/Everything/x86_64/os/Packages/c/certbot-2.9.0-1.fc40.noarch.rpm
   wget https://dl.fedoraproject.org/pub/fedora/linux/releases/40/Everything/x86_64/os/Packages/p/python3-acme-2.9.0-1.fc40.noarch.rpm
   wget https://dl.fedoraproject.org/pub/fedora/linux/releases/40/Everything/x86_64/os/Packages/p/python3-certbot-2.9.0-1.fc40.noarch.rpm
   wget https://dl.fedoraproject.org/pub/fedora/linux/releases/40/Everything/x86_64/os/Packages/p/python3-configargparse-1.7-3.fc40.noarch.rpm
   wget https://dl.fedoraproject.org/pub/fedora/linux/releases/40/Everything/x86_64/os/Packages/p/python3-josepy-1.13.0-8.fc40.noarch.rpm
   wget https://dl.fedoraproject.org/pub/fedora/linux/releases/40/Everything/x86_64/os/Packages/p/python3-parsedatetime-2.6-12.fc40.noarch.rpm
   wget https://dl.fedoraproject.org/pub/fedora/linux/releases/40/Everything/x86_64/os/Packages/p/python3-pyOpenSSL-23.2.0-3.fc40.noarch.rpm
   wget https://dl.fedoraproject.org/pub/fedora/linux/releases/40/Everything/x86_64/os/Packages/p/python3-pyrfc3339-1.1-18.fc40.noarch.rpm
   wget https://dl.fedoraproject.org/pub/fedora/linux/releases/40/Everything/x86_64/os/Packages/p/python3-pytz-2024.1-1.fc40.noarch.rpm
   dnf install ./*.rpm
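The URLs above all follow the same layout: the Packages directory is split by the first letter of the file name. A sketch that derives the URL from the file name, so the list of packages can be kept on its own, might look like this. The helper name rpm_url is hypothetical, and the base URL is the one used above:

```shell
#!/bin/sh
# Base URL taken from the commands above (Fedora 40, x86_64).
base=https://dl.fedoraproject.org/pub/fedora/linux/releases/40/Everything/x86_64/os/Packages

# rpm_url FILE
# Hypothetical helper: prints the download URL for an RPM file name,
# using the first character of the name as the subdirectory.
rpm_url() {
    first=$(printf '%s' "$1" | cut -c1)
    printf '%s/%s/%s\n' "$base" "$first" "$1"
}

# Usage:
#   for f in certbot-2.9.0-1.fc40.noarch.rpm python3-acme-2.9.0-1.fc40.noarch.rpm; do
#       wget "$(rpm_url "$f")"
#   done
```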

Nasqueron PKI

Principles

Internal PKI material is stored in Vault; see Operations grimoire/Vault for information about how to generate it, renew it, etc.

In the operations repository, the rOPS: roles/core/certificates unit is responsible for deploying nasqueron-vault-ca.crt, used to authenticate *.nasqueron.drake or IP services. For example, Vault clients require a certificate they can validate with a chain from that intermediate CA. Note that, as we don't currently use the root CA, a full chain is probably not needed for those services.

Roll over to a new certificate

If nasqueron-vault-ca.crt is updated, or if we want to provision the root certificate instead, the following repositories need to be updated and the services redeployed:

Where to update certificate?

Repository | Path | Purpose | Deployment instructions
operations | rOPS: roles/core/certificates/files/ | Trust internal resources on every server | Salt: salt '*' state.apply roles/core/certificates
docker-airflow | files/ | Connect to Vault from Airflow Python code | Docker: build and deploy new image nasqueron/airflow on Dwellers

Deployment instructions are up to date at the time they are added to this table. Afterwards, they should be checked against current procedures.

Special considerations

New server

The Let's Encrypt client is available on Ysul (natively) and Dwellers (as a wrapper script for a Docker container).

File a task in the Servers component and subscribe Sandlayth and Dereckson to have it deployed on a new server.

A Salt state would be nice for this purpose. Work is in progress on that matter through D3248.

Internationalized domain names

Punycode conversion

For both the web server configuration and the certificate authority, the name must be converted to Punycode (RFC 3492), for example with https://www.punycoder.com/
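The conversion can also be done locally. This is a minimal sketch, assuming python3 (3.7 or later) is installed on the machine; the helper name to_punycode is hypothetical. Python's built-in idna codec follows the older IDNA 2003 rules, so the result is worth double-checking for characters treated differently by newer rules:

```shell
#!/bin/sh
# to_punycode NAME
# Hypothetical helper: prints the ASCII (Punycode) form of a host name
# using Python's built-in "idna" codec.
# Assumption: python3 >= 3.7 is available; PYTHONUTF8=1 makes Python
# read the argument as UTF-8 whatever the current locale is.
to_punycode() {
    PYTHONUTF8=1 python3 -c 'import sys; print(sys.argv[1].encode("idna").decode("ascii"))' "$1"
}

# Usage: to_punycode dægrefn.nasqueron.org
```

Names that are already plain ASCII are printed unchanged, so the helper can be applied to every name before it is passed to certbot or written to the nginx configuration.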

Let's Encrypt support

Let's Encrypt has supported IDN since 2016[2]. We use it for the dægrefn.nasqueron.org certificate.

Previously, they were concerned that attackers could register a domain using a Cyrillic character that looks like one in a real domain. As some people consider it the CA's responsibility to mitigate such risks, the feature had been postponed several times.

StartSSL

StartSSL is no longer in operation. It was used at Nasqueron when Let's Encrypt didn't yet support IDN.

Troubleshoot

Acme DNS

Can't reach ACME DNS API / HTTP Error 403: Forbidden

ACME DNS API access is restricted to the IP addresses listed in rOPS: roles/webserver-core/nginx/files/includes/geo_nasqueron.

If you get a 403 when running acme.sh (only visible in --debug mode):

  • If the IP address is missing from geo_nasqueron, add it
  • If the IP address is already there, check on the server if /etc/nginx/includes/geo_nasqueron matches
    • Yes -> nginx must be reloaded (nginx -t && nginx -s reload)
    • No -> Deploy with salt docker-002 state.sls_id /etc/nginx/includes/geo_nasqueron roles/webserver-core/nginx

Another solution is a specific route to reach the API. For Hervil, add a route through the router-001 gateway:

   route add 51.255.124.8/30 172.27.27.1
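The 403 checklist above can be sketched as a small decision helper. The function name check_geo and its three-argument form are hypothetical. It does a plain whole-word search for the address, so it does not understand CIDR ranges, and it makes no assumption about the rest of the geo_nasqueron syntax:

```shell
#!/bin/sh
# check_geo IP REFERENCE_FILE DEPLOYED_FILE
# Hypothetical helper: prints which step of the checklist applies.
#   REFERENCE_FILE: geo_nasqueron from a checkout of rOPS
#   DEPLOYED_FILE:  a copy of /etc/nginx/includes/geo_nasqueron from the server
check_geo() {
    ip=$1
    reference=$2
    deployed=$3
    if ! grep -qwF -- "$ip" "$reference"; then
        # The address is not allowed yet.
        echo "add to geo_nasqueron"
    elif cmp -s "$reference" "$deployed"; then
        # File is deployed as expected, so nginx has not picked it up.
        echo "reload nginx"
    else
        # The server still has an older copy of the file.
        echo "deploy with salt"
    fi
}
```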

Troubleshoot records in ACME DNS database

To check the acme.nasqueron.org subdomain matching your _acme-challenge subdomain:

   nslookup -type=TXT _acme-challenge.<subdomain>.nasqueron.org

The DNS server storing the acme.nasqueron.org records uses a sqlite3 database. To connect and check your parameters:
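The record name to query is always the certificate name prefixed with _acme-challenge; for a wildcard name, the leading "*." is dropped first. A small sketch, with a hypothetical helper name:

```shell
#!/bin/sh
# acme_challenge_name DOMAIN
# Hypothetical helper: prints the DNS name holding the validation TXT
# record for DOMAIN. A leading "*." (wildcard) is removed first.
acme_challenge_name() {
    domain=${1#\*.}
    printf '_acme-challenge.%s\n' "$domain"
}

# Usage: nslookup -type=TXT "$(acme_challenge_name xmpp.nasqueron.org)"
```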

   $ ssh docker-002
   $ sqlite3 /srv/acme/lib/acme-dns.db
   SELECT Username, Password FROM records WHERE Subdomain = '<DNS answer>';

Credentials lost for an ACME DNS subdomain

Passwords are hashed with bcrypt, so it's not possible to recover them.

In that case, there are two options:

  • if you prefer to edit the DNS record, delete the acme.sh renewal information for the domain, then issue a new certificate; that will register a new account
  • if you prefer to edit the ACME DNS database, hash a new password with bcrypt and update it with sqlite

Let's Encrypt limits certificate requests; updating the password avoids reaching that quota.

You can generate a uuid and pass it to bcrypt, or register a new password (and update DNS):

   $ ssh docker-002
   $ sqlite3 /srv/acme/lib/acme-dns.db
   UPDATE records SET Password = '<new bcrypt hash>' WHERE Subdomain = '<DNS answer>';

To get a bcrypt password hash, you can use psysh on WindRiver:

  $ psysh
  password_hash("alpha", PASSWORD_BCRYPT)
  $2y$10$N17lnt09YFIoVY4025AmCuGrlmA.ZiThm.RDufcakGq9.3IK4xVcW

Replace $2y$ with $2a$: the x and y variants are specific to PHP, while the rest of the world uses $2a$.
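The prefix swap can be scripted to avoid a manual edit. This is a minimal sketch with a hypothetical helper name; it only rewrites the prefix at the start of the hash and leaves everything else untouched:

```shell
#!/bin/sh
# to_2a_prefix HASH
# Hypothetical helper: rewrites a leading $2x$ or $2y$ into $2a$.
# Single quotes matter when calling it: the hash is full of "$" signs
# that the shell would otherwise try to expand.
to_2a_prefix() {
    printf '%s\n' "$1" | sed 's/^\$2[xy]\$/$2a$/'
}

# Usage: to_2a_prefix '$2y$10$N17lnt09YFIoVY4025AmCuGrlmA.ZiThm.RDufcakGq9.3IK4xVcW'
```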

Notes & references