Operations grimoire/Eglide/Vault

Vault on the shellserver role is installed from the HashiCorp repository package.

States are located in rOPS: roles/shellserver/vault unit. This unit is needed as Eglide isn't connected to our private network and so doesn't have access to Complector directly.

Configuration

Configuration is stored in /etc/vault.hcl, not in /usr/local/etc, as it's a Debian machine.

The package generates self-signed keys and an out-of-the-box configuration in /etc/vault.d, which our unit needs to remove. If the package is reinstalled, there is a risk those files are created again.

Also, the package installs a systemd service that reads /etc/vault.d/vault.hcl, so it also needs to be replaced by our own unit, or adjusted through overrides.

In any case, run salt-call --local state.apply roles/shellserver/vault/config to restore a correct configuration state.
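The override approach mentioned above could look like the following minimal sketch. It assumes the packaged unit is named vault.service and that the binary lives at /usr/bin/vault; the function name write_vault_override is hypothetical, and the Salt unit remains the reference for what is actually deployed.

```shell
#!/bin/sh
# Sketch: write a systemd drop-in so the vault service reads /etc/vault.hcl.
# Assumptions: unit name vault.service, binary path /usr/bin/vault.

write_vault_override() {
    # $1: drop-in directory, normally /etc/systemd/system/vault.service.d
    dropin_dir="$1"
    mkdir -p "$dropin_dir"
    cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
# An empty ExecStart= clears the packaged command before ours is set.
ExecStart=
ExecStart=/usr/bin/vault server -config=/etc/vault.hcl
EOF
}

# Typical use, as root:
#   write_vault_override /etc/systemd/system/vault.service.d
#   systemctl daemon-reload && systemctl restart vault
```

If the packaged unit also carries a condition on the old configuration path (check with systemctl cat vault), that condition would need a similar reset in the drop-in.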

Certificates

Vault certificates should be generated in /etc/certificates/vault

If we use the Nasqueron Vault CA for this, the Vault client should use the certificate from /usr/local/share/ca-certificates/nasqueron-vault-ca.crt, like on any other server. The certificates_update_store state in rOPS: roles/core/certificates includes that certificate in /etc/ssl/certs as debian:nasqueron-vault-ca.pem.

Vault server wants two files to do TLS termination:

  • /etc/certificates/vault/private.key
  • /etc/certificates/vault/fullchain.pem

Following Operations grimoire/Vault, we can generate those elements from the Complector Vault (working on Complector or WindRiver).

The certificate common name MUST be a subdomain of *.nasqueron.drake, so we use <machine name>.eglide.nasqueron.drake:

   vault write -format=json pki_vault/issue/nasqueron-drake common_name=zonegrey.eglide.nasqueron.drake ttl=2160h ip_sans=127.0.0.1,10.197.126.53

The output needs to be dispatched in several files:

  • .data.certificate to certificate.pem
  • .data.issuing_ca to ca.pem (that will be used for the fullchain)
  • .data.private_key to private.key, then chmod 400 (be careful how you replace the \n; if you use the Python REPL, do it on Complector and get rid of the history with import readline ; readline.clear_history())
  • the fullchain bundle can be created with cat certificate.pem ca.pem > fullchain.pem
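As an alternative to replacing the \n by hand, the dispatch above can be sketched with jq, whose -r flag decodes the JSON escapes into real newlines. This assumes jq is installed where you run it and that the vault write output was saved to a file; the function name split_vault_cert is hypothetical.

```shell
#!/bin/sh
# Sketch: split the JSON from `vault write -format=json pki_vault/issue/...`
# into the files listed above. Assumption: jq is available.

split_vault_cert() {
    # $1: JSON file saved from the vault write command
    # $2: output directory
    json="$1"
    out="$2"
    mkdir -p "$out"

    # Create the private key with restrictive permissions from the start.
    (umask 077 && jq -r '.data.private_key' "$json" > "$out/private.key")
    chmod 400 "$out/private.key"

    jq -r '.data.certificate' "$json" > "$out/certificate.pem"
    jq -r '.data.issuing_ca' "$json" > "$out/ca.pem"

    # The fullchain is the server certificate followed by the issuing CA.
    cat "$out/certificate.pem" "$out/ca.pem" > "$out/fullchain.pem"
}

# Typical use:
#   split_vault_cert issue.json /etc/certificates/vault
```

Remember to delete the saved JSON file afterwards, as it contains the private key.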

You can then restart Vault with systemctl restart vault.

Secrets K/V store

On Eglide, key-value secrets are stored in a kv2 engine mounted under the default path kv/.

Secrets can be organised by service or user account:

  • kv/service/ contains secrets sorted by service, for services deployed through Salt or otherwise automated
  • kv/user/ contains secrets by user accounts, for general purpose and unsalted services

For example, Odderon services live in kv/service/odderon

Basic operations are:

  • Store a secret: vault kv put kv/service/acme/nickserv password=$(openssl rand -base64 18)
  • Read a secret: vault kv get kv/service/acme/nickserv

Secrets can be read:

  • for kv/user/<account>/* by a token issued for that user, on request
  • for kv/service/* by Salt to deploy services, by default
  • for kv/service/<service>/* by a specific service with Vault support through AppRole authentication, on request

Troubleshoot

DRP :: Reconfigure Salt

Prepare a policy for Salt:

   vault policy write salt-node /srv/salt/roles/shellserver/vault/files/salt.hcl

Salt needs an approle auth method.

   vault auth enable approle
   vault write auth/approle/role/salt-node secret_id_ttl=10m \
       token_policies="salt-node" \
       token_num_uses=10 \
       token_ttl=20m \
       token_max_ttl=30m \
       secret_id_num_uses=0
   vault read auth/approle/role/salt-node/role-id
   vault write -f auth/approle/role/salt-node/secret-id

The role_id and secret_id values belong in /etc/salt/minion.d/vault.conf
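A minimal sketch of writing that file follows. The YAML layout is an assumption based on the AppRole configuration format documented for Salt's Vault integration, not a copy of the file deployed on Eglide; the url matches the local address Vault listens on, and the function name write_salt_vault_conf is hypothetical.

```shell
#!/bin/sh
# Sketch: write the Salt minion Vault configuration for AppRole auth.
# Assumption: layout follows Salt's documented vault: / auth: structure.

write_salt_vault_conf() {
    # $1: role_id, $2: secret_id, $3: target file
    cat > "$3" <<EOF
vault:
  url: https://127.0.0.1:8200
  auth:
    method: approle
    role_id: $1
    secret_id: $2
EOF
    # The file holds credentials, so restrict it to its owner.
    chmod 600 "$3"
}

# Typical use, as root:
#   role_id=$(vault read -field=role_id auth/approle/role/salt-node/role-id)
#   secret_id=$(vault write -f -field=secret_id auth/approle/role/salt-node/secret-id)
#   write_salt_vault_conf "$role_id" "$secret_id" /etc/salt/minion.d/vault.conf
```

The -field flag prints only the requested value, which avoids copying identifiers by hand from the tabular output.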

Tip: you can test connection with sudo _tests/config/salt-primary/vault.py if you replace /usr/local/etc by /etc in it. You should then get the following content:

   Can connect to Vault:
       metadata: {'role_name': 'salt-node'}
       policies: ['default', 'salt-node']
       token_policies: ['default', 'salt-node']
       token_type: service

Certificates

The certificate MUST have a 127.0.0.1 IP SAN. Otherwise, the Vault client fails with:

   Error checking seal status: Get "https://127.0.0.1:8200/v1/sys/seal-status": tls: failed to verify certificate: x509: cannot validate certificate for 127.0.0.1 because it doesn't contain any IP SANs

That could simply be a symptom that Vault uses the self-signed certificate in /etc/vault.d. Delete that directory by running the Salt states again (sudo salt-call --local state.apply roles/shellserver/vault/config).
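To tell the two cases apart, you can check whether a given certificate file actually carries the IP SAN. The sketch below assumes openssl is installed; the function name cert_has_ip_san is hypothetical. When given a bundle such as fullchain.pem, openssl x509 only reads the first certificate, which is the server one.

```shell
#!/bin/sh
# Sketch: check that a PEM certificate lists a given IP in its SANs.
# Assumption: the openssl CLI is available.

cert_has_ip_san() {
    # $1: PEM certificate file, $2: IP address expected in the SANs
    # -w avoids matching 127.0.0.1 inside a longer address like 127.0.0.10
    openssl x509 -in "$1" -noout -text | grep -qwF "IP Address:$2"
}

# Typical use:
#   cert_has_ip_san /etc/certificates/vault/fullchain.pem 127.0.0.1 \
#       && echo "IP SAN present" || echo "IP SAN missing"
```

If the certificate in /etc/certificates/vault passes this check but the error persists, Vault is most likely still serving the packaged certificate.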