Operations grimoire/Vault
Latest revision as of 20:32, 26 October 2024
This page currently documents Vault installed on Complector as part of rOPS: roles/vault/. It doesn't document the Docker or shellserver installations. See Operations grimoire/Eglide/Vault.
HashiCorp Vault manages credentials and is currently used:
- to provision credentials through Salt
- to allow applications to fetch credentials
- to issue certificates for the private network applications
- to store credentials used by the Nasqueron Operations SIG beings
Vault policies are fully managed through Salt in the operations repository.
Vault was selected in 2016 for secrets management.
Vault reference
kv engine
The kv engine contains the following paths:
- ops/secrets: credentials like passwords, API tokens and private keys deployed to servers (a reference for machines)
- ops/internal: credentials to third-party services internally shared amongst ops (a reference for humans)
- ops/privacy: privacy information
Secrets
Secrets are directly and manually managed in Vault. If we need to run a disaster recovery procedure, we'll rotate every secret by defining it again, then deploy the new secrets from rOPS.
Privacy data
Personal identity information (PII) shared inside the Nasqueron Operations squad is more convenient to manage in a repository.
Access to that repository is restricted to committing data and ensuring this data is deployed to the servers needing it. Any other use isn't allowed.
Information from the repository is then published in Vault and referenced in rOPS as Vault credentials.
Note: The ops/privacy path is only intended for PII of current or previous members of the Ops SIG, like IP addresses allowed to connect to restricted resources such as the ops VPN. It is not acceptable to put third-party information in this repository.
pki engine
Services listening on private IPs use the Vault PKI to generate certificates:
- pki_root: root CA, certificate available at https://api.nasqueron.org/infra/security/pki/root/ca
- pki_vault: intermediate CA used for Vault itself
Certificates should be short-lived and frequently renewed by automated services.
transit engine
Encryption as a service is available for applications under the /transit path.
See https://learn.hashicorp.com/tutorials/vault/eaas-transit.
Applications should use a policy restricted to a subpath for that app, not one granting access to the whole engine.
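As an illustration of that restriction, here is a minimal Python sketch. It assumes a hypothetical transit key named after the application (yourservice) under the transit mount. The sketch renders a policy that only grants the encrypt and decrypt endpoints of that key. It also builds the JSON body that Vault's transit encrypt endpoint expects, where the plaintext must be base64-encoded. The helper names are illustrative, not rOPS code.

```python
import base64
import json


def transit_policy(key_name: str, mount: str = "transit") -> str:
    """Render an HCL policy limited to one transit key (hypothetical helper)."""
    # Transit encrypt/decrypt are called with POST, which needs "update".
    paths = [f"{mount}/encrypt/{key_name}", f"{mount}/decrypt/{key_name}"]
    blocks = [f'path "{path}" {{\n  capabilities = ["update"]\n}}' for path in paths]
    return "\n\n".join(blocks) + "\n"


def encrypt_request_body(plaintext: bytes) -> str:
    """JSON body for POST /v1/<mount>/encrypt/<key_name>."""
    # Transit only accepts base64-encoded plaintext.
    encoded = base64.b64encode(plaintext).decode("ascii")
    return json.dumps({"plaintext": encoded})


if __name__ == "__main__":
    print(transit_policy("yourservice"))
    print(encrypt_request_body(b"hello"))
```

With such a policy, the application should be able to encrypt and decrypt with its own key only. It can't use other keys or manage the engine itself.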
User accounts
Members of the Operations SIG need access to Vault to create and maintain credentials.
Currently, we use tokens. To generate a token:
vault token create -policy=admin
To connect to Vault with a token from a devserver (e.g. WindRiver), you need to:
- Store the token in $HOME/.vault-token
- Define the VAULT_ADDR environment variable:
export VAULT_ADDR=https://172.27.27.7:8200
Tokens need to be renewed frequently (before they expire) with:
vault token renew
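Scripts running on a devserver can reuse the same two settings. Below is a minimal standard-library sketch, assuming the token file and VAULT_ADDR described above. It prepares a POST to Vault's auth/token/renew-self endpoint, which renews the token sent in the X-Vault-Token header. The helper names are hypothetical. Actually sending the request also needs an SSL context trusting the pki_root CA, as for curl.

```python
import os
import urllib.request
from pathlib import Path
from typing import Optional


def read_token(path: Optional[Path] = None) -> str:
    """Read the token the vault CLI uses, by default from ~/.vault-token."""
    path = path or Path.home() / ".vault-token"
    # The file holds a single line; drop any trailing newline.
    return path.read_text().strip()


def renew_self_request(vault_addr: str, token: str) -> urllib.request.Request:
    """Prepare POST /v1/auth/token/renew-self for the given token."""
    return urllib.request.Request(
        url=f"{vault_addr.rstrip('/')}/v1/auth/token/renew-self",
        data=b"{}",
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__" and "VAULT_ADDR" in os.environ:
    request = renew_self_request(os.environ["VAULT_ADDR"], read_token())
    print(request.get_method(), request.full_url)
```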
Salt configuration
Get a secret through Salt
The information in the ops/secrets and ops/privacy kv paths is integrated into the Salt pillar:
ext_pillar:
  # Credentials to deploy to servers
  - vault:
      conf: path=ops/secrets
      nesting_key: credentials
  # Personal identity information from Nasqueron Operations SIG members
  - vault:
      conf: path=ops/privacy
      nesting_key: privacy
You can then access this information in Salt states using the following references:
| Category | Example of path in Vault | Example of pillar key in Salt |
|---|---|---|
| Secrets | ops/secrets/foo | pillar['credentials']['foo'] |
| Privacy data | ops/privacy/ops-cidr | pillar['privacy']['ops-cidr'] |
The ops/internal paths aren't accessible through Salt.
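The ext_pillar configuration amounts to a prefix mapping. Below is a hypothetical Python helper that mirrors the table. It assumes secrets sit directly under ops/secrets or ops/privacy, one path segment deep. It only illustrates which pillar key to use for a given Vault path; it is not how Salt's vault ext_pillar is implemented.

```python
from typing import Optional, Tuple

# Mirror of the ext_pillar configuration: Vault path -> Salt nesting_key.
EXT_PILLAR_NESTING = {
    "ops/secrets": "credentials",
    "ops/privacy": "privacy",
}


def pillar_key(vault_path: str) -> Optional[Tuple[str, str]]:
    """Return (nesting_key, secret name), or None if Salt doesn't expose the path."""
    prefix, _, name = vault_path.rpartition("/")
    nesting_key = EXT_PILLAR_NESTING.get(prefix)
    if nesting_key is None or not name:
        return None
    return nesting_key, name
```

For example, ops/secrets/foo gives ("credentials", "foo"), that is pillar['credentials']['foo']. Any ops/internal path gives None.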
Policies
Policies are managed in rOPS: pillar/credentials/vault.sls.
For Salt, as we deploy secrets to servers, policy management for a node consists of providing the list of secrets a role should have access to, generally in the vault_secrets_by_role dictionary.
For other applications:
- create a policy .hcl file in roles/vault/policies/files/
- declare the policy in the pillar under the vault_policies list; the name should match the .hcl file name (without the extension or the path)
- run the repository tests to check that both match; to only run that part, use:
cd _tests && python3 -m unittest discover pillar/credentials/
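The repository tests remain the reference for this check. As a rough sketch of what "both match" means, the hypothetical function below compares the .hcl file names with the declared policy names. It assumes the vault_policies pillar list has already been loaded into a Python list. It is not the actual test code in _tests/pillar/credentials/.

```python
from pathlib import Path
from typing import Iterable, Set, Tuple


def policy_mismatches(policies_dir: Path, declared: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (declared without a .hcl file, .hcl files not declared)."""
    # Policy name = file name without the .hcl extension or the directory.
    on_disk = {path.stem for path in policies_dir.glob("*.hcl")}
    declared_set = set(declared)
    return declared_set - on_disk, on_disk - declared_set


if __name__ == "__main__":
    missing, undeclared = policy_mismatches(
        Path("roles/vault/policies/files"), ["airflow"]  # illustrative input
    )
    print("Declared without .hcl:", sorted(missing))
    print(".hcl without declaration:", sorted(undeclared))
```

Both sets should be empty before deploying the policies.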
To deploy only an application policy, run Salt like this:
salt complector state.sls_id vault_policy_airflow roles/vault/policies
DRP
Any step we need to do to configure the engines should be added to the rOPS: roles/vault/bootstrap unit.
If we need to recover Vault, two solutions are available:
- restore the storage, to keep data (credentials, certificates including the root CA one)
- recreate it from scratch through the roles/vault/bootstrap unit
Howto
Allow a new application to authenticate to Vault
Applications need a policy and an authentication method. For example, for an application called yourservice using AppRole with a role_id and a secret_id:
- Create a new policy in the operations repository: D3270
- Deploy it from Complector with
salt complector state.sls_id vault_policy_yourservice roles/vault/policies
- Create an approle matching that policy:
vault write auth/approle/role/yourservice token_policies="yourservice" token_ttl=1h token_max_ttl=4h
- Get role_id with
vault read auth/approle/role/yourservice/role-id
- Get secret_id with
vault write -force auth/approle/role/yourservice/secret-id
- If you want Salt to provision those credentials when deploying your service, it's probably a good idea to also write them to Vault (skip this step if you don't use Salt to provision role_id and secret_id):
vault kv put secrets/nasqueron/yourservice/vault username=role-id password=secret-id
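Once role_id and secret_id are provisioned, the application exchanges them for a token through the AppRole login endpoint (POST /v1/auth/approle/login). Vault returns the token in auth.client_token, with the policies and TTLs set on the role above. Below is a minimal standard-library sketch, assuming AppRole is mounted at its default auth/approle path. The helper names are hypothetical.

```python
import json
import urllib.request


def approle_login_request(vault_addr: str, role_id: str, secret_id: str) -> urllib.request.Request:
    """Prepare POST /v1/auth/approle/login with the role credentials."""
    body = json.dumps({"role_id": role_id, "secret_id": secret_id}).encode()
    return urllib.request.Request(
        url=f"{vault_addr.rstrip('/')}/v1/auth/approle/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def client_token(login_response: dict) -> str:
    """Extract the token from a decoded login response."""
    return login_response["auth"]["client_token"]
```

With token_ttl=1h and token_max_ttl=4h, the application should expect to renew its token within the hour. It will need to log in again once the four hours are reached.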
Get an admin-level token
Ops can use the self-service hvac script to get a token valid for 30 days:
ssh complector
cd /opt/salt/nasqueron-operations
git switch vault-self-service-token-policy
sudo ./utils/vault/issue-admin-token.py
Once D3355 is merged, the script will be usable from the main branch.
Troubleshoot
Connecting to Vault
Access to web UI
Vault doesn't listen on any public IP but accepts connections from the internal network, so you can open an SSH tunnel to complector:8200 and then browse https://localhost:8200:
ssh -L 8200:complector:8200 router-001.nasqueron.org
curl: (60) SSL certificate problem
Curl is compiled against a certificate bundle (curl-config --ca gives its path). That bundle doesn't contain the pki_root certificate. We deploy this certificate in /etc/ssl/certs, but curl won't look there unless instructed to.
If you build an immutable environment like a container, add the certificate to the bundle.
On other machines, create a curl configuration file in ~/.curlrc (it doesn't currently seem possible to create a system-wide one) containing capath=/etc/ssl/certs.
If you still have the issue after that, check two things:
- with curl -v, the path should be correctly set: you should see * CApath: /etc/ssl/certs in addition to a CAfile.
- file /etc/ssl/certs/* | grep -i nasqueron should return a result; if not, put the certificate in /usr/local/share/certs and rerun the tool that links it into /etc/ssl/certs, for example certctl rehash on FreeBSD (needs root access).
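Python clients hit the same problem as curl when they rely on a bundled CA file. Below is a minimal sketch, assuming the pki_root certificate is linked in /etc/ssl/certs as described above. It builds an SSL context that looks CA certificates up in that directory. It then queries Vault's unauthenticated sys/health endpoint. The helper names are hypothetical.

```python
import json
import ssl
import urllib.request


def vault_ssl_context(capath: str = "/etc/ssl/certs") -> ssl.SSLContext:
    """Equivalent of curl's capath=/etc/ssl/certs: use a hashed CA directory."""
    return ssl.create_default_context(capath=capath)


def vault_health(vault_addr: str, context: ssl.SSLContext) -> dict:
    """GET /v1/sys/health, which doesn't require a token."""
    url = f"{vault_addr.rstrip('/')}/v1/sys/health"
    with urllib.request.urlopen(url, context=context, timeout=10) as response:
        return json.load(response)
```

Note that sys/health answers with a non-200 status when Vault is sealed (503) or a standby node (429). urlopen reports that as an HTTPError rather than returning the JSON.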
Certificates
How to unseal when certificates have expired?
Use the web UI through an SSH tunnel.
Renew Vault HTTP certificates
Vault uses TLS certificates managed by its own CA. This creates a chicken-and-egg problem when those certificates have expired. The Vault client strictly requires valid TLS certificates to issue commands, including the commands needed to renew the certificates.
Connecting to the web UI works, as your browser lets you bypass the certificate requirement. Opening an SSH tunnel with -L 8200:complector:8200 should work for that.
Certificates configuration is available at D2639.
The web UI doesn't offer a comprehensive set of widgets to generate certificates. You can instead use the Vault Browser CLI to run Vault CLI commands from the web UI. The signing operation can be done through the endpoint configuration UI.
Automation for this part is being tested. You can preview the automation proposal in /home/dereckson/sandbox/vault/renew-certificates on Complector.
Based on current D2639 state (2023-01), you can do the following:
- Has the intermediate authority certificate expired?
  - Web console:
    write -format=json pki_vault/intermediate/generate/internal common_name="nasqueron.drake Intermediate Authority"
  - At https://localhost:8200/ui/vault/settings/secrets/configure/pki_root/cert, use the sign intermediate button to sign it. Don't forget to fill in the following information:
    - CSR: copy from the console the .data.csr value of the JSON response
    - Format: pem_bundle
    - TTL: 100 days
    - Common name: "nasqueron.drake Intermediate Authority"
    - Organization: Nasqueron
    - Organization unit: Nasqueron Operations SIG
    - Country: BE
  - Follow the warning instructions: copy the certificate bundle, then in pki_vault, use Set signed intermediate to set the certificate as the intermediate one. That will create a new issuer; fetch its name in "issuers".
  - Also save the certificate bundle to /usr/local/share/certs/nasqueron-vault-intermediate.crt
  - Edit the nasqueron-drake role to use your new issuer name
- Has the complector.nasqueron.drake certificate expired?
  - Web console:
    write -format=json pki_vault/issue/nasqueron-drake common_name=complector.nasqueron.drake ttl=2160h ip_sans=127.0.0.1,172.27.27.7
    Save the output in the following /usr/local/etc/certificates/vault files:
    - .data.certificate to certificate.pem
    - .data.issuing_ca to ca.pem
    - .data.private_key to private.key (be careful how you replace the \n escapes; if you use a Python REPL, do it on Complector itself and clear the history afterwards with import readline ; readline.clear_history())
  - Prepare the fullchain bundle with:
    cat certificate.pem ca.pem > fullchain.pem
  - Send a SIGHUP signal to Vault to reload its configuration with:
    kill -1 <Vault PID>
    The HTTP server should pick up the new certificate, and you can then directly use the vault command. Vault configuration currently uses fullchain.pem and private.key.
The CSR uses the line1\nline2\nline3 format in JSON. You can use a Python REPL with print("...") to print the CSR on several lines. The CSR isn't confidential information; the key is protected in Vault and never exposed.
jq. Another tool that can help is jq. For example, cat certificate.json | jq -r .data.certificate > /usr/local/etc/certificates/vault/certificate.pem will save it in multiline format.
If you get an error about a missing private key at the Set signed intermediate step, remember that you're setting an intermediate certificate for pki_vault, not for pki_root. Set it here: https://localhost:8200/ui/vault/settings/secrets/configure/pki_vault/cert
The paths where certificates are stored matter:
- pki_root certificates are stored in /usr/local/share/certs
- pki_vault certificates are stored in /usr/local/etc/certificates/vault
Ensure `private.key` belongs to the `vault` user and has restrictive permissions, like 400.
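The jq and REPL approaches can also be combined into one Python step. json decodes the \n escapes into real newlines, so no manual replacement is needed. Below is a minimal sketch, assuming the web console output of the pki_vault/issue/nasqueron-drake command was saved to a hypothetical certificate.json file on Complector. It writes certificate.pem, ca.pem, fullchain.pem and private.key, creating the key with mode 400. The function is hypothetical. Changing the owner to vault requires running it as root. Vault still needs a SIGHUP afterwards.

```python
import json
import os
import shutil
from pathlib import Path
from typing import Optional

# Response fields and the public files they go to, as in the renewal steps.
PUBLIC_FILES = {"certificate": "certificate.pem", "issuing_ca": "ca.pem"}


def write_certificates(response_file: Path, target_dir: Path, owner: Optional[str] = None) -> None:
    # json.loads turns the "\n" escapes into real newlines.
    data = json.loads(response_file.read_text())["data"]

    for field, filename in PUBLIC_FILES.items():
        (target_dir / filename).write_text(data[field].rstrip("\n") + "\n")

    # Same result as: cat certificate.pem ca.pem > fullchain.pem
    fullchain = "".join((target_dir / name).read_text() for name in ("certificate.pem", "ca.pem"))
    (target_dir / "fullchain.pem").write_text(fullchain)

    # Recreate private.key so it is never readable by other users, even briefly.
    key_path = target_dir / "private.key"
    if key_path.exists():
        key_path.unlink()
    fd = os.open(key_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as key_file:
        key_file.write(data["private_key"].rstrip("\n") + "\n")
    os.chmod(key_path, 0o400)

    # e.g. owner="vault"; changing ownership requires root.
    if owner is not None:
        shutil.chown(key_path, user=owner)
```

Usage, as root on Complector: write_certificates(Path("certificate.json"), Path("/usr/local/etc/certificates/vault"), owner="vault").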
Services known to break when a certificate has expired
- Salt, but that's ad-hoc and explicit
- Sentry containers:
sudo docker ps -a | grep sentry | grep -v "Up "
Salt
credentials.vault_policy_present returns Bad Request
Vault returns a 400 error if the policy format is incorrect.
Case 1. The state ID starts with vault_policy_. You manually added a file to roles/vault/policies/files/: this error means the Vault API found a syntax error in it. If you use Pulsar (formerly Atom), you can lint those files with https://github.com/mschuchard/linter-vault-policy.
Case 2. The state ID starts with anything else. The policy has been generated through _modules/credentials.py.
Check if the pillar values added are correct:
- if not, add a test in _tests/pillar/credentials/test_vault.py to detect the case
- if yes, there is an issue to fix in the Python module code
Example of Salt output.
----------
ID: vault_policy_admin
Function: credentials.vault_policy_present
Name: admin
Result: False
Comment: Failed to change policy: Bad Request
Started: 13:45:14.162421
Duration: 76.168 ms
Changes: