Operations grimoire/Sentry
Latest revision as of 17:14, 13 January 2024
Sentry is deployed on docker-002 through the Docker PaaS.
The Sentry DSN and the web UI for humans can both be reached at https://sentry.nasqueron.org
Command line
Sentry provides a command line interface.
To use it, log in to docker-002, then run sentry nasqueron <command>. This spawns a new container, correctly linked to the database, SMTP server and PostgreSQL database, to run the command.
TODO: update the sentry wrapper to use the sentry network and get the right environment instead.
If you've deployed an instance other than the Nasqueron one, replace nasqueron with the realm name.
The following commands are available in 23.3:
cleanup       Delete a portion of trailing data based on creation date.
config        Manage runtime config options.
createuser    Create a new user.
devserver     Starts a lightweight web server for development.
devservices   Manage dependent development services required for Sentry.
django        Execute Django subcommands.
exec          Execute a script.
execfile      Execute a script.
export        Exports core metadata for the Sentry installation.
files         Manage files from filestore.
help          Show this message and exit.
import        Imports data from a Sentry export.
init          Initialize new configuration directory.
killswitches  Manage killswitches for ingestion pipeline.
migrations    Manage migrations.
performance   Performance utilities
permissions   Manage Permissions for Users.
plugins       Manage Sentry plugins.
queues        Manage Sentry queues.
repair        Attempt to repair any invalid data.
run           Run a service.
sendmail      Sends emails from the default notification mail address.
shell         Run a Python interactive interpreter.
spans         Span utilities
start         DEPRECATED see `sentry run` instead.
tsdb          Tools for interacting with the time series database.
upgrade       Perform any pending database migrations and upgrades.
write-hashes  Runs span hash grouping on event data in the supplied filename using the...
Kafka and ZooKeeper
- Deployment pitfall: ZooKeeper expects a clean log when you initialise it. Otherwise, it shuts down. Ensure it always has a data volume or an empty log when starting it.
- Kafka pauses for 20 seconds before initializing the topics, to let the cluster start correctly.
To reprovision Kafka and ZooKeeper from scratch, the following works fine:
$ rm -rf /srv/kafka/sentry_kafka /srv/zookeeper/sentry_zookeeper
$ deploy-container zookeeper
$ deploy-container kafka
How to upgrade Sentry?
On Docker Hub or through Dwellers and our local registry, build a new nasqueron/sentry image.
On the Docker engine, run the database migrations first, then upgrade the containers:
- Pull the new image: docker pull nasqueron/sentry
- Run migrations: sentry nasqueron upgrade (should we stop Sentry containers first?)
- Deploy containers: deploy-container sentry
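The three steps can be chained in a small script that stops at the first failing step. This is a sketch, not an existing tool. The upgrade_sentry function and the RUN dry-run variable are hypothetical. The script does not stop the Sentry containers before migrating, because that question is still open.

```shell
#!/bin/sh
# Sketch: chain the Sentry upgrade steps, stopping at the first failure.
# Hypothetical helper, not part of our tooling. Set RUN=echo for a dry run.
set -eu

RUN=${RUN:-}

upgrade_sentry() {
    # Realm name, as in "sentry <realm> <command>"; defaults to nasqueron.
    realm=${1:-nasqueron}

    # 1. Pull the new image (built on Docker Hub or in our local registry).
    $RUN docker pull nasqueron/sentry
    # 2. Run the database migrations before deploying the new containers.
    $RUN sentry "$realm" upgrade
    # 3. Redeploy the Sentry containers with the usual tooling.
    $RUN deploy-container sentry
}
```

Run upgrade_sentry for Nasqueron, or upgrade_sentry <realm> for another instance. With RUN=echo, the script only prints the commands.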
Deployment of Sentry containers
Sentry containers are currently deployed to docker-002. The Pillar configuration and the container definitions are based on the getsentry/self-hosted repository.
The containers ARE NOT deployed directly from the getsentry/self-hosted repository or with the scripts it provides. Instead, they are managed like any other container, through Salt in the paas-docker role. We use the getsentry/self-hosted repository as reference implementation documentation, and any change there needs to be reflected in our own configuration.
rOPS | getsentry/self-hosted | Sentry version |
---|---|---|
D2907 | 4d318761a84c | 23.4.0.dev |
D2881 | 06662c6cdc85 | 23.2.0 |
Non-features
The following features can't currently be deployed:
- AI-suggested solutions (self-hosted change: https://github.com/getsentry/self-hosted/pull/2115/files, documentation: https://docs.sentry.io/product/issues/issue-details/ai-suggested-solution/): the suggested solutions aren't relevant enough, the privacy policy is difficult to align with ours, and we can't justify the extra cost of the API subscription.
Troubleshoot
Resources
The Sentry developer documentation has a troubleshooting page. For example, we successfully followed its out-of-sync Kafka procedure: https://develop.sentry.dev/self-hosted/troubleshooting/
Relay
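The Rust DNS resolution library used by Relay has a strange issue. It rejects the sentry_web hostname, probably because of the underscore, which the STD3 ASCII rules disallow in hostnames: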
2023-03-27T14:29:18Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
caused by: error sending request for url (http://sentry_web:9000/api/0/relays/register/challenge/): error trying to connect: dns error: proto error: Label contains invalid characters: Err(Errors { invalid_mapping, disallowed_by_std3_ascii_rules })
caused by: error trying to connect: dns error: proto error: Label contains invalid characters: Err(Errors { invalid_mapping, disallowed_by_std3_ascii_rules })
caused by: dns error: proto error: Label contains invalid characters: Err(Errors { invalid_mapping, disallowed_by_std3_ascii_rules })
caused by: proto error: Label contains invalid characters: Err(Errors { invalid_mapping, disallowed_by_std3_ascii_rules })
As a workaround, put the sentry_web IP address (run docker inspect sentry_web | grep 172 to get it) in the /srv/relay/sentry_relay/config.yml file, under the relay.upstream key.
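The workaround can be scripted. This is a minimal sketch. It assumes the configuration file has a single upstream: line under relay:, and that sentry_web listens on port 9000, as in the log above. container_ip and set_relay_upstream are hypothetical helper names.

```shell
#!/bin/sh
# Sketch of the Relay workaround: write the sentry_web IP address into the
# upstream URL of the Relay configuration.
# Assumptions: a single "upstream:" line in the file (under "relay:"),
# and sentry_web listening on port 9000, as in the log above.
set -eu

CONFIG=${CONFIG:-/srv/relay/sentry_relay/config.yml}

# Print the IP address of a container (assumes it is attached to one network).
container_ip() {
    docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' "$1"
}

# Replace the value of the upstream key, keeping its indentation.
# The previous file is kept as <config>.bak.
set_relay_upstream() {
    config=$1
    ip=$2
    sed -i.bak -E "s|^([[:space:]]*upstream:).*|\1 \"http://$ip:9000/\"|" "$config"
}
```

Usage: set_relay_upstream "$CONFIG" "$(container_ip sentry_web)". Then restart the Relay container, which presumably reads its configuration only at startup.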
If you get this message in the log, the sentry_web container is down or, more probably, its IP address has changed and you need to update it in the configuration too:
2023-03-27T14:14:14Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
caused by: error sending request for url (http://172.18.3.19:9000/api/0/relays/register/challenge/): error trying to connect: tcp connect error: Connection refused (os error 111)
caused by: error trying to connect: tcp connect error: Connection refused (os error 111)
caused by: tcp connect error: Connection refused (os error 111)
caused by: Connection refused (os error 111)
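To tell whether the configured address is stale, compare the host in the upstream URL with the current IP address of sentry_web. This is a sketch. It assumes the upstream line has the form upstream: "http://<host>:9000/". configured_upstream_host and upstream_is_stale are hypothetical names.

```shell
#!/bin/sh
# Sketch: check whether the Relay upstream still points at the current
# sentry_web IP address. Assumes a line like:
#   upstream: "http://172.18.3.19:9000/"
set -eu

# Print the host part of the configured upstream URL.
configured_upstream_host() {
    sed -n -E 's|^[[:space:]]*upstream:[[:space:]]*"?https?://([^:/"]+).*|\1|p' "$1"
}

# Succeed (exit 0) if the configured host differs from the given IP address.
upstream_is_stale() {
    [ "$(configured_upstream_host "$1")" != "$2" ]
}
```

Pass the current address as the second argument, for example the output of docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' sentry_web.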
Note that DNS resolution works otherwise, even in Rust code. Enjoy this log entry, where sentry_kafka is correctly resolved to 172.18.3.8:
2023-03-27T14:36:43Z [rdkafka::client] ERROR: librdkafka: FAIL [thrd:sentry_kafka:9092/bootstrap]: sentry_kafka:9092/1001: Connect to ipv4#172.18.3.8:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
Kafka
After a memory issue on docker-002 and a hard reboot, the sentry_post_process_forwarder_transactions and sentry_post_process_forwarder_errors containers exited on start with the following log message:
arroyo.errors.OffsetOutOfRange: KafkaError{code=_AUTO_OFFSET_RESET,val=-140,str="fetch failed due to requested offset not available on the broker: Broker: Offset out of range (broker 1001)"}
The out-of-sync procedure in the Sentry troubleshooting doc solves it.
As we don't have a Kafka wrapper, you can either run the commands directly in a Kafka container, or spawn a new troubleshooting container on the same network. Example of the first solution:
$ docker exec -it sentry_kafka bash
$ kafka-consumer-groups --bootstrap-server localhost:9092 --group snuba-post-processor --describe
Consumer group 'snuba-post-processor' has no active members.
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
snuba-post-processor events 0 43 44 1 - - -
snuba-post-processor transactions 0 467 476 9 - - -
$ kafka-consumer-groups --bootstrap-server localhost:9092 --group snuba-post-processor --topic events --reset-offsets --to-latest --execute
GROUP TOPIC PARTITION NEW-OFFSET
snuba-post-processor events 0 44
$ kafka-consumer-groups --bootstrap-server localhost:9092 --group snuba-post-processor --topic transactions --reset-offsets --to-latest --execute
GROUP TOPIC PARTITION NEW-OFFSET
snuba-post-processor transactions 0 476
$ kafka-consumer-groups --bootstrap-server localhost:9092 --group snuba-post-processor --describe
Consumer group 'snuba-post-processor' has no active members.
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
snuba-post-processor events 0 44 44 0 - - -
snuba-post-processor transactions 0 476 476 0 - - -
In the second case, replace --bootstrap-server localhost:9092 with an address that reaches a Kafka server, for example --bootstrap-server sentry_kafka:9092. If it doesn't work, ensure the troubleshooting container is on the same Docker network.
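The manual reset above can be scripted: read the --describe output, then reset each topic with a non-zero lag to the latest offset. This is a sketch; reset_group, lagging_topics and the KCG and BOOTSTRAP variables are hypothetical. As in the manual procedure, Kafka only accepts the reset while the group has no active members.

```shell
#!/bin/sh
# Sketch: reset every lagging topic of a consumer group to its latest offset,
# like the manual procedure above. Run it where kafka-consumer-groups exists
# (e.g. inside sentry_kafka), or set BOOTSTRAP from a troubleshooting container.
# KCG can name another command, for a dry run.
set -eu

KCG=${KCG:-kafka-consumer-groups}
BOOTSTRAP=${BOOTSTRAP:-localhost:9092}

# Read "--describe" output on stdin, print the topics of group $1 with lag > 0.
# Columns: GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG ...
lagging_topics() {
    awk -v g="$1" '$1 == g && $6 ~ /^[0-9]+$/ && $6 > 0 { print $2 }' | sort -u
}

reset_group() {
    group=$1
    $KCG --bootstrap-server "$BOOTSTRAP" --group "$group" --describe \
        | lagging_topics "$group" \
        | while read -r topic; do
            echo "Resetting $group on topic $topic" >&2
            $KCG --bootstrap-server "$BOOTSTRAP" --group "$group" \
                --topic "$topic" --reset-offsets --to-latest --execute </dev/null
        done
}

# When called with a group name as argument, reset that group.
[ "$#" -eq 0 ] || reset_group "$1"
```

For the incident above, the script would be called with snuba-post-processor as argument.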
If you write a Kafka wrapper, its syntax could perhaps be kafka [network] <kafka server> <command>.
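This is a sketch of what such a wrapper could look like. It runs the command in a throwaway container on the given network and adds --bootstrap-server for the command. The image name (confluentinc/cp-kafka) and the default network name (sentry) are assumptions to adapt. The DOCKER variable is there only to allow a dry run.

```shell
#!/bin/sh
# Hypothetical "kafka" wrapper following the suggested syntax:
#   kafka [network] <kafka server> <command> [arguments...]
# Runs <command> in a throwaway container attached to the Docker network,
# passing --bootstrap-server <kafka server> to it.
set -eu

DOCKER=${DOCKER:-docker}
KAFKA_IMAGE=${KAFKA_IMAGE:-confluentinc/cp-kafka}  # assumption: ships kafka-* tools
DEFAULT_NETWORK=${DEFAULT_NETWORK:-sentry}         # assumption

usage() {
    echo "usage: kafka [network] <host:port> <command> [arguments...]" >&2
}

kafka() {
    [ "$#" -ge 2 ] || { usage; return 2; }
    network=$DEFAULT_NETWORK
    case "$1" in
        *:*) ;;                  # first argument is already the host:port server
        *) network=$1; shift ;;  # otherwise it names the Docker network
    esac
    [ "$#" -ge 2 ] || { usage; return 2; }
    server=$1
    cmd=$2
    shift 2
    $DOCKER run --rm --network "$network" "$KAFKA_IMAGE" \
        "$cmd" --bootstrap-server "$server" "$@"
}
```

The Kafka server is told apart from the optional network by the colon in host:port. For example, the first command of the procedure above would become kafka sentry_kafka:9092 kafka-consumer-groups --group snuba-post-processor --describe.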
Also, if this error occurs often, we could automate the fix. See the discussion at T1816.
Vault
Containers based on Sentry need access to Vault to operate. That requires three things:
- The private network works, i.e. Complector can be reached at 172.27.27.7 from the Sentry host machine
- Vault is up and unsealed
- The certificate is valid
For example, if the Vault certificate has expired, the log shows the following failure:
$ docker logs sentry_web
!! Configuration error: ConfigurationError("SSLError: HTTPSConnectionPool(host='172.27.27.7', port=8200): Max retries exceeded with url: /v1/auth/approle/login (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1131)')))")
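The three prerequisites can be checked one by one from the Sentry host. This is a sketch. It assumes curl and openssl are available there, and the function names are hypothetical. It relies on Vault's /v1/sys/health endpoint, which answers 200 when active and unsealed, 429 for an unsealed standby and 503 when sealed.

```shell
#!/bin/sh
# Sketch: check the three Vault prerequisites from the Sentry host machine.
# Assumptions: Vault listens on 172.27.27.7:8200 (address from the log above);
# curl and openssl are installed. Function names are hypothetical.
set -u

VAULT_HOSTPORT=${VAULT_HOSTPORT:-172.27.27.7:8200}

# Checks 1 and 2: HTTP status of Vault's health endpoint.
# 200 = active and unsealed, 429 = unsealed standby, 503 = sealed,
# 000 = no HTTP answer at all (network issue). -k ignores the certificate
# here, so an expired certificate doesn't hide the other checks.
vault_health_code() {
    curl -sk -o /dev/null -w '%{http_code}' --max-time 5 \
        "https://$VAULT_HOSTPORT/v1/sys/health"
}

# Check 3: succeed if the PEM certificate in $1 is still valid $2 seconds
# from now (default 0, i.e. not expired yet).
cert_valid_for() {
    openssl x509 -noout -checkend "${2:-0}" -in "$1" >/dev/null
}

check_vault() {
    code=$(vault_health_code)
    case "$code" in
        200|429) echo "OK: Vault is reachable and unsealed" ;;
        503) echo "FAIL: Vault is sealed"; return 1 ;;
        000) echo "FAIL: no answer from $VAULT_HOSTPORT, check the private network"; return 1 ;;
        *) echo "FAIL: unexpected health status $code"; return 1 ;;
    esac

    # Fetch the certificate Vault presents and check its expiry date.
    pem=$(mktemp)
    openssl s_client -connect "$VAULT_HOSTPORT" </dev/null 2>/dev/null \
        | openssl x509 > "$pem" 2>/dev/null
    if cert_valid_for "$pem"; then
        echo "OK: certificate is valid"
        rm -f "$pem"
    else
        echo "FAIL: certificate expired or could not be read"
        rm -f "$pem"
        return 1
    fi
}
```

Calling cert_valid_for with a number of seconds, e.g. 604800 for one week, also warns before the certificate actually expires.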