Operations grimoire/Sentry
Revision as of 16:06, 27 March 2023
Sentry is deployed on docker-002 through the Docker PaaS.
Both the Sentry DSN endpoint and the web UI for humans can be reached at https://sentry.nasqueron.org
Command line
Sentry provides a command line interface.
To use it, log in to docker-002, then run sentry nasqueron <command>. This spawns a new container, correctly linked to the database, SMTP server and PostgreSQL database, to run the command.
TODO: update the sentry wrapper to use the sentry network and get the right environment instead.
If you've deployed an instance for a realm other than Nasqueron, replace nasqueron with that realm's name.
The following commands are available in 23.3:
cleanup       Delete a portion of trailing data based on creation date.
config        Manage runtime config options.
createuser    Create a new user.
devserver     Starts a lightweight web server for development.
devservices   Manage dependent development services required for Sentry.
django        Execute Django subcommands.
exec          Execute a script.
execfile      Execute a script.
export        Exports core metadata for the Sentry installation.
files         Manage files from filestore.
help          Show this message and exit.
import        Imports data from a Sentry export.
init          Initialize new configuration directory.
killswitches  Manage killswitches for ingestion pipeline.
migrations    Manage migrations.
performance   Performance utilities
permissions   Manage Permissions for Users.
plugins       Manage Sentry plugins.
queues        Manage Sentry queues.
repair        Attempt to repair any invalid data.
run           Run a service.
sendmail      Sends emails from the default notification mail address.
shell         Run a Python interactive interpreter.
spans         Span utilities
start         DEPRECATED see `sentry run` instead.
tsdb          Tools for interacting with the time series database.
upgrade       Perform any pending database migrations and upgrades.
write-hashes  Runs span hash grouping on event data in the supplied filename using the...
Kafka and ZooKeeper
- Deployment pitfall: ZooKeeper expects a clean log when it is initialised; otherwise it shuts down. Ensure it always has either a data volume or an empty log when it starts.
- Kafka pauses for 20 seconds before initializing the topics, to let the cluster start correctly.
To reprovision Kafka and ZooKeeper from scratch, the following works fine:
$ rm -rf /srv/kafka/sentry_kafka /srv/zookeeper/sentry_zookeeper
$ deploy-container zookeeper
$ deploy-container kafka
How to upgrade Sentry?
On Docker Hub or through Dwellers and our local registry, build a new nasqueron/sentry image.
On the Docker engine, perform database migration first, then upgrade the container:
- Pull the new image:
docker pull nasqueron/sentry
- Run migrations:
sentry nasqueron upgrade
(should we stop Sentry containers first?)
- Deploy containers:
deploy-container sentry
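The three steps above can be chained so that a failed pull or migration stops the upgrade before the containers are redeployed. This is only a sketch, assuming the sentry wrapper and deploy-container behave as described on this page; the upgrade_sentry function name and the RUN variable (set RUN=echo to print the commands instead of running them) are hypothetical helpers, not existing tooling.

```shell
# Sketch of the upgrade sequence: pull, migrate, then redeploy.
# RUN is a hypothetical dry-run knob: RUN=echo prints each command instead of running it.
upgrade_sentry() {
    realm="${1:-nasqueron}"

    # 1. Pull the new image; stop here if the pull fails.
    ${RUN:-} docker pull nasqueron/sentry || return 1

    # 2. Run the database migrations through the sentry wrapper.
    ${RUN:-} sentry "$realm" upgrade || return 1

    # 3. Redeploy the containers only once the migrations succeeded.
    ${RUN:-} deploy-container sentry
}
```

For example, RUN=echo upgrade_sentry nasqueron prints the three commands in order without touching the Docker engine.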
Deployment of Sentry containers
Sentry containers are currently deployed to docker-002. Pillar configuration and container definitions are based on the getsentry/self-hosted repository.
The containers ARE NOT deployed directly from the getsentry/self-hosted repository, nor with the scripts it provides. They are managed through Salt in the paas-docker role, like any other container. The getsentry/self-hosted repository instead serves as reference implementation documentation, and any change there needs to be reflected in our own configuration.
rOPS | getsentry/self-hosted | Sentry version |
---|---|---|
D2907 | 4d318761a84c | 23.4.0.dev |
D2881 | 06662c6cdc85 | 23.2.0 |
Troubleshoot
Resources
A troubleshooting page exists in the Sentry developer documentation. For example, we successfully followed its out-of-sync Kafka procedure: https://develop.sentry.dev/self-hosted/troubleshooting/
Relay
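The Rust DNS resolution library used by Relay has a strange issue: it rejects the sentry_web hostname, apparently because of the underscore in the name (see disallowed_by_std3_ascii_rules in the log):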
2023-03-27T14:29:18Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
caused by: error sending request for url (http://sentry_web:9000/api/0/relays/register/challenge/): error trying to connect: dns error: proto error: Label contains invalid characters: Err(Errors { invalid_mapping, disallowed_by_std3_ascii_rules })
caused by: error trying to connect: dns error: proto error: Label contains invalid characters: Err(Errors { invalid_mapping, disallowed_by_std3_ascii_rules })
caused by: dns error: proto error: Label contains invalid characters: Err(Errors { invalid_mapping, disallowed_by_std3_ascii_rules })
caused by: proto error: Label contains invalid characters: Err(Errors { invalid_mapping, disallowed_by_std3_ascii_rules })
As a workaround, put the sentry_web IP (docker inspect sentry_web | grep 172 to get it) in the /srv/relay/sentry_relay/config.yml file, under the relay.upstream key.
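The workaround above can be sketched as two small shell helpers. Both function names are hypothetical. The sketch assumes config.yml holds an indented upstream: line under relay:, and that sentry_web listens on port 9000 as in the log above. The docker inspect --format template is an alternative to grepping for 172 and prints the address on every network the container is attached to.

```shell
# Print the IP address(es) Docker assigned to a container, one per attached network.
container_ip() {
    docker inspect --format '{{range .NetworkSettings.Networks}}{{.IPAddress}} {{end}}' "$1"
}

# Rewrite the "upstream:" line of a Relay config.yml to point at the given IP.
# Assumes a single indented "upstream:" key and port 9000.
set_relay_upstream() {
    _file="$1"
    _ip="$2"
    _tmp="$(mktemp)" || return 1

    # Keep the indentation and key, replace only the value.
    sed "s|^\([[:space:]]*upstream:\).*|\1 \"http://${_ip}:9000/\"|" "$_file" > "$_tmp" \
        && cat "$_tmp" > "$_file"
    _status=$?

    rm -f "$_tmp"
    return "$_status"
}
```

Usage would look like set_relay_upstream /srv/relay/sentry_relay/config.yml 172.18.3.19, with the address taken from container_ip sentry_web. Relay still needs to be restarted or redeployed to pick up the change.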
If you see the following message in the log, the sentry_web container is down or, more probably, its IP has changed and you need to update it in that file as well:
2023-03-27T14:14:14Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
caused by: error sending request for url (http://172.18.3.19:9000/api/0/relays/register/challenge/): error trying to connect: tcp connect error: Connection refused (os error 111)
caused by: error trying to connect: tcp connect error: Connection refused (os error 111)
caused by: tcp connect error: Connection refused (os error 111)
caused by: Connection refused (os error 111)
Note that DNS resolution itself works, even in Rust code, as this log entry shows:
2023-03-27T14:36:43Z [rdkafka::client] ERROR: librdkafka: FAIL [thrd:sentry_kafka:9092/bootstrap]: sentry_kafka:9092/1001: Connect to ipv4#172.18.3.8:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)
Kafka
After a memory issue on docker-002 and a hard reboot, the sentry_post_process_forwarder_transactions and sentry_post_process_forwarder_errors containers exited on start with the following log message:
arroyo.errors.OffsetOutOfRange: KafkaError{code=_AUTO_OFFSET_RESET,val=-140,str="fetch failed due to requested offset not available on the broker: Broker: Offset out of range (broker 1001)"}
The out-of-sync procedure in the Sentry troubleshooting documentation solves this.
As we don't have a kafka wrapper, you can run commands directly on a Kafka instance, or spawn a new troubleshooting container on the same network. Example of the first solution:
$ docker exec -it sentry_kafka bash
$ kafka-consumer-groups --bootstrap-server localhost:9092 --group snuba-post-processor -describe
Consumer group 'snuba-post-processor' has no active members.
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
snuba-post-processor events 0 43 44 1 - - -
snuba-post-processor transactions 0 467 476 9 - - -
$ kafka-consumer-groups --bootstrap-server localhost:9092 --group snuba-post-processor --topic events --reset-offsets --to-latest --execute
GROUP TOPIC PARTITION NEW-OFFSET
snuba-post-processor events 0 44
$ kafka-consumer-groups --bootstrap-server localhost:9092 --group snuba-post-processor --topic transactions --reset-offsets --to-latest --execute
GROUP TOPIC PARTITION NEW-OFFSET
snuba-post-processor transactions 0 476
$ kafka-consumer-groups --bootstrap-server localhost:9092 --group snuba-post-processor -describe
Consumer group 'snuba-post-processor' has no active members.
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
snuba-post-processor events 0 44 44 0 - - -
snuba-post-processor transactions 0 476 476 0 - - -
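To spot which topics still need an offset reset, the describe output shown above can be filtered on its LAG column. This is a small sketch: the lagging_partitions function name is hypothetical, and it assumes the column order of the transcript above (GROUP, TOPIC, PARTITION, CURRENT-OFFSET, LOG-END-OFFSET, LAG).

```shell
# Read `kafka-consumer-groups --describe` output on stdin and print
# "topic partition lag" for every partition whose lag is greater than zero.
lagging_partitions() {
    # Skip the header and any line whose 6th field is not a number
    # (for example the "has no active members" notice).
    awk '$1 != "GROUP" && $6 ~ /^[0-9]+$/ && $6 > 0 { print $2, $3, $6 }'
}
```

Piping the first describe call of the transcript into lagging_partitions would list events and transactions; after the resets, it prints nothing.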
In the second case, you'll need to replace --bootstrap-server localhost:9092 by the correct way to reach a Kafka server, for example --bootstrap-server sentry_kafka:9092. If it doesn't work, ensure you're on the same network.
If you write a kafka wrapper, the syntax could perhaps be kafka [network] <kafka server> <command>.
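A possible starting point for such a wrapper, as a sketch only. To avoid ambiguity with an optional positional argument, it takes the network from a hypothetical KAFKA_NETWORK variable (defaulting to sentry, an assumption) instead of a leading [network] argument. The image name in KAFKA_IMAGE is also an assumption and should match whatever image the sentry_kafka container actually runs. RUN=echo prints the docker command instead of running it. It only fits Kafka tools that accept --bootstrap-server.

```shell
# Sketch: kafka <kafka server> <command> [arguments...]
# Spawns a throwaway container on the same network as the Kafka server.
# KAFKA_NETWORK, KAFKA_IMAGE and RUN are hypothetical knobs for this sketch.
kafka() {
    if [ "$#" -lt 2 ]; then
        echo "Usage: kafka <kafka server> <command> [arguments...]" >&2
        return 2
    fi

    server="$1"
    tool="$2"
    shift 2

    ${RUN:-} docker run --rm -it \
        --network "${KAFKA_NETWORK:-sentry}" \
        "${KAFKA_IMAGE:-confluentinc/cp-kafka}" \
        "$tool" --bootstrap-server "$server" "$@"
}
```

With this sketch, kafka sentry_kafka:9092 kafka-consumer-groups --group snuba-post-processor --describe would run the describe command from a fresh container rather than from inside sentry_kafka.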