Operations grimoire/Sentry

From Nasqueron Agora

📕📁📜 Old technical information :: content warning

⌛ This Nasqueron Operations Grimoire page hasn't been updated for a long time.

☣ As our infrastructure evolves quickly, there is a good chance this information is outdated or now inaccurate. Be careful and consider updating it.

➡️ To verify whether the information is still up to date, check the history of the relevant role in our Operations repository.

Sentry is deployed on docker-002 through the Docker PaaS.

Both the Sentry DSN endpoint and the web UI for humans can be reached at https://sentry.nasqueron.org

Command line

Sentry provides a command line interface.

To use it, log in to docker-002, then run sentry nasqueron <command>. This spawns a new container, correctly linked to the database, SMTP server and PostgreSQL database, to run the command. TODO: update the sentry wrapper to use the sentry network and get the right environment instead.

If you've deployed an instance for a realm other than Nasqueron, replace nasqueron with the name of that realm.

The following commands are available in 23.3:

 cleanup       Delete a portion of trailing data based on creation date.
 config        Manage runtime config options.
 createuser    Create a new user.
 devserver     Starts a lightweight web server for development.
 devservices   Manage dependent development services required for Sentry.
 django        Execute Django subcommands.
 exec          Execute a script.
 execfile      Execute a script.
 export        Exports core metadata for the Sentry installation.
 files         Manage files from filestore.
 help          Show this message and exit.
 import        Imports data from a Sentry export.
 init          Initialize new configuration directory.
 killswitches  Manage killswitches for ingestion pipeline.
 migrations    Manage migrations.
 performance   Performance utilities
 permissions   Manage Permissions for Users.
 plugins       Manage Sentry plugins.
 queues        Manage Sentry queues.
 repair        Attempt to repair any invalid data.
 run           Run a service.
 sendmail      Sends emails from the default notification mail address.
 shell         Run a Python interactive interpreter.
 spans         Span utilities
 start         DEPRECATED see `sentry run` instead.
 tsdb          Tools for interacting with the time series database.
 upgrade       Perform any pending database migrations and upgrades.
 write-hashes  Runs span hash grouping on event data in the supplied filename using the...

Kafka and ZooKeeper

  • Deployment pitfall: ZooKeeper expects a clean log when it is initialised; otherwise it shuts down. Ensure it always starts with either a data volume or an empty log.
  • Kafka pauses for 20 seconds before initializing the topics, to let the cluster start correctly.

To reprovision Kafka and ZooKeeper from scratch, the following works fine:

$ rm -rf /srv/kafka/sentry_kafka /srv/zookeeper/sentry_zookeeper
$ deploy-container zookeeper
$ deploy-container kafka

How to upgrade Sentry?

On Docker Hub or through Dwellers and our local registry, build a new nasqueron/sentry image.

On the Docker engine, run the database migrations first, then upgrade the containers:

  1. Pull the new image: docker pull nasqueron/sentry
  2. Run migrations: sentry nasqueron upgrade (should we stop Sentry containers first?)
  3. Deploy containers: deploy-container sentry
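
The three steps above can be chained in one shell function. This is only a sketch under assumptions: the run helper, the DRY_RUN variable and the upgrade_sentry name are hypothetical and introduced for illustration, and the open question about stopping the Sentry containers first is left unanswered.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, run it otherwise.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Upgrade Sentry for a realm (default: nasqueron).
# Each step only runs if the previous one succeeded.
upgrade_sentry() {
    realm="${1:-nasqueron}"
    run docker pull nasqueron/sentry &&
        run sentry "$realm" upgrade &&
        run deploy-container sentry
}
```

Calling DRY_RUN=1 upgrade_sentry prints the three commands without running them, which is a cheap way to review the sequence before a real upgrade.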

Deployment of Sentry containers

Sentry containers are currently deployed to docker-002. Pillar configuration and container definitions are based on the getsentry/self-hosted repository.

The containers ARE NOT deployed directly USING the getsentry/self-hosted repository or the scripts it provides. They are managed like any other container, through Salt in the paas-docker role. The getsentry/self-hosted repository is instead used as reference implementation documentation, and any change there needs to be reflected in our own configuration.

Correspondence between repositories

 rOPS    getsentry/self-hosted   Sentry version
 D2907   4d318761a84c            23.4.0.dev
 D2881   06662c6cdc85            23.2.0

Non-features

The following features can't currently be deployed:

Troubleshoot

Resources

A troubleshooting page exists in the Sentry developer documentation. For example, we successfully followed its out-of-sync Kafka procedure: https://develop.sentry.dev/self-hosted/troubleshooting/

Relay

The Rust DNS resolution library used by Relay has a strange issue:

2023-03-27T14:29:18Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: error sending request for url (http://sentry_web:9000/api/0/relays/register/challenge/): error trying to connect: dns error: proto error: Label contains invalid characters: Err(Errors { invalid_mapping, disallowed_by_std3_ascii_rules })
  caused by: error trying to connect: dns error: proto error: Label contains invalid characters: Err(Errors { invalid_mapping, disallowed_by_std3_ascii_rules })
  caused by: dns error: proto error: Label contains invalid characters: Err(Errors { invalid_mapping, disallowed_by_std3_ascii_rules })
  caused by: proto error: Label contains invalid characters: Err(Errors { invalid_mapping, disallowed_by_std3_ascii_rules })

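The disallowed_by_std3_ascii_rules error suggests the resolver rejects the underscore in the sentry_web hostname, since STD3 ASCII rules only allow letters, digits and hyphens. As a workaround, set the relay.upstream key in the /srv/relay/sentry_relay/config.yml file to the sentry_web IP address (run docker inspect sentry_web | grep 172 to get it).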
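
For a lookup less fragile than grepping for 172, docker inspect accepts a Go template through --format. The sketch below is a minimal example, not the procedure actually used here: the function names container_ip and relay_upstream_url are hypothetical, it assumes the first network listed is the one Relay shares with sentry_web, and port 9000 is taken from the log entries on this page.

```shell
# Print the IP address of a container on the first Docker network listed.
# Assumption: the first network is the one shared with Relay.
container_ip() {
    docker inspect \
        --format '{{range .NetworkSettings.Networks}}{{println .IPAddress}}{{end}}' \
        "$1" | head -n 1
}

# Print the URL to put under relay.upstream for a container
# (default: sentry_web). Fails if no IP address could be found.
relay_upstream_url() {
    ip="$(container_ip "${1:-sentry_web}")"
    [ -n "$ip" ] || return 1
    echo "http://${ip}:9000/"
}
```

Running relay_upstream_url on docker-002 prints the value to copy into config.yml; an empty result makes the function fail, which usually means the container is not running.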

If you see the following message in the log, either the sentry_web container is down or, more probably, its IP has changed and you need to update it in the Relay configuration:

2023-03-27T14:14:14Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: error sending request for url (http://172.18.3.19:9000/api/0/relays/register/challenge/): error trying to connect: tcp connect error: Connection refused (os error 111)
  caused by: error trying to connect: tcp connect error: Connection refused (os error 111)
  caused by: tcp connect error: Connection refused (os error 111)
  caused by: Connection refused (os error 111)

Note that DNS resolution itself works, even from Rust code, as this log entry shows:

2023-03-27T14:36:43Z [rdkafka::client] ERROR: librdkafka: FAIL [thrd:sentry_kafka:9092/bootstrap]: sentry_kafka:9092/1001: Connect to ipv4#172.18.3.8:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)

Kafka

After a memory issue on docker-002 and a hard reboot, the sentry_post_process_forwarder_transactions and sentry_post_process_forwarder_errors containers exited on start with the following log message:

arroyo.errors.OffsetOutOfRange: KafkaError{code=_AUTO_OFFSET_RESET,val=-140,str="fetch failed due to requested offset not available on the broker: Broker: Offset out of range (broker 1001)"}

The out-of-sync procedure from the Sentry troubleshooting documentation solves this.

As we don't have a kafka wrapper, you can either run commands directly on a Kafka instance, or spawn a new troubleshooting container on the same network. Example of the first solution:

$ docker exec -it sentry_kafka bash
$ kafka-consumer-groups --bootstrap-server localhost:9092 --group snuba-post-processor -describe

Consumer group 'snuba-post-processor' has no active members.

GROUP                TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID     HOST            CLIENT-ID
snuba-post-processor events          0          43              44              1               -               -               -
snuba-post-processor transactions    0          467             476             9               -               -               -

$ kafka-consumer-groups --bootstrap-server localhost:9092 --group snuba-post-processor --topic events --reset-offsets --to-latest --execute

GROUP                          TOPIC                          PARTITION  NEW-OFFSET
snuba-post-processor           events                         0          44

$ kafka-consumer-groups --bootstrap-server localhost:9092 --group snuba-post-processor --topic transactions --reset-offsets --to-latest --execute

GROUP                          TOPIC                          PARTITION  NEW-OFFSET
snuba-post-processor           transactions                   0          476

$ kafka-consumer-groups --bootstrap-server localhost:9092 --group snuba-post-processor -describe

Consumer group 'snuba-post-processor' has no active members.

GROUP                TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID     HOST            CLIENT-ID
snuba-post-processor events          0          44              44              0               -               -               -
snuba-post-processor transactions    0          476             476             0               -               -               -

In the second case, replace --bootstrap-server localhost:9092 with the correct address of a Kafka server, for example --bootstrap-server sentry_kafka:9092. If it doesn't work, ensure the container is on the same network.

If you write a kafka wrapper, its syntax could perhaps be kafka [network] <kafka server> <command>.
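
Such a wrapper could look like the sketch below, which spawns a throwaway container on the given network. No such wrapper exists yet, so everything here is hypothetical: the kafka_wrapper name, the DRY_RUN variable, the mandatory network argument, and the confluentinc/cp-kafka image, which is only assumed to ship the same command line tools as the sentry_kafka container.

```shell
# Assumption: an image providing the Kafka CLI tools; override with KAFKA_IMAGE.
KAFKA_IMAGE="${KAFKA_IMAGE:-confluentinc/cp-kafka}"

# Hypothetical wrapper: kafka_wrapper <network> <kafka server> <command> [args...]
# Runs <command> in a temporary container attached to <network>.
kafka_wrapper() {
    if [ "$#" -lt 3 ]; then
        echo "Usage: kafka_wrapper <network> <kafka server> <command> [args...]" >&2
        return 64
    fi
    network="$1"
    server="$2"
    tool="$3"
    shift 3

    # Rebuild the positional parameters as the full docker command line.
    set -- docker run --rm --network "$network" "$KAFKA_IMAGE" \
        "$tool" --bootstrap-server "$server" "$@"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"    # only show what would be run
    else
        "$@"
    fi
}
```

For example, kafka_wrapper sentry sentry_kafka:9092 kafka-consumer-groups --group snuba-post-processor --describe would describe the consumer group, assuming the Docker network is named sentry.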

If this error occurs often, we could automate the fix; this is discussed in T1816.
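
As a starting point for that discussion, the manual reset shown above can be looped over the topics. This is a sketch, not an agreed design: the run helper, the DRY_RUN variable and the reset_offsets name are hypothetical. Note that --to-latest skips any messages still lagging, and that Kafka only resets offsets while the consumer group has no active members.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, run it otherwise.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Reset the snuba-post-processor offsets to the latest position
# for each topic given as argument, stopping at the first failure.
reset_offsets() {
    group="snuba-post-processor"
    for topic in "$@"; do
        run docker exec sentry_kafka kafka-consumer-groups \
            --bootstrap-server localhost:9092 \
            --group "$group" --topic "$topic" \
            --reset-offsets --to-latest --execute || return
    done
}
```

Running reset_offsets events transactions on docker-002 repeats the two resets from the transcript; prefix it with DRY_RUN=1 to review the commands first.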

Vault

Containers based on Sentry need access to Vault to operate. This requires three things:

  1. The private network works, i.e. Complector can be reached from the Sentry host machine at 172.27.27.7
  2. Vault is up and unsealed
  3. The certificate is valid

For example, if the Vault certificate has expired, you'll see the following failure in the log:

$ docker logs sentry_web
!! Configuration error: ConfigurationError("SSLError: HTTPSConnectionPool(host='172.27.27.7', port=8200): Max retries exceeded with url: /v1/auth/approle/login (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1131)')))")
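
The three requirements can be checked from the Sentry host with curl and openssl. The sketch below is a minimal example under assumptions: the function names and the VAULT_HOST and VAULT_PORT variables are hypothetical, and curl is assumed to trust the CA that issued the Vault certificate. The status codes are those returned by Vault's /v1/sys/health endpoint.

```shell
VAULT_HOST="${VAULT_HOST:-172.27.27.7}"
VAULT_PORT="${VAULT_PORT:-8200}"

# Translate the HTTP status code returned by /v1/sys/health.
vault_health_meaning() {
    case "$1" in
        200) echo "initialized, unsealed and active" ;;
        429) echo "unsealed, standby" ;;
        501) echo "not initialized" ;;
        503) echo "sealed" ;;
        000) echo "unreachable or TLS failure" ;;  # curl got no HTTP response
        *)   echo "unexpected status $1" ;;
    esac
}

# Requirements 1 and 2: the network path works and Vault is unsealed.
# Succeeds only when Vault answers 200.
check_vault_health() {
    code="$(curl --silent --output /dev/null --max-time 5 \
        --write-out '%{http_code}' \
        "https://${VAULT_HOST}:${VAULT_PORT}/v1/sys/health")" || true
    echo "Vault: $(vault_health_meaning "$code")"
    [ "$code" = "200" ]
}

# Requirement 3: print the expiry date (notAfter) of the served certificate.
check_vault_certificate() {
    openssl s_client -connect "${VAULT_HOST}:${VAULT_PORT}" </dev/null 2>/dev/null |
        openssl x509 -noout -enddate
}
```

An expired certificate makes curl fail before any HTTP exchange, so check_vault_health reports it as a TLS failure; check_vault_certificate then shows the actual expiry date.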