Operations grimoire/Sentry

From Nasqueron Agora
Latest revision as of 17:14, 13 January 2024

Sentry is deployed on docker-002 through the Docker PaaS.

The Sentry DSN endpoint and the web UI for humans are both served at https://sentry.nasqueron.org

Command line

Sentry provides a command line interface.

To use it, log in to docker-002, then run sentry nasqueron <command>. The wrapper spawns a new container, correctly linked to the PostgreSQL database and the SMTP server, to run the command. TODO: update the sentry wrapper to use the sentry network and get the right environment instead.

If you've deployed an instance for another realm than Nasqueron, replace nasqueron with that realm name.

The following commands are available in 23.3:

 cleanup       Delete a portion of trailing data based on creation date.
 config        Manage runtime config options.
 createuser    Create a new user.
 devserver     Starts a lightweight web server for development.
 devservices   Manage dependent development services required for Sentry.
 django        Execute Django subcommands.
 exec          Execute a script.
 execfile      Execute a script.
 export        Exports core metadata for the Sentry installation.
 files         Manage files from filestore.
 help          Show this message and exit.
 import        Imports data from a Sentry export.
 init          Initialize new configuration directory.
 killswitches  Manage killswitches for ingestion pipeline.
 migrations    Manage migrations.
 performance   Performance utilities
 permissions   Manage Permissions for Users.
 plugins       Manage Sentry plugins.
 queues        Manage Sentry queues.
 repair        Attempt to repair any invalid data.
 run           Run a service.
 sendmail      Sends emails from the default notification mail address.
 shell         Run a Python interactive interpreter.
 spans         Span utilities
 start         DEPRECATED see `sentry run` instead.
 tsdb          Tools for interacting with the time series database.
 upgrade       Perform any pending database migrations and upgrades.
 write-hashes  Runs span hash grouping on event data in the supplied filename using the...
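The TODO above could take the shape of the following hypothetical sketch. It is not the wrapper currently installed on docker-002: the network name sentry, the environment file /srv/sentry/<realm>/env and the function name sentry_wrapper are assumptions, and it supposes that the nasqueron/sentry image entrypoint forwards its arguments to the sentry CLI.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the TODO: not the wrapper currently on docker-002.
# Assumptions: the network is named "sentry", realm environments live in
# /srv/sentry/<realm>/env, and the image entrypoint forwards its arguments
# to the sentry CLI.

sentry_wrapper() {
    if [[ $# -lt 1 ]]; then
        echo "usage: sentry_wrapper <realm> <command> [args...]" >&2
        return 2
    fi
    local realm="$1"
    shift

    # Allocate a TTY only for interactive use (e.g. "shell").
    local -a tty=()
    if [[ -t 0 ]]; then
        tty=(-t)
    fi

    local -a cmd=(
        docker run --rm -i "${tty[@]}"
        --network sentry                        # assumed network name
        --env-file "/srv/sentry/${realm}/env"   # hypothetical path
        nasqueron/sentry
        "$@"
    )

    # DRY_RUN=1 prints the docker command instead of running it.
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

For example, sentry_wrapper nasqueron queues list would list the Sentry queues, and prefixing the call with DRY_RUN=1 shows the docker command without running it. Joining the existing network avoids linking each dependency by hand.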

Kafka and ZooKeeper

  • Deployment pitfall: ZooKeeper needs a clean log when it is initialised; otherwise it shuts down. Ensure it always has a data volume, or an empty log, when starting it.
  • Kafka pauses 20 seconds before initializing the topics, to let the cluster start correctly.

To reprovision Kafka and ZooKeeper from scratch, the following works fine:

$ rm -rf /srv/kafka/sentry_kafka /srv/zookeeper/sentry_zookeeper
$ deploy-container zookeeper
$ deploy-container kafka
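These reprovisioning steps can be kept in a function that stops at the first failure, so Kafka isn't deployed after a failed ZooKeeper deployment. This is a sketch assuming deploy-container exits with a non-zero status on failure. The function name and the RUN dry-run switch (RUN=echo prints the commands) are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the reprovisioning steps, stopping at the first failure
# (assuming deploy-container exits non-zero when a deployment fails).
# RUN is a hypothetical dry-run switch: RUN=echo prints the commands.

reprovision_kafka_zookeeper() {
    local run="${RUN:-}"

    # $run is left unquoted on purpose: when empty, the command runs as-is.
    # Start from empty data directories for both services.
    $run rm -rf /srv/kafka/sentry_kafka /srv/zookeeper/sentry_zookeeper || return

    # ZooKeeper first: Kafka needs it to start.
    $run deploy-container zookeeper || return
    $run deploy-container kafka
}
```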

How to upgrade Sentry?

On Docker Hub or through Dwellers and our local registry, build a new nasqueron/sentry image.

On the Docker engine, run the database migrations first, then upgrade the containers:

  1. Pull the new image: docker pull nasqueron/sentry
  2. Run migrations: sentry nasqueron upgrade (should we stop Sentry containers first?)
  3. Deploy containers: deploy-container sentry
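The same sequence can be scripted so that migrations only run once the new image is pulled, and containers are only redeployed once the migrations succeed. This is a sketch assuming the sentry wrapper and deploy-container exit with a non-zero status on failure. The function name and the RUN dry-run switch are hypothetical. It doesn't settle the open question about stopping the Sentry containers first.

```shell
#!/usr/bin/env bash
# Sketch of the upgrade sequence: each step only runs if the previous one
# succeeded (assuming the tools exit non-zero on failure).
# RUN is a hypothetical dry-run switch: RUN=echo prints the commands.

upgrade_sentry() {
    local realm="${1:-nasqueron}" run="${RUN:-}"

    $run docker pull nasqueron/sentry || return   # 1. new image
    $run sentry "$realm" upgrade || return        # 2. database migrations
    $run deploy-container sentry                  # 3. redeploy the containers
}
```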

Deployment of Sentry containers

Sentry containers are currently deployed to docker-002. Pillar configuration and containers definitions are based on the getsentry/self-hosted repository.

The containers ARE NOT deployed directly using the getsentry/self-hosted repository or the scripts it provides. Instead, they are managed like any other container, through Salt in the paas-docker role. The getsentry/self-hosted repository is used as a reference implementation and documentation: any upstream change needs to be reflected in our own configuration.

Correspondence between repositories

rOPS   getsentry/self-hosted  Sentry version
D2907  4d318761a84c           23.4.0.dev
D2881  06662c6cdc85           23.2.0

Non-features

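The following features can't currently be deployed:

  • AI-suggested solutions (https://docs.sentry.io/product/issues/issue-details/ai-suggested-solution/, see https://github.com/getsentry/self-hosted/pull/2115/files): the suggested solutions lack relevance, aligning the privacy policy is difficult, and the extra cost of the API subscription can't be justified.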

Troubleshoot

Resources

The Sentry developer documentation has a troubleshooting page. For example, we successfully followed its out-of-sync Kafka procedure: https://develop.sentry.dev/self-hosted/troubleshooting/

Relay

Relay's Rust DNS resolution library has a strange issue resolving the sentry_web hostname:

2023-03-27T14:29:18Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: error sending request for url (http://sentry_web:9000/api/0/relays/register/challenge/): error trying to connect: dns error: proto error: Label contains invalid characters: Err(Errors { invalid_mapping, disallowed_by_std3_ascii_rules })
  caused by: error trying to connect: dns error: proto error: Label contains invalid characters: Err(Errors { invalid_mapping, disallowed_by_std3_ascii_rules })
  caused by: dns error: proto error: Label contains invalid characters: Err(Errors { invalid_mapping, disallowed_by_std3_ascii_rules })
  caused by: proto error: Label contains invalid characters: Err(Errors { invalid_mapping, disallowed_by_std3_ascii_rules })
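The error names disallowed_by_std3_ascii_rules. STD3 ASCII rules only accept letters, digits and hyphens in a hostname label, so the underscore in sentry_web is the likely culprit. The following is a small check of a name against that rule. is_std3_hostname is an illustrative helper, not part of Relay.

```shell
#!/usr/bin/env bash
# Returns 0 if every dot-separated label of $1 is an LDH label
# (letters, digits, inner hyphens), as STD3 ASCII rules require; 1 otherwise.
is_std3_hostname() {
    local -a labels
    local label
    local re='^[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?$'

    [[ -n "$1" ]] || return 1
    IFS=. read -ra labels <<< "$1"
    for label in "${labels[@]}"; do
        [[ "$label" =~ $re ]] || return 1
    done
}
```

is_std3_hostname sentry_web fails, while is_std3_hostname sentry-web succeeds.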

As a workaround, put the sentry_web IP (get it with docker inspect sentry_web | grep 172) in the /srv/relay/sentry_relay/config.yml file, under the relay.upstream key.
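The workaround can be scripted. This sketch uses hypothetical helper names and rests on three assumptions: sentry_web is attached to a single Docker network (with several, the template prints the IPs glued together), the upstream listens on port 9000 as in the logs, and GNU sed is available for in-place editing.

```shell
#!/usr/bin/env bash
# Current IP of the sentry_web container (assumes a single Docker network).
sentry_web_ip() {
    docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' sentry_web
}

# Point relay.upstream in the Relay config to http://<ip>:9000/.
set_relay_upstream() {
    local ip="$1" config="${2:-/srv/relay/sentry_relay/config.yml}"
    if [[ -z "$ip" ]]; then
        echo "usage: set_relay_upstream <ip> [config.yml]" >&2
        return 2
    fi
    # Rewrite the value of the upstream: line, keeping its indentation (GNU sed -i).
    sed -i -E "s#^([[:space:]]*upstream:[[:space:]]*).*#\1\"http://${ip}:9000/\"#" "$config"
}
```

Then run set_relay_upstream "$(sentry_web_ip)" and restart the Relay container (presumably sentry_relay), as Relay reads config.yml when it starts.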

If you get the following message in the log, the sentry_web container is down or, more probably, its IP has changed and you need to update the IP in config.yml again:

2023-03-27T14:14:14Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: error sending request for url (http://172.18.3.19:9000/api/0/relays/register/challenge/): error trying to connect: tcp connect error: Connection refused (os error 111)
  caused by: error trying to connect: tcp connect error: Connection refused (os error 111)
  caused by: tcp connect error: Connection refused (os error 111)
  caused by: Connection refused (os error 111)
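To tell a stale IP from a stopped container, compare the IP stored in config.yml with the current IP of sentry_web. This is a sketch with hypothetical helper names. It assumes sentry_web is on a single Docker network and the upstream is written as an IP URL such as "http://172.18.3.19:9000/". A stopped container reports an empty IP.

```shell
#!/usr/bin/env bash
# IP currently written in relay.upstream (empty if it holds a hostname).
relay_upstream_ip() {
    local config="${1:-/srv/relay/sentry_relay/config.yml}"
    sed -n -E 's#^[[:space:]]*upstream:[[:space:]]*"?http://([0-9.]+):.*#\1#p' "$config"
}

# Warn when relay.upstream no longer matches the IP of sentry_web.
check_relay_upstream() {
    local configured current
    configured=$(relay_upstream_ip "$@")
    current=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' sentry_web) || return
    if [[ -z "$current" ]]; then
        echo "sentry_web has no IP: the container is probably down" >&2
        return 1
    fi
    if [[ "$configured" != "$current" ]]; then
        echo "relay.upstream uses ${configured:-a hostname}, sentry_web is now at ${current}" >&2
        return 1
    fi
    echo "relay.upstream matches sentry_web (${current})"
}
```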

Note that DNS resolution does work elsewhere, even in Rust code. Enjoy this log entry, where sentry_kafka resolves to 172.18.3.8 without any complaint:

2023-03-27T14:36:43Z [rdkafka::client] ERROR: librdkafka: FAIL [thrd:sentry_kafka:9092/bootstrap]: sentry_kafka:9092/1001: Connect to ipv4#172.18.3.8:9092 failed: Connection refused (after 0ms in state CONNECT, 1 identical error(s) suppressed)

Kafka

After a memory issue on docker-002 and a hard reboot, the sentry_post_process_forwarder_transactions and sentry_post_process_forwarder_errors containers exited on start with the following log message:

arroyo.errors.OffsetOutOfRange: KafkaError{code=_AUTO_OFFSET_RESET,val=-140,str="fetch failed due to requested offset not available on the broker: Broker: Offset out of range (broker 1001)"}

The out-of-sync procedure in the Sentry troubleshooting doc solves that.

As we don't have a kafka wrapper, you can either run the commands directly in a Kafka container, or spawn a new troubleshooting container on the same network. Example of the first solution:

$ docker exec -it sentry_kafka bash
$ kafka-consumer-groups --bootstrap-server localhost:9092 --group snuba-post-processor --describe

Consumer group 'snuba-post-processor' has no active members.

GROUP                TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID     HOST            CLIENT-ID
snuba-post-processor events          0          43              44              1               -               -               -
snuba-post-processor transactions    0          467             476             9               -               -               -

$ kafka-consumer-groups --bootstrap-server localhost:9092 --group snuba-post-processor --topic events --reset-offsets --to-latest --execute

GROUP                          TOPIC                          PARTITION  NEW-OFFSET
snuba-post-processor           events                         0          44

$ kafka-consumer-groups --bootstrap-server localhost:9092 --group snuba-post-processor --topic transactions --reset-offsets --to-latest --execute

GROUP                          TOPIC                          PARTITION  NEW-OFFSET
snuba-post-processor           transactions                   0          476

$ kafka-consumer-groups --bootstrap-server localhost:9092 --group snuba-post-processor --describe

Consumer group 'snuba-post-processor' has no active members.

GROUP                TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID     HOST            CLIENT-ID
snuba-post-processor events          0          44              44              0               -               -               -
snuba-post-processor transactions    0          476             476             0               -               -               -

In the second case, replace --bootstrap-server localhost:9092 with an address the troubleshooting container can reach, for example --bootstrap-server sentry_kafka:9092. If it doesn't work, ensure that container is on the same network as Kafka.

If someone writes a kafka wrapper, its syntax could perhaps be kafka [network] <kafka server> <command>.
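Such a wrapper could look like the following hypothetical sketch. It spawns a throwaway container on the given network (sentry by default, an assumed name) from confluentinc/cp-kafka, an image shipping the Kafka CLI tools (override with KAFKA_IMAGE), assuming it runs the given command directly. It passes the server as --bootstrap-server. To tell the optional network from the server, the sketch treats the first argument as a network unless it has a host:port shape; that is a design choice, not an existing convention.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: kafka [network] <host:port> <command> [args...]
kafka() {
    local network="sentry"   # assumed default network name
    # A first argument without a host:port shape is taken as the network.
    if [[ $# -ge 1 && "$1" != *:* ]]; then
        network="$1"
        shift
    fi
    if [[ $# -lt 2 ]]; then
        echo "usage: kafka [network] <host:port> <command> [args...]" >&2
        return 2
    fi
    local server="$1" command="$2"
    shift 2

    local -a cmd=(
        docker run --rm -i --network "$network"
        "${KAFKA_IMAGE:-confluentinc/cp-kafka}"
        "$command" --bootstrap-server "$server" "$@"
    )
    # DRY_RUN=1 prints the docker command instead of running it.
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

For example: kafka sentry_kafka:9092 kafka-consumer-groups --group snuba-post-processor --describe.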

If this error occurs often, we could automate the fix; see the discussion at T1816.
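A possible starting point for that automation is to find the topics where a consumer group lags, then reset only those to the latest offset, as done by hand above. This sketch uses hypothetical helper names and is meant to run where kafka-consumer-groups is available, for example inside sentry_kafka. Note that --to-latest skips the lagging messages rather than replaying them, and Kafka refuses to reset offsets while the group has active members.

```shell
#!/usr/bin/env bash
# Topics with a positive lag, read from "kafka-consumer-groups --describe"
# output on stdin (LAG is the 6th column; "-" or headers are ignored).
lagging_topics() {
    awk '$1 != "GROUP" && $6 ~ /^[0-9]+$/ && $6 > 0 { print $2 }' | sort -u
}

# Reset every lagging topic of a consumer group to its latest offset.
reset_lagging_offsets() {
    local group="$1" server="${2:-localhost:9092}" topic
    kafka-consumer-groups --bootstrap-server "$server" --group "$group" --describe \
        | lagging_topics \
        | while read -r topic; do
            kafka-consumer-groups --bootstrap-server "$server" --group "$group" \
                --topic "$topic" --reset-offsets --to-latest --execute
        done
}
```

For example: reset_lagging_offsets snuba-post-processor.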

Vault

Containers based on Sentry need access to Vault to operate. That means 3 things:

  1. The private network works, i.e. Complector (172.27.27.7) can be reached from the Sentry host machine
  2. Vault is up and unsealed
  3. Certificate is valid

For example, if the Vault certificate has expired, you get the following failure in the log:

$ docker logs sentry_web
!! Configuration error: ConfigurationError("SSLError: HTTPSConnectionPool(host='172.27.27.7', port=8200): Max retries exceeded with url: /v1/auth/approle/login (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1131)')))")
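The three conditions can be checked from the Sentry host with curl and openssl. This sketch uses hypothetical helper names. It queries Vault's /v1/sys/health endpoint, whose default status codes tell whether Vault is reachable, initialized and unsealed. It then reads the served certificate to see whether it has already expired. It only checks expiry, not other certificate verification failures.

```shell
#!/usr/bin/env bash
# Meaning of Vault's /v1/sys/health default status codes.
vault_health_message() {
    case "$1" in
        200) echo "initialized, unsealed and active" ;;
        429) echo "unsealed, standby node" ;;
        501) echo "not initialized" ;;
        503) echo "sealed" ;;
        000) echo "unreachable (check the private network)" ;;   # curl prints 000 without a response
        *)   echo "unexpected HTTP status $1" ;;
    esac
}

check_vault() {
    local addr="${1:-172.27.27.7:8200}" code

    # 1 and 2: reachability and seal status. -k skips certificate
    # verification here, so an expired certificate doesn't hide the seal status.
    code=$(curl -sk -o /dev/null -w '%{http_code}' --max-time 5 "https://${addr}/v1/sys/health") || true
    echo "vault: $(vault_health_message "$code")"

    # 3: certificate expiry; "x509 -checkend 0" fails if it has already expired.
    if openssl s_client -connect "$addr" </dev/null 2>/dev/null \
        | openssl x509 -noout -checkend 0 >/dev/null 2>&1; then
        echo "certificate: not expired"
    else
        echo "certificate: expired or unreadable"
    fi
}
```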