Operations grimoire/NetBox

From Nasqueron Agora
 
Latest revision as of 16:10, 4 August 2024

NetBox is the current source of truth for the network. It's available at NetBox.

Integration with Salt

Integration projects are in progress to serve NetBox data as part of the Salt pillar, for configuration as code.

It should serve:

  • /etc/hosts data for Drake network machines
  • the same information available at pillar/nodes/nodes.sls, so we won't need to maintain that anymore
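
As an illustration of the first item, here is a minimal sketch of how exported address records could be turned into /etc/hosts lines. The tab-separated input format (address with prefix length, then DNS name) and the render_hosts function name are assumptions made for this example; this is not the actual integration code.

```shell
#!/bin/sh
# Render /etc/hosts lines from "address/prefix<TAB>dns_name" records.
# The input format is a hypothetical export, not an actual NetBox output.
render_hosts() {
    awk -F '\t' '
        # Skip blank lines and records without a DNS name.
        NF >= 2 && $2 != "" {
            split($1, parts, "/")      # Drop the prefix length.
            short = $2
            sub(/\..*/, "", short)     # Keep the short host name as an alias.
            printf "%s\t%s %s\n", parts[1], $2, short
        }
    '
}

# Example usage with the address shown in the IRC workflow of this page
printf '172.27.27.6/28\tdocker-001.nasqueron.org\n' | render_hosts
```

Each output line holds the bare address, the fully qualified name, and the short name as an alias.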

Integration with IRC

Export to a database for Odderon is discussed at T1868.

That should allow the following workflow:

   17:13:59 < Dereckson> 172.27.27.6
   17:14:00 < Odderon> (Dereckson): 172.27.27.6/28 docker-001.nasqueron.org / Decommissioned address for docker-001.nasqueron.org / Drake [deprecated]

The source code of the translation tool is available at https://devcentral.nasqueron.org/source/netbox-darkbot-db/

Backup

To back up the PostgreSQL database:

   sudo -u postgres pg_dump netbox | gzip > netbox-$(date +"%Y-%m-%d").gz

The database is currently hosted on WindRiver, not on DB-A, even though a user and an outdated copy exist on DB-A.
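
Because the backup command pipes pg_dump into gzip, a failed dump can still leave a valid but empty archive behind. The following sketch checks a dump file before it is relied upon for an upgrade. The verify_dump function name is hypothetical; it only uses the standard gzip and grep commands.

```shell
#!/bin/sh
# Check that a compressed dump is a valid gzip archive with actual content.
verify_dump() {
    dump="$1"
    # gzip -t validates the archive structure and checksum.
    gzip -t "$dump" 2>/dev/null || { echo "invalid archive: $dump"; return 1; }
    # An empty stream means pg_dump wrote nothing, for example because it
    # failed while gzip still produced a valid file.
    gzip -dc "$dump" | grep -q . || { echo "empty dump: $dump"; return 1; }
    echo "ok: $dump"
}

# Example: verify_dump "netbox-$(date +"%Y-%m-%d").gz"
```

This does not prove the dump can be restored; it only catches missing, truncated, or empty files.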

Upgrade

To upgrade NetBox:

  1. Back up the database on WindRiver
  2. Install the new version
  3. Copy configuration
  4. Update Python dependencies
  5. Shut down the service: service netbox stop or, if the PID is wrong, killall netbox
  6. Symlink the directory in /srv/netbox
  7. Run the database migrations and regenerate static resources, etc. as documented in upgrade.sh
  8. Start the service

An example for the upgrade to 3.7.0:

   sudo -u postgres pg_dump netbox | gzip > netbox-$(date +"%Y-%m-%d").gz
   cd /srv/netbox
   wget https://github.com/netbox-community/netbox/archive/refs/tags/v3.7.0.tar.gz
   tar xzf v3.7.0.tar.gz
   cp /srv/netbox/netbox/netbox/netbox/configuration.py netbox-3.7.0/netbox/netbox/
   source venv/bin/activate
   pip install -r netbox-3.7.0/requirements.txt
   killall -u netbox
   rm netbox
   ln -s netbox-3.7.0 netbox
   cd /srv/netbox/netbox
   mkdocs build
   cd /srv/netbox/netbox/netbox
   python3 manage.py migrate
   python3 manage.py collectstatic
   python3 manage.py remove_stale_contenttypes
   python3 manage.py reindex --lazy
   python3 manage.py clearsessions
   service netbox start

To track configuration changes, you can use https://github.com/netbox-community/netbox/commits/develop/netbox/netbox/configuration_example.py

If psycopg fails to build, you can build it with pip install "psycopg[c]". That requires the development headers for PostgreSQL, which are available on all FreeBSD servers where PostgreSQL is installed; for a Docker image based on Linux, search for a libpq-dev or libpq-devel package.

Documentation needs to be generated before running collectstatic. Otherwise, run the collectstatic command again to copy the docs from project-static to the static directory.

Troubleshoot

Grafana

Use the Grafana dashboard at https://grafana.nasqueron.org/d/O6v4rMpizda/netbox?orgId=1&refresh=10s to check statistics.

As of 2024-08-04, all the rates are erroneous and greatly overestimated.

500 after server reboot

After the server is restarted, NetBox needs PostgreSQL and Redis to work.

  • If PostgreSQL is down, the web UI shows no message and directly serves a 500 error code
  • If PostgreSQL is up but Redis is down, an explicit error message says it can't connect to Redis
  • When both services are up, the site should work again
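
The checks above can be sketched as a small script that probes both dependencies from the shell. The check_dep and check_netbox_deps function names are hypothetical. The probes rely on pg_isready and redis-cli ping with their default host and port, which is an assumption about this deployment.

```shell
#!/bin/sh
# Run a probe command and report whether the dependency answers.
check_dep() {
    name="$1"
    shift
    if "$@" >/dev/null 2>&1; then
        echo "$name: up"
    else
        echo "$name: DOWN"
        return 1
    fi
}

# Probe PostgreSQL first, since its failure gives no message in the web UI.
# Assumes default connection settings for both tools.
check_netbox_deps() {
    status=0
    check_dep PostgreSQL pg_isready -q || status=1
    check_dep Redis redis-cli ping || status=1
    return "$status"
}

# Example: check_netbox_deps || echo "start the missing services, then retry"
```

Passing the probe as arguments keeps check_dep independent from the tools, so the same function works with a service status command instead.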

Conventions

Devices

Device role colors

  • Network appliances and similar devices (e.g. machines with a router role): gray
  • Devices for humans to connect to and work from (e.g. shellserver, devserver): purple
  • Devices hosting services (e.g. dbserver, paas-docker): pink