Operations grimoire/Anubis
Latest revision as of 17:30, 9 April 2026
Anubis is a proxy that filters out requests from AI scrapers. It will allow sites like DevCentral to stop serving the heavy traffic generated by LLM training crawlers.
Challenges
We want to stop scraping traffic, but let legitimate traffic pass (e.g. CLI requests from Conduit).
Anubis is used by many similar software forges run by open-source projects (see Anubis known instances: https://anubis.techaro.lol/docs/user/known-instances), but it's unclear whether anyone has already succeeded in configuring it for Phabricator.
This is currently being investigated by Gui and Dereckson.
At Wikimedia, Andrew Kostka (WMDE) has some experience deploying it, apparently for Wikibase products.
Implementation (by Gui)
Isolation via UNIX Sockets
To avoid maintaining a table of TCP ports, each instance is configured to use UNIX sockets for both traffic and metrics:
- Main socket: /run/anubis/{{ instance }}.sock
- Metrics socket: /run/anubis/{{ instance }}-metrics.sock
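The naming scheme above can be sketched in a few lines of Python. This is only an illustration, not the code used by our Salt states: the function names are hypothetical, and the environment variable names (BIND, BIND_NETWORK, METRICS_BIND, METRICS_BIND_NETWORK) are assumed from the upstream Anubis documentation and should be checked against the version we deploy.

```python
from pathlib import PurePosixPath

# Directory holding one pair of sockets per Anubis instance.
RUN_DIR = PurePosixPath("/run/anubis")


def socket_paths(instance: str) -> dict[str, str]:
    """Return the main and metrics socket paths for one Anubis instance."""
    if not instance or "/" in instance:
        raise ValueError(f"invalid instance name: {instance!r}")
    return {
        "main": str(RUN_DIR / f"{instance}.sock"),
        "metrics": str(RUN_DIR / f"{instance}-metrics.sock"),
    }


def anubis_environment(instance: str) -> dict[str, str]:
    """Build the environment binding an instance to its UNIX sockets.

    Assumption: the variable names follow the upstream Anubis
    documentation; verify them before relying on this sketch.
    """
    paths = socket_paths(instance)
    return {
        "BIND": paths["main"],
        "BIND_NETWORK": "unix",
        "METRICS_BIND": paths["metrics"],
        "METRICS_BIND_NETWORK": "unix",
    }


if __name__ == "__main__":
    for key, value in anubis_environment("devcentral").items():
        print(f"{key}={value}")
```

Because the paths are derived from the instance name alone, adding a new instance never requires picking a free port.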
Dynamic Configuration
The Salt states resolve the target application's port from the `docker_containers` pillar. In `anubis_instances`, we define the target:
anubis_instances:
  devcentral:
    target:
      service: phabricator
      container: devcentral
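To make the lookup more concrete, here is a minimal Python sketch of how a target could be resolved from the two pillars. It is not the actual Salt state. The layout of `docker_containers` (service, then container, then an `app_port` key), the port value, and the `http://localhost` target URL are all assumptions made for illustration; only the `anubis_instances` structure comes from the example above.

```python
def resolve_target(pillar: dict, instance: str) -> str:
    """Return the upstream URL an Anubis instance should forward to."""
    target = pillar["anubis_instances"][instance]["target"]
    service, container = target["service"], target["container"]
    try:
        # Hypothetical layout: docker_containers[service][container]["app_port"]
        port = pillar["docker_containers"][service][container]["app_port"]
    except KeyError as exc:
        raise LookupError(
            f"no port found for {service}/{container} in docker_containers"
        ) from exc
    return f"http://localhost:{port}"


# Example pillar data. The docker_containers part is invented for this sketch.
EXAMPLE_PILLAR = {
    "anubis_instances": {
        "devcentral": {
            "target": {"service": "phabricator", "container": "devcentral"},
        },
    },
    "docker_containers": {
        "phabricator": {"devcentral": {"app_port": 8080}},
    },
}

if __name__ == "__main__":
    print(resolve_target(EXAMPLE_PILLAR, "devcentral"))
```

Keeping only the service and container names in `anubis_instances` means the port is declared once, in `docker_containers`, and cannot drift between the two pillars.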
Provisioning and Vault
Anubis API and Dashboard keys are provisioned in Vault using a helper script, scripts/fix_anubis_devcentral.sh, and retrieved by Salt during deployment.
Current work
- Gui deployed an experimental run manually on Dwellers to figure out how we can build and configure Anubis
- Work to deploy Anubis with Salt: D3908 (https://devcentral.nasqueron.org/D3908)
Troubleshoot
Each site must have its own Anubis instance
If you need to protect two different domains, you need two Anubis instances, one per domain.
In December, we noticed strange errors about challenges not being found when trying to use the same instance for two different targets from the same domain.
See also
- Anubis website (https://anubis.techaro.lol/)
