Creation GRE tunnel
From Nasqueron Agora
Revision as of 13:07, 17 February 2026
Introduction
A GRE tunnel was already established between router-001 and windriver. The objective was to create additional GRE tunnels for redundancy testing:
- Create a GRE tunnel between router-002 and windriver
- Create a GRE tunnel between router-003 and windriver
Incidents
What happened:
- Creating a GRE tunnel on top of an existing GRE tunnel cut the connection between router-001 and windriver.
- Direct access to the router was unavailable, and the network interface (natif) on windriver was restarted without restoring the routing table.
As a result, the server windriver did not restart properly, and network services were unavailable.
Access via KVM was required to correct the routing table directly on the remote machine.
Actions taken:
- Accessed the server via KVM to restore the routing table.
- Performed a controlled network interface restart.
- Established a rule: do not create GRE-over-GRE tunnels in the current infrastructure, to avoid recursion and routing issues. If it is mandatory, always have KVM access prepared, as well as access to the VM itself, to prevent a long server outage.
Technical analysis:
dmesg showed the following error: "gre0: if_output recursively called too many times". This was the cause of the cut connection between router-001 and windriver. The error comes from the FreeBSD kernel file sys/net/if.c, in the function if_tunnel_check_nesting().
- The problem is that traffic from the new tunnel was routed through the existing tunnel, which created a loop.
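A minimal sketch of how the kernel log could be checked for this error. The helper name has_gre_recursion is hypothetical; it only looks for the message text quoted above and reads log lines from standard input, so it can be fed from dmesg or from a saved log file.

```sh
#!/bin/sh
# has_gre_recursion (hypothetical helper): succeeds if the log text on
# standard input contains the recursion error reported by the kernel.
has_gre_recursion() {
    grep -q 'if_output recursively called too many times'
}

# Example use on the affected host (left commented out in this sketch):
# dmesg | has_gre_recursion && echo "GRE recursion detected"
```

If the check succeeds, the routing of the tunnel endpoints is the first thing to inspect.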
Where does the loop happen?
Step by step:
1. A packet needs to reach windriver.
2. The system checks the routing table.
3. It says: “Send it via gre0.”
4. gre0 encapsulates the packet.
5. But to send that encapsulated packet out, the system again decides to use gre0.
6. gre0 encapsulates it again.
7. And again.
8. And again.
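The steps above can be sketched as a toy model in shell. This is not real networking code: lookup_route, send_packet, ENDPOINT_ROUTE, the "tunnel-endpoint" label and the MAX_NESTING value are all assumptions made for illustration, and the real kernel limit is not stated on this page. The sketch only shows why a route that sends the tunnel's own outer packets back into gre0 never terminates without a nesting check.

```sh
#!/bin/sh
# Toy model of the routing decision; every name here is hypothetical.

MAX_NESTING=3          # assumed limit, for illustration only
ENDPOINT_ROUTE=gre0    # broken case: the tunnel endpoint is reached via gre0

# Return the interface chosen for a destination.
lookup_route() {
    case "$1" in
        tunnel-endpoint) echo "$ENDPOINT_ROUTE" ;;
        *)               echo "gre0" ;;
    esac
}

# Follow a packet until it leaves on a non-GRE interface or the
# nesting limit is reached.
send_packet() {
    dest=$1
    depth=0
    while :; do
        ifname=$(lookup_route "$dest")
        case "$ifname" in
            gre*) ;;   # needs encapsulation, keep going
            *)  echo "sent via $ifname after $depth encapsulation(s)"
                return 0 ;;
        esac
        if [ "$depth" -ge "$MAX_NESTING" ]; then
            echo "$ifname: if_output recursively called too many times"
            return 1
        fi
        depth=$((depth + 1))
        # The outer packet is now addressed to the tunnel's remote endpoint.
        dest="tunnel-endpoint"
    done
}
```

With ENDPOINT_ROUTE=gre0, send_packet windriver ends with the recursion error. Setting ENDPOINT_ROUTE=natif models the healthy case, where the encapsulated packet leaves through the underlying interface after a single encapsulation.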
Recommendations
- Avoid stacking GRE tunnels on top of existing ones, to avoid overcomplicating the configuration.
- Test changes only while KVM access is available.
- If a restart is needed, always restart natif and restore the routing table in a single command, chained with "&&".
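A minimal sketch of the single-command restart, assuming the host uses the standard FreeBSD rc scripts netif and routing. The helper restart_network_cmd is hypothetical and only prints the command line, so nothing is restarted by the sketch itself.

```sh
#!/bin/sh
# restart_network_cmd (hypothetical helper): print the one-line restart
# for the given interface. "&&" runs the routing restart only if the
# interface restart succeeded.
restart_network_cmd() {
    ifname=$1
    echo "service netif restart $ifname && service routing restart"
}

# Print the command for review; running it is left to the operator,
# as root, with KVM access ready.
restart_network_cmd natif
```

Chaining both steps on one line means the routing restart does not depend on typing a second command over a connection that may already be down.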
