Creation GRE tunnel
From Nasqueron Agora
Incident
While experimenting with GRE tunnels, an additional tunnel was created between router-001 and windriver, two machines already connected via GRE. The goal was to set up a second tunnel for redundancy testing.
What happened:
- Creating a GRE tunnel on top of an existing GRE tunnel cut the connection between router-001 and windriver.
- Direct access to the router was unavailable, and the network interface (netif) was restarted without restoring the routing table.
- As a result, the server did not restart properly, and network services were unavailable.
- Access via KVM was required to correct the routing table directly on the remote machine.
Actions taken:
- Accessed the server via KVM to restore the routing table.
- Performed a controlled network interface restart.
- Established a rule: do not create GRE-over-GRE tunnels in the current infrastructure, to avoid recursion and outages. If one is unavoidable, always have KVM access ready, along with access to the VM itself, to prevent any long server outage.
Technical analysis:
dmesg showed the following error: gre0: if_output recursively called too many times. This was the cause of the cut connection between router-001 and windriver. The error comes from the FreeBSD kernel file sys/net/if.c, in the function if_tunnel_check_nesting(), which rejects packets that pass through tunnel interfaces more often than the allowed nesting limit.
- When a newly created GRE tunnel between two machines does not respond or cuts the connection, destroy the tunnel on both sides to let traffic flow again.
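To check whether a host has hit this condition, the kernel log can be searched for the message quoted above. The following is a minimal sketch; the function name has_gre_nesting_error is a hypothetical helper, not part of FreeBSD.

```sh
#!/bin/sh
# Hypothetical helper: reads kernel log text on stdin and returns
# exit status 0 if the tunnel nesting error is present, 1 otherwise.
has_gre_nesting_error() {
    grep -q 'if_output recursively called too many times'
}

# Usage on the affected host:
#   dmesg | has_gre_nesting_error && echo "GRE nesting limit was hit"
```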
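Destroying the tunnel could look like the sketch below, to be run on each of the two endpoints. The interface name gre1 is an assumption for illustration (the page does not name the new interface), and destroy_tunnel with its DRY_RUN switch is a hypothetical wrapper around ifconfig.

```sh
#!/bin/sh
# Hypothetical helper: destroy a cloned tunnel interface on the local host.
# With DRY_RUN=1 the command is printed instead of executed.
destroy_tunnel() {
    ifname="$1"
    if [ -z "$ifname" ]; then
        echo "usage: destroy_tunnel <interface>" >&2
        return 2
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "ifconfig $ifname destroy"
    else
        ifconfig "$ifname" destroy
    fi
}

# Usage (gre1 is an assumed name; use the interface that was just created):
#   destroy_tunnel gre1
```

Running it with DRY_RUN=1 first shows the exact command before anything is changed on the host.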
Recommendations:
- Avoid stacking GRE tunnels on top of existing ones, to avoid overcomplicating the configuration.
- Test changes only while KVM access is available.
- If a restart is needed, always restart netif together with the routing table in a single command, chained with "&&".
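The last recommendation can be sketched as follows, using the standard FreeBSD rc services netif and routing. The restart_network function and its DRY_RUN switch are hypothetical conveniences; the chained command itself is the part that matters.

```sh
#!/bin/sh
# Hypothetical helper: restart the network interfaces and reload the
# routing table as one command line.
# With DRY_RUN=1 the command is printed instead of executed.
restart_network() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "service netif restart && service routing restart"
    else
        # "&&" runs the routing restart only if the netif restart succeeds.
        service netif restart && service routing restart
    fi
}

# Usage:
#   restart_network
```

Because the shell reads the whole line before running it, the routing restart is already queued when netif drops the connection, which is the reason for issuing both as one command over a remote session. Note that "&&" skips the second step if the first one fails, so KVM access remains the fallback.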
