Creation GRE tunnel: Difference between revisions

From Nasqueron Agora
Yousra (talk | contribs)
Created page with " = Incident = While experimenting with creating a GRE tunnel between two router-001 and windirver already connected via GRE, the goal was to set up an additional tunnel for redundancy testing. What happened: * Creating a GRE tunnel on top of an existing GRE tunnel caused a cut connection between router-001 and windriver. * Direct access to the router was unavailable, and the network interface (netif) was restarted without restoring the routing table. * As a result, th..."
 
Yousra (talk | contribs)
 
(13 intermediate revisions by the same user not shown)

Latest revision as of 18:24, 17 February 2026

Introduction

A GRE tunnel was already established between router-001 and windriver. The objective was to create additional GRE tunnels for redundancy testing:

- Create a GRE tunnel between router-002 and windriver
- Create a GRE tunnel between router-003 and windriver

Incidents

What happened:

  • Creating a GRE tunnel on top of an existing GRE tunnel cut the connection between router-001 and windriver, and direct access to the router was unavailable.
  • The network interface service (netif) on windriver was restarted without restoring the routing table.

As a result, the server windriver did not restart properly, and network services were unavailable.


Actions taken:

  • Accessed the server via KVM to restore the routing table.
  • Performed a controlled network interface restart.


But what about the broken link between router-001 and windriver?

Technical analysis on windriver:

dmesg showed the following error:

gre0: if_output recursively called too many times

This was the cause of the lost connection between router-001 and windriver. The error comes from the FreeBSD kernel file sys/net/if.c, in the function if_tunnel_check_nesting().


Step by step:

1. A packet needs to reach windriver.
2. The system checks the routing table.
3. It says: “Send it via gre0.”
4. gre0 encapsulates the packet.
5. But to send the encapsulated packet out, the system again decides to use gre0.
6. gre0 encapsulates it again.
7. And again.
8. And again, until the kernel's nesting check stops the loop and logs the error shown in dmesg.
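
Step 5 is where the loop starts: the route to the tunnel's own outer endpoint points at a gre interface. The following is a minimal sketch of how one might check for that condition on FreeBSD before bringing a tunnel up. The function name and the example address are hypothetical, and the sketch assumes the usual "interface:" line in the output of route -n get.

```sh
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when the kernel would send traffic
# for the given address through a gre interface, which is the recursion
# condition described in step 5.
endpoint_routed_via_gre() {
    # `route -n get` prints a line such as "  interface: gre0"
    route -n get "$1" 2>/dev/null |
        awk '$1 == "interface:" { print $2 }' |
        grep -q '^gre'
}

# Usage (198.51.100.2 is a placeholder for the tunnel's remote outer address):
# if endpoint_routed_via_gre 198.51.100.2; then
#     echo "outer endpoint is routed through a GRE interface: expect recursion"
# fi
```

If the check succeeds, the encapsulated packets would be handed back to a GRE interface, which matches the behaviour reported in dmesg.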

Recommendations

  • Changing the network configuration is fine as long as KVM access is available: always have KVM access prepared, including for the VM itself, to prevent any long server outage.
  • If a restart is needed, always restart netif together with the routing table in a single command chained with "&&".
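
The second recommendation can be sketched as follows, assuming the standard FreeBSD rc services netif and routing. The function name and the SERVICE override are hypothetical and only exist so the sequence can be printed in a dry run instead of executed.

```sh
#!/bin/sh
# Restart the interfaces and, only if that succeeds, reload the routes.
# SERVICE is a hypothetical override used for dry runs (e.g. SERVICE=echo).
restart_network() {
    "${SERVICE:-service}" netif restart && "${SERVICE:-service}" routing restart
}

# Dry run:  SERVICE=echo; restart_network
# Real run: restart_network   (as root, ideally with KVM access at hand)
```

Chaining with "&&" means the routing table is reloaded immediately after the interfaces come back, in the same command, so a dropped SSH session cannot leave the server without its routes.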

Alternative Design

Next, we will create a GRE tunnel between router-002 and router-003 to test the creation of a GRE tunnel with IPsec.
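
As a rough illustration of the GRE part of that test, here is a sketch using the FreeBSD ifconfig commands for creating a tunnel. The helper name, the IFCONFIG override, the interface name and every address are hypothetical placeholders, not the values used on router-002 or router-003, and the IPsec configuration is not covered.

```sh
#!/bin/sh
# Hypothetical helper: create a GRE tunnel on FreeBSD.
# IFCONFIG can be overridden (e.g. IFCONFIG=echo) to print instead of apply.
create_gre_tunnel() {
    ifname=$1 outer_local=$2 outer_remote=$3 inner_local=$4 inner_remote=$5
    cmd=${IFCONFIG:-ifconfig}
    # 1. create the interface, 2. set the outer endpoints,
    # 3. assign the inner point-to-point addresses and bring it up
    "$cmd" "$ifname" create &&
    "$cmd" "$ifname" tunnel "$outer_local" "$outer_remote" &&
    "$cmd" "$ifname" inet "$inner_local" "$inner_remote" netmask 255.255.255.252 up
}

# Dry run with documentation-range placeholder addresses:
# IFCONFIG=echo; create_gre_tunnel gre1 192.0.2.1 198.51.100.1 10.255.0.1 10.255.0.2
```

The same function would be called on the other router with the local and remote arguments swapped.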

The recursion issue will be temporarily set aside and revisited later, once we are confident that the overall configuration is stable and working as expected.

And if everything goes well, we can automate the configurations with Salt and deploy the code to windriver and the two routers.


Linked to T2201