For each route in the BGP table, the next hop has to exist and has to be reachable. If not, the route can’t be used. BGP uses a scanner that checks all routes in the BGP table every 60 seconds. The BGP scanner performs the best path calculation and checks whether each next hop address exists and is reachable.

60 seconds is a long time. When something happens with a next hop during the 60 seconds between two scans, we have to wait for the next scan to start before problems are resolved. Meanwhile, we can have black holes and/or routing loops.

BGP next hop tracking is a feature that reduces the BGP convergence time by monitoring BGP next hop address changes in the routing table. It’s event-based because it detects changes in the routing table. When it detects a change, it schedules a next hop scan to adjust the next hop in the BGP table.

After detecting a change, the next hop scan has a default delay of 5 seconds. Next hop tracking also supports dampening penalties. This increases the delay of the next hop scan for next hop addresses that keep changing in the routing table.
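To put the two timers side by side, here is a minimal sketch in Python. The function names are hypothetical and it assumes the periodic scanner runs at t = 0, 60, 120, and so on. Only the 60-second and 5-second values come from the text above.

```python
# Hypothetical helper names; only the 60 s and 5 s values come from the lesson.
SCAN_INTERVAL = 60   # periodic BGP scanner interval in seconds
NHT_DELAY = 5        # default next hop tracking delay in seconds


def wait_for_periodic_scan(change_time, interval=SCAN_INTERVAL):
    """Seconds between a next hop change and the next periodic scan.

    Assumes scans run at t = 0, interval, 2 * interval, ...
    """
    return interval - (change_time % interval)


def wait_for_tracked_scan(delay=NHT_DELAY):
    """Seconds between a next hop change and the event-triggered scan.

    Dampening is ignored here.
    """
    return delay


for t in (1, 30, 59):
    print(f"change at t={t}s: periodic scan in {wait_for_periodic_scan(t)}s, "
          f"tracked scan in {wait_for_tracked_scan()}s")
```

A change right after a scan waits almost the full 60 seconds for the periodic scanner, while the event-triggered scan always starts after the configured delay.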

In this lesson, I’ll show you what the BGP next hop scanner looks like and how dampening works.

Configuration


We use the following topology:

[Figure: iBGP topology with R1, R2, and R3 in AS 123]

We have three routers in AS 123 running iBGP. Each router has a loopback interface with an IP address that we advertise in OSPF. Those IP addresses are used by iBGP as the next hop addresses. In between R2 and R3, we have the 192.168.23.0/24 network that we will advertise in BGP.

As explained earlier, BGP has a scanner that runs every 60 seconds. If you have never seen it before, it’s interesting to take a look at. You can see that it runs every 60 seconds if you enable the following debug:

R1#debug ip bgp
BGP debugging is on for address family: IPv4 Unicast 

You will see the following messages:

R1#
*Apr  9 09:56:53.743: BGP: topo global:IPv4 Unicast:base Scanning routing tables
*Apr  9 09:56:53.744: BGP: topo global:IPv4 Multicast:base Scanning routing tables
*Apr  9 09:56:53.746: BGP: topo global:L2VPN E-VPN:base Scanning routing tables
*Apr  9 09:56:53.747: BGP: topo global:MVPNv4 Unicast:base Scanning routing tables

*Apr  9 09:57:53.754: BGP: topo global:IPv4 Unicast:base Scanning routing tables
*Apr  9 09:57:53.757: BGP: topo global:IPv4 Multicast:base Scanning routing tables
*Apr  9 09:57:53.758: BGP: topo global:L2VPN E-VPN:base Scanning routing tables
*Apr  9 09:57:53.759: BGP: topo global:MVPNv4 Unicast:base Scanning routing tables

I left the timestamps so that you can see it runs every 60 seconds. The BGP scanner is a bit too slow to rely on for next hop changes. Let’s see how next hop tracking works!
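If you want to check the scanner interval from the debug timestamps yourself, a small Python sketch like the one below does the subtraction. The function names are hypothetical. Because the debug output has no year, the code prepends a placeholder year, and it assumes an English locale for the month name.

```python
from datetime import datetime

# Timestamps copied from the two "IPv4 Unicast" debug lines above.
FIRST_SCAN = "Apr  9 09:56:53.743"
SECOND_SCAN = "Apr  9 09:57:53.754"


def parse_debug_timestamp(stamp):
    """Parse an IOS debug timestamp such as 'Apr  9 09:56:53.743'.

    The year 2000 is a placeholder (assumption), since the log has no year.
    """
    normalized = " ".join(stamp.split())  # collapse the double space
    return datetime.strptime(f"2000 {normalized}", "%Y %b %d %H:%M:%S.%f")


def seconds_between(first, second):
    """Elapsed seconds between two debug timestamps."""
    delta = parse_debug_timestamp(second) - parse_debug_timestamp(first)
    return delta.total_seconds()


print(seconds_between(FIRST_SCAN, SECOND_SCAN))
```

The result is 60 seconds plus a few milliseconds, which matches the scanner interval.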

Next hop tracking is enabled by default so it’s not something we have to configure. You can see the two commands here:

R1#show run all | include nexthop trigger
 bgp nexthop trigger enable
 bgp nexthop trigger delay 5

You can disable next hop tracking by putting no in front of the first command. The only value we can change is the delay before the next hop scan starts (5 seconds by default).

We want to see next hop tracking in action so let’s enable the following two debugs:

R1#debug ip routing
IP routing debugging is on

R1#debug ip bgp events nexthop 
BGP nexthop events debugging is on

The first debug is useful to see changes to the routing table. The second debug shows what next hop tracking will do.

Here’s the current BGP table:

R1#show ip bgp
BGP table version is 19, local router ID is 1.1.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, 
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter, 
              x best-external, a additional-path, c RIB-compressed, 
              t secondary path, 
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 * i  192.168.23.0     2.2.2.2                  0    100      0 i
 *>i                   3.3.3.3                  0    100      0 i

The 192.168.23.0/24 network is advertised by both R2 and R3 but we use the path through R3. Let’s shut the loopback interface of R3 to see what happens:

R3(config)#interface Loopback 0
R3(config-if)#shutdown

Here’s what we get:

R1#
RT: del 3.3.3.3 via 192.168.123.3, ospf metric [110/2]
RT: delete subnet route to 3.3.3.3/32
EvD: accum. penalty decayed to 0 after 67 second(s)
BGP(0): IPv4 Unicast::base nexthop modified, reuse in 00:00:19, 19000 , scheduling nexthop scan in 5 secs
BGP: BGP Event nhop timer
BGP: tbl IPv4 Unicast:base Nexthop walk
BGP(IPv4 Unicast): CHANGED Path metric 0 Path aigp-metric 0 nexthop: 3.3.3.3
RT: updating bgp 192.168.23.0/24 (0x0)  :
    via 2.2.2.2   0 1048577

RT: closer admin distance for 192.168.23.0, flushing 1 routes
RT: add 192.168.23.0/24 via 2.2.2.2, bgp metric [200/0]

As soon as OSPF figures out that 3.3.3.3/32 is gone, the route is deleted from the routing table. Immediately after the OSPF event, you can see that BGP schedules the next hop scanner in 5 seconds.

Once those 5 seconds have expired, it changes the next hop address to 2.2.2.2 (R2) and adds this change to the routing table. This process is much faster than the BGP scanner that runs every 60 seconds.
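When you collect a lot of this debug output, it can help to pull the timers out of the "nexthop modified" lines with a script. The sketch below is one way to do that in Python. The function name and the dictionary keys are hypothetical, and reading the number after the reuse time as milliseconds is an assumption based on 00:00:19 appearing next to 19000.

```python
import re

# Pattern for the "nexthop modified" debug line shown above.
NHT_LINE = re.compile(
    r"nexthop modified, reuse in \d{2}:\d{2}:\d{2}, (?P<reuse_ms>\d+) , "
    r"(?:scheduling nexthop scan in (?P<delay>\d+) secs|timer already running)"
)


def parse_nht_line(line):
    """Return the reuse time and the scheduled scan delay, or None.

    'scan_delay' is None when the debug reports 'timer already running'.
    """
    match = NHT_LINE.search(line)
    if match is None:
        return None
    delay = match.group("delay")
    return {
        "reuse_ms": int(match.group("reuse_ms")),  # assumed to be milliseconds
        "scan_delay": int(delay) if delay is not None else None,
    }


sample = ("BGP(0): IPv4 Unicast::base nexthop modified, reuse in 00:00:19, "
          "19000 , scheduling nexthop scan in 5 secs")
print(parse_nht_line(sample))
```

Lines that don’t match the pattern, such as the RT: messages from debug ip routing, simply return None.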

Here’s what the BGP table now looks like:

R1#show ip bgp
BGP table version is 22, local router ID is 1.1.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, 
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter, 
              x best-external, a additional-path, c RIB-compressed, 
              t secondary path, 
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 *>i  192.168.23.0     2.2.2.2                  0    100      0 i
 * i                   3.3.3.3                  0    100      0 i

As you can see above, we now use 2.2.2.2 to get to 192.168.23.0/24.

We still see 3.3.3.3 as a next hop in the BGP table. That’s because the BGP hold timer hasn’t expired yet.

We can also verify this in the routing table:

R1#show ip route 192.168.23.0
Routing entry for 192.168.23.0/24
  Known via "bgp 123", distance 200, metric 0, type internal
  Last update from 2.2.2.2 00:00:32 ago
  Routing Descriptor Blocks:
  * 2.2.2.2, from 2.2.2.2, 00:00:32 ago
      Route metric is 0, traffic share count is 1
      AS Hops 0
      MPLS label: none

Let’s try one more thing: making next hop address 2.2.2.2 unreachable as well. The path through R3 is gone from the BGP table by now, so 2.2.2.2 is the only next hop left. Here’s the BGP table right now:

R1#show ip bgp
BGP table version is 22, local router ID is 1.1.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, 
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter, 
              x best-external, a additional-path, c RIB-compressed, 
              t secondary path, 
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 *>i  192.168.23.0     2.2.2.2                  0    100      0 i

Let’s shut the loopback 0 interface of R2:

R2(config)#interface Loopback 0
R2(config-if)#shutdown

Take a look at the new debug messages:

R1#
RT: del 2.2.2.2 via 192.168.123.2, ospf metric [110/2]
RT: delete subnet route to 2.2.2.2/32
EvD: accum. penalty decayed to 0 after 208 second(s)
BGP(0): IPv4 Unicast::base nexthop modified, reuse in 00:00:19, 19000 , scheduling nexthop scan in 5 secs
BGP: BGP Event nhop timer
BGP: tbl IPv4 Unicast:base Nexthop walk
BGP(IPv4 Unicast): CHANGED Path metric 0 Path aigp-metric 0 nexthop: 2.2.2.2
RT: del 192.168.23.0 via 2.2.2.2, bgp metric [200/0]
RT: delete network route to 192.168.23.0/24

OSPF detects the change and 2.2.2.2/32 is deleted from the routing table. Right after, BGP schedules a next hop scan in 5 seconds and when the timer expires, it deletes the 192.168.23.0/24 route from the routing table.

Here’s what our BGP table looks like now:

R1#show ip bgp
BGP table version is 24, local router ID is 1.1.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, 
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter, 
              x best-external, a additional-path, c RIB-compressed, 
              t secondary path, 
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 * i  192.168.23.0     2.2.2.2                  0    100      0 i

The entry is still there because our iBGP hold timer hasn’t expired yet, but the network is no longer installed. We can verify this by checking the routing table:

R1#show ip route 2.2.2.2
% Network not in table
R1#show ip route 192.168.23.0
% Network not in table

Dampening

You have now seen how next hop tracking is scheduled and runs after 5 seconds. What if we have a flapping network that causes the next hop to change over and over again? Right now, that means that the BGP table gets updated after 5 seconds over and over again.

To prevent this from happening, next hop tracking supports dampening.

Each time a next hop changes, a value of 500 is added to the penalty. When the penalty is below 950, the next hop scanner is scheduled in 5 seconds. This is what we just witnessed.

When the penalty is above 950, the next hop scanner is scheduled for the moment the penalty has decayed to 100 or below.

The penalty decreases by half every 8 seconds. If the current penalty is 2000 then 8 seconds later, it will be 1000. Another 8 seconds later, it will be 500. These parameters cannot be configured.
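The rules above can be sketched in a few lines of Python. This is only a model and not the router’s actual implementation. The function and constant names are hypothetical. It assumes the penalty decays smoothly (exponentially) between the 8-second marks and that a penalty of exactly 950 counts as damped, because the text only says below and above.

```python
import math

PENALTY_PER_CHANGE = 500   # added on every next hop change
DAMPEN_THRESHOLD = 950     # above this, the scan is postponed
REUSE_THRESHOLD = 100      # the scan runs once the penalty is back down here
HALF_LIFE = 8              # seconds
DEFAULT_DELAY = 5          # seconds


def decayed_penalty(penalty, elapsed):
    """Penalty left after `elapsed` seconds (assumes exponential decay)."""
    return penalty * 0.5 ** (elapsed / HALF_LIFE)


def scan_delay(penalty):
    """Delay in seconds before the next hop scan for a given penalty."""
    if penalty < DAMPEN_THRESHOLD:
        return DEFAULT_DELAY
    # Solve penalty * 0.5 ** (t / HALF_LIFE) == REUSE_THRESHOLD for t.
    return HALF_LIFE * math.log2(penalty / REUSE_THRESHOLD)


print(decayed_penalty(2000, 8))    # 1000.0
print(decayed_penalty(2000, 16))   # 500.0
print(scan_delay(500))             # 5
print(round(scan_delay(2000), 1))  # 34.6
```

With a penalty of 2000, the model postpones the scan by roughly 35 seconds instead of 5.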

We can test dampening by changing the next hop in the routing table over and over again. You could shut/unshut the loopback 0 interfaces of R2 or R3 a couple of times but then you need to wait until OSPF converges.

I’m going to add and remove a static route on R1 a couple of times. This is the quickest method to change the routing table and trigger next hop tracking.

I still have my debugs enabled but I will disable debug ip routing:

R1#no debug ip routing
IP routing debugging is off

Now I add and remove the following static route for next hop 2.2.2.2 a couple of times in a row:

R1(config)#ip route 2.2.2.2 255.255.255.255 null 0
R1(config)#no ip route 2.2.2.2 255.255.255.255 null 0

Now take a look at the debug:

R1#
EvD: accum. penalty decayed to 0 after 127 second(s)
BGP(0): IPv4 Unicast::base nexthop modified, reuse in 00:00:19, 19000 , scheduling nexthop scan in 5 secs

The first time, our next hop scanner is scheduled in 5 seconds. This is normal. The next hop keeps “flapping” and we get the following debug messages:

R1#
EvD: accum. penalty decayed to 353 after 4 second(s)
BGP(0): IPv4 Unicast::base nexthop modified, reuse in 00:00:25, 25000 , timer already running
EvD: accum. penalty decayed to 853 after 0 second(s)
BGP(0): IPv4 Unicast::base nexthop modified, reuse in 00:00:30, 30000 , timer already running

BGP: BGP Event nhop timer
BGP: tbl IPv4 Unicast:base Nexthop walk

Above, you see the current penalty values (first 353, then 853) but nothing happens. This is because the next hop scanner has already been scheduled. Once those 5 seconds have expired, it runs.

The next hop keeps flapping so the penalty keeps increasing:

R1#
EvD: accum. penalty decayed to 956 after 4 second(s)
BGP(0): IPv4 Unicast::base nexthop modified, reuse in 00:00:31, 31000 , scheduling nexthop scan in 31 secs

Above, you see that the penalty was 956 and a bit later, the next hop scanner is scheduled to run in 31 seconds. That’s when the penalty is supposed to have a value of 100.
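You can check the 31 seconds with a bit of arithmetic. The sketch below assumes exponential decay with the 8-second half-life, and it assumes that the value in the "decayed to" message is printed before the 500 for the current change is added. Both are assumptions on my side that happen to fit this debug output. The function name is hypothetical.

```python
import math

HALF_LIFE = 8             # seconds
REUSE_THRESHOLD = 100     # penalty at which the scan is allowed to run
PENALTY_PER_CHANGE = 500  # added for the change that triggered the message


def seconds_until_reuse(penalty):
    """Seconds until `penalty` has decayed to REUSE_THRESHOLD."""
    return HALF_LIFE * math.log2(penalty / REUSE_THRESHOLD)


printed = 956  # from the "decayed to 956" debug line
print(round(seconds_until_reuse(printed), 1))                       # 26.1
print(round(seconds_until_reuse(printed + PENALTY_PER_CHANGE), 1))  # 30.9
```

Starting from 956 alone, the penalty would reach 100 after about 26 seconds. Starting from 956 + 500 = 1456, it takes about 30.9 seconds, which rounds up to the 31 seconds in the debug.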

The next hop keeps flapping and we see the following debug messages:

R1#
EvD: accum. penalty decayed to 944 after 5 second(s)
BGP(0): IPv4 Unicast::base nexthop modified, reuse in 00:00:31, 31000 , timer already running
EvD: accum. penalty decayed to 1444 after 0 second(s)
BGP(0): IPv4 Unicast::base nexthop modified, reuse in 00:00:35, 35000 , timer already running

EvD: accum. penalty decayed to 1374 after 4 second(s)
BGP(0): IPv4 Unicast::base nexthop modified, reuse in 00:00:34, 34000 , timer already running

EvD: accum. penalty decayed to 1445 after 3 second(s)
BGP(0): IPv4 Unicast::base nexthop modified, reuse in 00:00:35, 35000 , timer already running
EvD: accum. penalty decayed to 1945 after 0 second(s)
BGP(0): IPv4 Unicast::base nexthop modified, reuse  

EvD: accum. penalty decayed to 1885 after 3 second(s)
BGP(0): IPv4 Unicast::base nexthop modified, reuse in 00:00:37, 37000 , timer already running

EvD: accum. penalty decayed to 2005 after 2 second(s)
BGP(0): IPv4 Unicast::base nexthop modified, reuse in 00:00:38, 38000 , timer already running
EvD: accum. penalty decayed to 2505 after 0 second(s)
BGP(0): IPv4 Unicast::base nexthop modified, reuse in 00:00:40, 40000 , timer already running

Above, you can see that the penalty keeps increasing but nothing else happens. The next hop scanner is already scheduled, so each change only adds to the penalty.
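To see where these penalty values come from, you can replay the debug output with a small model. The Python sketch below takes the "after N second(s)" values from the EvD lines that follow the first change, decays the penalty with the 8-second half-life, and adds 500 for each change. The function name is hypothetical, and truncating the penalty to a whole number at each step is an assumption that fits this output. It is not a documented detail.

```python
HALF_LIFE = 8             # seconds
PENALTY_PER_CHANGE = 500

# Seconds since the previous change, read from the "after N second(s)"
# part of each EvD debug line that follows the first change.
ELAPSED = [4, 0, 4, 5, 0, 4, 3, 0, 3, 2, 0]


def replay(elapsed_values, start_penalty=PENALTY_PER_CHANGE):
    """Return the decayed penalty at each change, before 500 is added."""
    penalty = start_penalty  # the first change started from a penalty of 0
    decayed = []
    for elapsed in elapsed_values:
        # Truncation to an integer is an assumption that fits the debug.
        penalty = int(penalty * 0.5 ** (elapsed / HALF_LIFE))
        decayed.append(penalty)
        penalty += PENALTY_PER_CHANGE
    return decayed


print(replay(ELAPSED))
```

The list it prints matches the values in the debug messages: 353, 853, 956, 944, 1444, 1374, 1445, 1945, 1885, 2005, and 2505.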

Once those 31 seconds expire, the scanner runs:

R1#
BGP: BGP Event nhop timer
BGP: tbl IPv4 Unicast:base Nexthop walk

If the network keeps flapping, the penalty will rise higher and higher, and the next hop scanner will be delayed even more.

Conclusion

You have now learned how BGP next hop tracking works.

  • BGP has a scanner that checks next hops and next hop reachability every 60 seconds.
  • When a next hop changes or fails in between two runs of the BGP scanner, we can temporarily have black holes or routing loops.
  • Next hop tracking improves BGP convergence time by checking changes to next hops in the routing table.
  • When a change is detected, the BGP next hop scanner is scheduled to run in 5 seconds. You can change this value.
  • Next hop tracking supports dampening:
    • Each next hop change adds 500 to the penalty.
    • The penalty is reduced by half every 8 seconds.
    • When the penalty is below 950, the next hop scanner is scheduled with the default value (5 seconds).
    • When the penalty is above 950, the next hop scanner is scheduled to run when the penalty has decayed to 100 or below.