A vulnerability in the Linux kernel's IPv6 routing subsystem, designated CVE-2024-56703, has been resolved. The flaw caused soft lockups in the fib6_select_path function under high next-hop churn. It was particularly evident on Linux-based edge routers operating in highly dynamic environments, where the lockups led to system panics.

The problem lay in how fib6_select_path traverses the multipath circular linked list, specifically while iterating through the siblings in that list. Nodes were sometimes deleted concurrently on a different core. When this happened, their 'next' and 'prev' pointers pointed back to the node itself and their reference counts dropped to zero. A reader still positioned on such a node never gets back to the list head, so the iteration loops forever. The result was a soft lockup, which the watchdog timer then escalated into a system panic.

The fix applies RCU primitives to the problematic code sections. References to fib6_siblings were updated so that they are annotated for, or accessed through, the RCU APIs where required.

A test script reproduces the issue by periodically updating the routing table while multiple iperf3 clients generate heavy outgoing IPv6 traffic. It consistently induced infinite soft lockups within minutes.

Original References

- Linux Kernel Mailing List: https://lkml.org/lkml/2024/3/1/578
- Git Commit: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=02316bd78c5a346280acb3a45ba2496fd195dc1e

Sample code snippet (simplified) illustrating the RCU read-side pattern

  static struct fib6_node *fib6_select_path(struct fib6_node *fn,
                      int oif, u32 metric, int strict)
  {
    struct fib6_info *iter;
    struct fib6_info *sibling, *next_sibling;
    struct fib6_info *first_sibling;

    /* Read side: this walk runs inside an RCU read-side critical
     * section without holding the table's writer lock, so RCU-protected
     * pointers are loaded with rcu_dereference().
     * rcu_dereference_protected() is only for update-side code that
     * already holds the lock that prevents concurrent changes. */
    rcu_read_lock();
    first_sibling = rcu_dereference(fn->leaf);
    for (sibling = first_sibling;
         sibling;
         sibling = rcu_dereference(sibling->fib6_next)) {
        ...
    }
    ...
    rcu_read_unlock();

    return fn;
  }

The fix for CVE-2024-56703 mitigates the risk of these soft lockups on edge routers operating in dynamic routing environments, improving the stability and reliability of the affected systems.

Timeline

Published on: 12/28/2024 10:15:18 UTC
Last modified on: 02/02/2025 11:15:12 UTC