In early 2021, a critical vulnerability was identified in the Linux kernel, affecting how queued read-write locks (qrwlock) order memory accesses when a writer acquires the lock. Tracked as CVE-2021-46921, the issue could let a writer observe shared data as it was before the lock was truly acquired, so different threads could see inconsistent data, particularly under heavy concurrency and specific interaction patterns (such as in epoll). This post explains the issue, presents simple code snippets, and shows why the bug posed a real threat.

The Problem in Plain Language

When you have several threads or processes reading and writing data at the same time, you rely on locking code to make sure no one reads data while someone else is writing to it (and vice versa). The Linux kernel uses a special lock called queued read-write lock (qrwlock). This lock lets many readers work at once, but only one writer can have exclusive access.
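To make the "many readers, one writer" rule concrete, here is a minimal sketch of a try-lock style reader-writer lock built on a single atomic counter with C11 atomics. It is not the kernel's qrwlock: there is no queue and no waiting, and the names toy_rwlock, TOY_WRITER and TOY_READER are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical encoding for this sketch (not the kernel's layout):
 * bit 0 is set while a writer holds the lock, and every active
 * reader adds TOY_READER to the counter. */
#define TOY_WRITER 0x1u
#define TOY_READER 0x2u

typedef struct {
    atomic_uint cnts;
} toy_rwlock;

/* Readers may share the lock as long as no writer holds it. */
static bool toy_read_trylock(toy_rwlock *l)
{
    unsigned old = atomic_fetch_add_explicit(&l->cnts, TOY_READER,
                                             memory_order_acquire);
    if (!(old & TOY_WRITER))
        return true;
    /* A writer holds the lock: undo our increment and report failure. */
    atomic_fetch_sub_explicit(&l->cnts, TOY_READER, memory_order_relaxed);
    return false;
}

static void toy_read_unlock(toy_rwlock *l)
{
    unsigned old = atomic_fetch_sub_explicit(&l->cnts, TOY_READER,
                                             memory_order_release);
    assert(old >= TOY_READER); /* unlock without a matching lock is a bug */
    (void)old;
}

/* A writer needs the counter to be exactly 0: no readers, no writer. */
static bool toy_write_trylock(toy_rwlock *l)
{
    unsigned expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->cnts, &expected, TOY_WRITER,
        memory_order_acquire, memory_order_relaxed);
}

static void toy_write_unlock(toy_rwlock *l)
{
    atomic_fetch_sub_explicit(&l->cnts, TOY_WRITER, memory_order_release);
}
```

Two calls to toy_read_trylock() on a fresh lock both succeed, while toy_write_trylock() fails until both readers have called toy_read_unlock(). Note that the successful write path uses acquire ordering on the operation that takes the lock, which is exactly the property this CVE is about.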

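However, CVE-2021-46921 shows that the writer's slow path left a gap between checking that the lock was free and actually taking it. A reader could acquire the lock, modify shared data, and release the lock inside that gap, and the writer's later reads of that data were not ordered after its own acquisition. The writer could therefore act on a value it had effectively read before it held the lock, leading to bugs that are very hard to find.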

Where Did It Happen?

The bug was mainly observed in the epoll subsystem of the Linux kernel, specifically during a race between a reader using xchg (an atomic exchange operation) and a writer trying to grab the lock. This is illustrated as follows:

Writer                                     Reader
ep_scan_ready_list()                       
|- write_lock_irq()                        
   |- queued_write_lock_slowpath()         
      |- atomic_cond_read_acquire()        
                        (reader acquires lock here)
                                            read_lock_irqsave(&ep->lock, flags);
                                            chain_epi_lockless()
                                            epi->next = xchg(&ep->ovflist, epi);
                                            read_unlock_irqrestore(&ep->lock, flags);

      atomic_cmpxchg_relaxed()   (finally grabs write lock)
      READ_ONCE(ep->ovflist);    (reads possibly changed value)

The key issue: the writer checks if it can grab the lock, but before it really does, a reader may already change the data. When the writer finally does its work, the underlying data may have changed!
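The interleaving in the diagram can be replayed deterministically in user space. The sketch below is single-threaded, so it cannot show real hardware reordering. Instead it models the reordering explicitly by taking the writer's read of ovflist early, before the reader runs. The SIM_* constants, sim_state, and sim_replay() are hypothetical names for illustration, and the constant values are assumptions rather than a claim about the kernel's lock word layout.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative lock-word values (assumptions for this sketch). */
#define SIM_WAITING 0x100u /* a writer is queued but does not hold the lock */
#define SIM_LOCKED  0x0ffu /* a writer holds the lock */
#define SIM_READER  0x200u /* added once per active reader */

typedef struct {
    atomic_uint cnts;   /* stands in for lock->cnts */
    atomic_int ovflist; /* stands in for ep->ovflist */
} sim_state;

typedef struct {
    bool cmpxchg_succeeded; /* did the writer's final cmpxchg succeed? */
    int early_read;         /* value the writer "saw" for ovflist */
    int actual_value;       /* value of ovflist once the lock is held */
} sim_result;

static sim_result sim_replay(void)
{
    sim_state s;
    atomic_init(&s.cnts, SIM_WAITING);
    atomic_init(&s.ovflist, 1);

    /* Writer, step 1: the wait loop observes "only me waiting". */
    unsigned seen = atomic_load_explicit(&s.cnts, memory_order_acquire);

    /* Modelled reordering: the writer's later read of ovflist is
     * satisfied here, before the lock is actually taken. */
    int early = atomic_load_explicit(&s.ovflist, memory_order_relaxed);

    /* Reader: takes the read lock, swaps in a new value, unlocks. */
    atomic_fetch_add_explicit(&s.cnts, SIM_READER, memory_order_acquire);
    (void)atomic_exchange(&s.ovflist, 2);
    atomic_fetch_sub_explicit(&s.cnts, SIM_READER, memory_order_release);

    /* Writer, step 2: cnts is back to SIM_WAITING (A-B-A), so the
     * relaxed compare-and-exchange succeeds without noticing the reader. */
    bool ok = atomic_compare_exchange_strong_explicit(
        &s.cnts, &seen, SIM_LOCKED,
        memory_order_relaxed, memory_order_relaxed);

    sim_result r = { ok, early, atomic_load(&s.ovflist) };
    return r;
}
```

Calling sim_replay() reports that the compare-and-exchange succeeded even though the writer's early read (1) no longer matches the value the reader left behind (2). The lock word went from "waiting" to "one reader" and back to "waiting", so a comparison on the lock word alone cannot detect the intervening reader.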

Before the fix, the kernel code looked like this:

// old, buggy approach
if (atomic_cmpxchg_relaxed(&lock->cnts, old, new) == old) {
    // acquired write lock
}

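Here, atomic_cmpxchg_relaxed() imposes no memory ordering on the surrounding accesses. The earlier atomic_cond_read_acquire() orders later reads only against the load that saw the lock as free, not against the compare-and-exchange that actually takes it. This means other operations (like reading ep->ovflist) could be satisfied *before* the write lock is really acquired.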

The correct way is to *acquire* the lock with proper barriers, so other cores can't reorder memory accesses:

// fixed code - uses acquire semantics
if (atomic_try_cmpxchg_acquire(&lock->cnts, &old, new)) {
    // now the memory read/write order is correct
    // safe to access shared data
}

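With this change, the acquire ordering is attached to the operation that actually takes the lock. Memory accesses that follow a successful compare-and-exchange in program order cannot be reordered before it, so the writer can no longer use a value of the protected data that was read speculatively before it held the lock.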
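The same pattern can be written with C11 atomics in user space. The following is a minimal sketch of the final stage of a writer's slow path, assuming the writer has already announced itself by setting a "waiting" value in the lock word. The names demo_write_lock_tail, DEMO_WAITING and DEMO_LOCKED are hypothetical, and the code is an analogy to the kernel fix, not a copy of it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Illustrative lock-word values (assumptions for this sketch). */
#define DEMO_WAITING 0x100u /* writer queued, no readers, lock not held */
#define DEMO_LOCKED  0x0ffu /* writer holds the lock */

/* Spin until the lock word shows only the waiting writer, then take
 * the lock. The wait loop can use relaxed loads because the ordering
 * that matters is placed on the compare-and-exchange itself. */
static void demo_write_lock_tail(atomic_uint *cnts)
{
    unsigned expected;

    assert(cnts != NULL);
    do {
        do {
            expected = atomic_load_explicit(cnts, memory_order_relaxed);
        } while (expected != DEMO_WAITING);
        /* Acquire on success: accesses after this point in program
         * order cannot be moved before the lock is actually taken. */
    } while (!atomic_compare_exchange_strong_explicit(
                 cnts, &expected, DEMO_LOCKED,
                 memory_order_acquire, memory_order_relaxed));
}
```

If a reader slips in between the load and the compare-and-exchange and is still active, the exchange fails and the loop waits again. If the reader has already come and gone, the exchange succeeds, but the acquire ordering guarantees that the writer's subsequent reads see the reader's update.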

References

- CVE Details: https://nvd.nist.gov/vuln/detail/CVE-2021-46921
- Linux Kernel Commit Fix: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a52faad5fbbc7f938e8e57aed7a6ba98ed1b32f
- LKML Discussion: https://lore.kernel.org/all/20211104165658.GW1757@worktop.programming.kicks-ass.net/

While there isn't a known public exploit, here’s how an attacker might take advantage:

- Goal: Change a value in a way that a writer thinks it has exclusive access, but a reader sneaks in an update during the acquisition phase.
- Effect: The writer works on or sees data that's already changed unexpectedly, which could crash the system, corrupt data, or lead to privilege escalation in very precise attack scenarios.

A simulation in C

// Highly simplified and theoretical - not drop-in Linux code!
volatile int ep_ovflist = 1;
volatile int lock = 0;  // 0 = not locked; note that volatile alone gives no ordering guarantees

void reader() {
    int tmp;
    while (!lock);  // Wait for lock value to be nonzero (simulate reader lock)
    tmp = ep_ovflist;
    ep_ovflist = 2; // xchg-like operation
}

void writer() {
    // Try to get lock (simulate the race)
    // BAD: lock not yet ordered, reader could still update ep_ovflist here
    lock = 1;
    int value = ep_ovflist; // THIS MAY BE STALE/CHANGED
}

Conclusion

CVE-2021-46921 is a great example of why proper memory barriers and atomic operations are so important in kernel code. Even if all "lock" checks pass, sloppy ordering can let subtle, dangerous bugs slip through.

Further Reading

- Memory barriers and atomic operations in the Linux kernel (LWN)
- Linux Kernel Lock Documentation

Timeline

Published on: 02/27/2024 10:15:06 UTC
Last modified on: 04/10/2024 13:39:36 UTC