A vulnerability in the locking/qrwlock code of the Linux kernel, tracked as CVE-2021-46921, has been resolved. This blog post walks through the technical details of the issue, shows the interleaving that triggers it, and explains the fix. The original references are available in the Linux kernel git repository.

The Vulnerability: Lock Ordering Issue in queued_write_lock_slowpath()

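The vulnerability lies in the queued_write_lock_slowpath() function. Although this code runs with wait_lock held, a reader can acquire the lock without holding wait_lock. The writer spins on the lock value with atomic_cond_read_acquire(), but it only truly takes the lock when the following compare-and-exchange succeeds, and that operation is relaxed. The window between the acquire read and the cmpxchg is therefore exposed to an A-B-A problem: reads that follow the lock acquisition can observe values speculatively, before the write lock is truly acquired.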
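
To make the A-B-A window concrete, here is a minimal user-space sketch in C11 atomics rather than kernel code. The names toy_qrwlock, toy_read_lock(), toy_read_unlock() and toy_writer_may_proceed() are hypothetical, the constants are only modeled on the idea of "writer bits plus a reader count", and the reader is simplified to one that never queues behind a waiting writer.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical constants: illustrative values, not taken from the kernel. */
#define QW_WAITING 0x100u      /* a writer is queued */
#define QR_BIAS    (1u << 9)   /* one reader */

struct toy_qrwlock {
    atomic_uint cnts;          /* reader count plus writer state bits */
};

/* Simplified reader: it only bumps the reader count and never takes
 * wait_lock. Acquire keeps its critical section after the increment. */
static void toy_read_lock(struct toy_qrwlock *lock)
{
    atomic_fetch_add_explicit(&lock->cnts, QR_BIAS, memory_order_acquire);
}

/* Drop the reader count; release publishes the critical section's stores. */
static void toy_read_unlock(struct toy_qrwlock *lock)
{
    unsigned int prev =
        atomic_fetch_sub_explicit(&lock->cnts, QR_BIAS, memory_order_release);
    assert(prev >= QR_BIAS);   /* unlock without a matching lock is a bug */
}

/* What the spinning writer checks: no readers, one writer waiting. */
static int toy_writer_may_proceed(struct toy_qrwlock *lock)
{
    return atomic_load_explicit(&lock->cnts, memory_order_relaxed) == QW_WAITING;
}
```

If the lock word starts at QW_WAITING, a toy_read_lock() followed by toy_read_unlock() takes it from A to B and back to A. A writer that sampled A before the reader ran, and compares against A afterwards, cannot tell from the lock word alone that a whole read-side critical section happened in between.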

The interleaving below, taken from epoll, demonstrates the problem. A writer in ep_scan_ready_list() is in the middle of taking the write lock while a reader concurrently executes read_lock_irqsave(), an xchg() on ep->ovflist, and read_unlock_irqrestore():

Writer                                | Reader
ep_scan_ready_list()                  |
|- write_lock_irq()                   |
    |- queued_write_lock_slowpath()   |
        |- atomic_cond_read_acquire() |
                                      | read_lock_irqsave(&ep->lock, flags);
                                      | chain_epi_lockless()
                                      |   epi->next = xchg(&ep->ovflist, epi);
                                      | read_unlock_irqrestore(&ep->lock, flags);
                                      |
        |- atomic_cmpxchg_relaxed()   |
|- READ_ONCE(ep->ovflist);            |

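In this scenario, because the cmpxchg is relaxed, a core can order the read of ovflist ahead of the atomic_cmpxchg_relaxed(). As a result, the writer can observe a value of ovflist from before the reader's xchg(), and then see that value change out from under it even though it believes it holds the lock exclusively.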
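
The shape of the problematic wait loop can be approximated in user-space C11 atomics. This is a sketch under stated assumptions, not the kernel source: toy_qrwlock and toy_write_lock_prefix() are hypothetical names, the constants are illustrative, C11 memory orders only approximate the kernel's memory model, and the toy omits wait_lock, so it assumes a single writer.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical constants: illustrative values, not taken from the kernel. */
#define QW_WAITING 0x100u   /* a writer is queued */
#define QW_LOCKED  0x0ffu   /* a writer holds the lock */

struct toy_qrwlock {
    atomic_uint cnts;       /* reader count plus writer state bits */
};

/* Pre-fix shape: acquire on the wait, relaxed on the exchange. */
static void toy_write_lock_prefix(struct toy_qrwlock *lock)
{
    /* Announce a pending writer so that new readers back off. */
    unsigned int old =
        atomic_fetch_add_explicit(&lock->cnts, QW_WAITING, memory_order_relaxed);
    /* No wait_lock in this toy, so only one writer is supported. */
    assert((old & (QW_WAITING | QW_LOCKED)) == 0);

    for (;;) {
        /* Acquire load: orders later reads after THIS load only. */
        while (atomic_load_explicit(&lock->cnts, memory_order_acquire)
               != QW_WAITING)
            ;   /* spin until no readers remain */

        /* Window: a reader may take and drop the lock right here. */

        unsigned int expected = QW_WAITING;
        /* Relaxed exchange: the step that really takes the lock, yet it
         * does not keep later loads from being satisfied before it. */
        if (atomic_compare_exchange_strong_explicit(
                &lock->cnts, &expected, QW_LOCKED,
                memory_order_relaxed, memory_order_relaxed))
            return;
    }
}
```

The acquire ordering is attached to the load that merely notices the lock is free, while the operation that actually claims the lock carries no ordering. A read placed after toy_write_lock_prefix() is ordered after the load but not after the exchange, which is the gap described above.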

The Fix: Switching CMPXCHG to Acquire Semantics

The fix is to switch the cmpxchg to acquire semantics, which closes the ordering window because the operation that actually takes the lock now also orders the reads that follow it. With the ordering supplied by the cmpxchg, the atomic_cond_read no longer needs acquire semantics and can be relaxed. The fix also uses try_cmpxchg(), as suggested by Peter Zijlstra (peterz).

Here's a representation of the updated code:

queued_write_lock_slowpath()
{
    ...
    /* before: atomic_cond_read_acquire(); */
    atomic_cond_read_relaxed();
    ...
    /* before: atomic_cmpxchg_relaxed(); */
    atomic_try_cmpxchg_acquire();
    ...
}
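
For readers who want to experiment outside the kernel, the fixed ordering can be sketched with C11 atomics. Everything here is an assumption made for illustration: toy_qrwlock, toy_write_lock_fixed() and toy_write_unlock() are hypothetical names, the constants are illustrative, wait_lock is omitted, and C11 memory orders only approximate the kernel's memory model.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical constants: illustrative values, not taken from the kernel. */
#define QW_WAITING 0x100u   /* a writer is queued */
#define QW_LOCKED  0x0ffu   /* a writer holds the lock */

struct toy_qrwlock {
    atomic_uint cnts;       /* reader count plus writer state bits */
};

/* Post-fix shape: relaxed wait, acquire on the exchange that takes the lock. */
static void toy_write_lock_fixed(struct toy_qrwlock *lock)
{
    unsigned int cnts;

    /* Announce a pending writer so that new readers back off. */
    atomic_fetch_add_explicit(&lock->cnts, QW_WAITING, memory_order_relaxed);

    do {
        /* Relaxed wait: ordering now comes from the exchange below. */
        do {
            cnts = atomic_load_explicit(&lock->cnts, memory_order_relaxed);
        } while (cnts != QW_WAITING);
        /* On failure the exchange stores the current value in cnts and
         * the outer loop goes back to waiting. */
    } while (!atomic_compare_exchange_strong_explicit(
                 &lock->cnts, &cnts, QW_LOCKED,
                 memory_order_acquire, memory_order_relaxed));
}

/* Clear the writer bits; release publishes the critical section's stores. */
static void toy_write_unlock(struct toy_qrwlock *lock)
{
    unsigned int prev =
        atomic_fetch_sub_explicit(&lock->cnts, QW_LOCKED, memory_order_release);
    assert((prev & QW_LOCKED) == QW_LOCKED);   /* must have been held */
}
```

Like the kernel's try_cmpxchg() family, the C11 compare-exchange returns a boolean and writes the observed value back into the expected variable on failure, which is why the loop reads naturally as "wait, then try". Any load placed after toy_write_lock_fixed() is ordered after the successful exchange, so it cannot be satisfied from before the lock was taken.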

In conclusion, CVE-2021-46921 has been resolved by giving the cmpxchg in queued_write_lock_slowpath() acquire semantics. With this change, reads that follow a successful write-lock acquisition can no longer be ordered ahead of it, which removes the lock ordering issue described above.

For more information on this fix and other Linux kernel updates, refer to the official Linux kernel documentation.

Timeline

Published on: 02/27/2024 10:15:06 UTC
Last modified on: 04/10/2024 13:39:36 UTC