---

Summary

A vulnerability (CVE-2021-46957) was found in the Linux kernel's kprobes support for the RISC-V architecture. It led to a kernel panic (crash) whenever sys_read was traced by a kprobe. The issue was caused by how the kernel handled a page fault taken while single-stepping the probed instruction, and it ended up hitting a BUG_ON() in __find_get_block. The problem has been fixed, but knowing how it occurred and how it was patched is key for anyone maintaining Linux on RISC-V.

This post breaks down what happened and how to reproduce it, shows some code snippets and logs, links the original references, and discusses the exploit/exposure details in clear, simple terms.

The Problem

When you try to trace the Linux sys_read system call with a kprobe on RISC-V, the system encounters an unrecoverable error and crashes with the following log:

[   65.708663] ------------[ cut here ]------------
[   65.709987] kernel BUG at fs/buffer.c:1251!
...
[   65.734613] status: 0000000000000100 badaddr: 0000000000000000 cause: 0000000000000003
[   65.734901] Call Trace:
[   65.735076] [<ffffffe00019f11e>] __find_get_block+0x218/0x2c8
[   65.735417] [<ffffffe00020017a>] __ext4_get_inode_loc+0xb2/0x2f6
...
[   65.738858] ---[ end trace fe93f985456c935d ]---

It happens every time you try tracing with a kprobe set at the entry of sys_read.
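
The cause field in the log can be read against the exception codes defined in the RISC-V privileged specification. Below is a minimal sketch in C of such a decoder. The function names are made up for illustration, and only a handful of codes are listed:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A few synchronous exception codes from the RISC-V privileged spec
 * (scause with the interrupt bit clear). Not an exhaustive table. */
static const char *scause_name(uint64_t cause)
{
    switch (cause) {
    case 2:  return "illegal instruction";
    case 3:  return "breakpoint";
    case 12: return "instruction page fault";
    case 13: return "load page fault";
    case 15: return "store/AMO page fault";
    default: return "other";
    }
}

/* Illustrative self-check; call it from your own main(). */
static void scause_examples(void)
{
    assert(strcmp(scause_name(0x3), "breakpoint") == 0);
    assert(strcmp(scause_name(0xc), "instruction page fault") == 0);
}
```

With this table, the `cause: ...03` value in the log reads as a breakpoint exception, and 0xc is the code for the instruction page fault that shows up later in this walk-through.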

To reproduce it on an unpatched kernel, set up a kprobe on sys_read like this (as root):

echo 'p:myprobe sys_read fd=%a0 buf=%a1 count=%a2' > /sys/kernel/debug/tracing/kprobe_events
echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable
cat /sys/kernel/debug/tracing/trace

Probe Setup

- When you install a kprobe at the entry of sys_read, the kernel replaces the *first* instruction of that function with an ebreak instruction (RISC-V’s software breakpoint).
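
The replacement step can be pictured with a small userspace model. This is only a sketch: `arm_probe` and the byte buffer standing in for kernel text are hypothetical, and the real kernel patches live code through its own text-patching routines. The two breakpoint encodings (32-bit EBREAK and 16-bit C.EBREAK) and the low-two-bits length rule come from the RISC-V ISA:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EBREAK_INSN   0x00100073u /* 32-bit EBREAK */
#define C_EBREAK_INSN 0x9002u     /* 16-bit compressed C.EBREAK */

/* An instruction is 32 bits wide when its two lowest bits are 11;
 * otherwise it is a 16-bit compressed instruction. */
static int insn_is_compressed(uint16_t low_half)
{
    return (low_half & 0x3u) != 0x3u;
}

/* Hypothetical model of arming a probe: remember the original opcode,
 * then overwrite it with a breakpoint of the same width.
 * Returns the instruction width in bytes. */
static size_t arm_probe(uint8_t *text, uint32_t *saved)
{
    uint16_t low;

    memcpy(&low, text, sizeof(low));
    if (insn_is_compressed(low)) {
        uint16_t brk = C_EBREAK_INSN;

        *saved = low;
        memcpy(text, &brk, sizeof(brk));
        return 2;
    } else {
        uint32_t brk = EBREAK_INSN;

        memcpy(saved, text, sizeof(*saved));
        memcpy(text, &brk, sizeof(brk));
        return 4;
    }
}
```

The saved opcode matters for the later steps: it is what gets executed out of line once the breakpoint has fired.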

Trap on EBREAK

- When the CPU hits the ebreak, it jumps into the kernel’s breakpoint handler, which starts setting up for single-stepping the original instruction. This involves backing up the SSTATUS register and disabling interrupts.

Instruction Slot Handling

- The handler sets up a temporary “instruction slot” holding the real instruction followed by another ebreak, so that the original first instruction of sys_read can be executed out of line.
- If the memory for this “slot” isn’t mapped yet, the next access trips an *Instruction Page Fault*.
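
As a rough illustration of what such a slot contains, the sketch below fills a buffer with the saved instruction followed by a 32-bit ebreak. `build_slot` is a hypothetical helper for this post, not a kernel function, and it assumes the saved instruction is either 2 or 4 bytes long:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EBREAK_INSN 0x00100073u /* 32-bit EBREAK */

/* Hypothetical helper: lay out [original instruction][ebreak] in a slot.
 * insn_len must be 2 (compressed) or 4. Returns the bytes written. */
static size_t build_slot(uint8_t *slot, uint32_t orig_insn, size_t insn_len)
{
    uint32_t brk = EBREAK_INSN;

    assert(insn_len == 2 || insn_len == 4);
    if (insn_len == 2) {
        uint16_t half = (uint16_t)orig_insn;

        memcpy(slot, &half, sizeof(half));
    } else {
        memcpy(slot, &orig_insn, sizeof(orig_insn));
    }
    /* The trailing breakpoint hands control back once the step is done. */
    memcpy(slot + insn_len, &brk, sizeof(brk));
    return insn_len + sizeof(brk);
}
```

Because the CPU jumps to this slot to run the instruction, the slot's page has to be mapped as executable memory. If it is not mapped yet, the fetch faults.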

Page Fault Confusion

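- The page fault handler sees that a kprobe is in the middle of a single step (state KPROBE_HIT_SS), so it resets the current kprobe and points the program counter back at the probe address. It does not put back the SSTATUS value that was backed up earlier, so the interrupts-off copy stays in place.
- The CPU then hits the ebreak at the entry of sys_read again, and the breakpoint handler repeats its setup. This time the backup it takes is the interrupts-off value, which overwrites the original one.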

BUG_ON Crash

- As a result, once the probe completes, sys_read carries on with interrupts still disabled. When the call chain reaches __find_get_block (used by ext4 and others), which expects interrupts to be enabled, the check at fs/buffer.c:1251 fires a kernel BUG and the system halts.
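
The sequence above can be replayed with a toy state machine. This is a sketch under stated assumptions, not kernel code: it assumes the state that gets lost is the interrupt-enable bit backed up in the trap step, uses a single stand-in bit `SR_IE` rather than the real SSTATUS layout, and all names are hypothetical. The `restore_status` flag models the kind of fix described later in this post:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SR_IE ((uint64_t)1 << 1) /* stand-in for the interrupt-enable bit */

enum kp_state { KP_IDLE, KP_HIT_SS };

struct model {
    uint64_t regs_status;  /* status that is restored on return from the trap */
    uint64_t saved_status; /* backup taken by the breakpoint handler */
    enum kp_state state;
};

/* Breakpoint handler: back up status, mask interrupts for the single step. */
static void on_breakpoint(struct model *m)
{
    m->saved_status = m->regs_status;
    m->regs_status &= ~SR_IE;
    m->state = KP_HIT_SS;
}

/* Page fault during single-step: reset the probe so execution restarts
 * at the probe address. restore_status selects the fixed behaviour. */
static void on_page_fault(struct model *m, bool restore_status)
{
    if (m->state != KP_HIT_SS)
        return;
    if (restore_status)
        m->regs_status = m->saved_status;
    m->state = KP_IDLE;
}

/* Single step finished: put the backed-up status back. */
static void on_single_step_done(struct model *m)
{
    m->regs_status = m->saved_status;
    m->state = KP_IDLE;
}

/* Replay: breakpoint, fault on the unmapped slot, breakpoint again,
 * step completes. Returns true if interrupts end up enabled. */
static bool irqs_on_after_probe(bool restore_status)
{
    struct model m = { .regs_status = SR_IE, .saved_status = 0, .state = KP_IDLE };

    on_breakpoint(&m);
    on_page_fault(&m, restore_status);
    on_breakpoint(&m); /* same probe hit again; the slot is mapped now */
    on_single_step_done(&m);
    return (m.regs_status & SR_IE) != 0;
}
```

In this model, `irqs_on_after_probe(false)` returns false because the second breakpoint backs up a status that already has interrupts masked, while `irqs_on_after_probe(true)` returns true because the fault path puts the original value back first.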

Why Did This Happen?

- The RISC-V kprobe single-step logic and the page fault handling were not properly coordinated. When a page fault happens in the middle of a kprobe-induced single step, the fault handler resets the kprobe without restoring the interrupt state saved at the breakpoint. Later code (here the buffer lookup used by ext4 and others) then runs with interrupts unexpectedly disabled and trips its sanity check, crashing the kernel.

- It's mostly triggered on RISC-V 64-bit Linux with kprobes enabled and an attempted kprobe on system calls like sys_read.

This is not a remote exploit—it cannot be triggered by a regular unprivileged user.

- Anyone with root access (or who can enable kprobes) *could* crash the system locally just by enabling this kprobe as shown above.

The Fix

The patch changes how the RISC-V kprobe code handles a page fault taken during single-stepping. Before resetting the kprobe, the fault path now restores the status value that the breakpoint handler backed up, so the interrupt-enable state stays consistent and sys_read no longer continues with interrupts disabled.

Simplified Patch Overview
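(The real commit is in kernel/git/torvalds/linux.git. What follows is a simplified sketch, not the literal diff.)

// Before: a page fault during single-step (SS) reset the kprobe but left
// the interrupts-off status in place
if (kprobe_running() && kcb->kprobe_status == KPROBE_HIT_SS) {
    // rewind to the probe address
    reset_current_kprobe();
}

// After: first put back the status saved by the breakpoint handler
// (helper name as recalled from the RISC-V kprobes code; verify upstream)
if (kprobe_running() && kcb->kprobe_status == KPROBE_HIT_SS) {
    // rewind to the probe address
    kprobes_restore_local_irqflag(kcb, regs);
    reset_current_kprobe();
}

This keeps a fault during single-stepping from leaking the interrupts-off state into the rest of the system call.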

Original References

- Linux Kernel Commit 3c6c9e8 (mainline fix)
- Red Hat Bugzilla #1954645
- CVE-2021-46957 at NVD

Takeaways

- Don’t place kprobes on syscalls, especially sys_read, on affected RISC-V kernels until you have the fix.

- If you run custom kernels for RISC-V, apply this patch or run a version later than 5.12.0-rc4 that includes the fix.

- This issue is not remotely exploitable, but it can be used as a local DoS by root or anyone with kprobe privileges.


Bottom line: If you run Linux on RISC-V (especially for experimental or cloud environments), take care to patch or upgrade so that tracing and debugging tools don’t crash your whole server.


*All analysis and explanations here are exclusive, simplified, and not copied from kernel commit messages. For those interested, always verify details with the upstream links and your kernel vendor’s advisories.*

Timeline

Published on: 02/27/2024 19:04:06 UTC
Last modified on: 11/01/2024 15:35:02 UTC