CVE-2024-42079 - Story of a Linux Kernel NULL Pointer Dereference in GFS2 (`gfs2_log

A newly fixed Linux kernel vulnerability, CVE-2024-42079, highlights another instance where a race condition can cause a kernel panic (or worse) through a simple NULL pointer dereference. This time, it's tucked away in the code behind the GFS2 filesystem, a cluster-aware file system in the Linux kernel.

If you're interested in file system internals, kernel vulnerabilities, or just want a practical example of how tiny mistakes in locking can snowball into security issues, this post is for you.

What is GFS2 and Why Does It Matter?

GFS2 is the "Global File System 2", a shared-disk file system for Linux computer clusters. Since it allows multiple nodes to use the same storage, correctness is crucial. Any error—especially one that lets unprivileged users crash the system—can be catastrophic.

When the Linux kernel tries to dereference a pointer that turns out to be NULL, it raises an "Oops": the faulting task is killed, and the whole machine panics if panic_on_oops is set. This might not seem like a classic security bug, but attackers can often use such bugs for denial-of-service attacks. In highly available clusters, this is a serious problem.

- Type: NULL pointer dereference (race condition)
- File(s): fs/gfs2/log.c, fs/gfs2/super.c
- Main functions: gfs2_log_flush(), gfs2_jindex_free()
- Fixed in: kernels that include commit 634dcfd6ba9e

How Did The Bug Happen?

1. During the unmounting (or cleanup) of a GFS2 mount, the function gfs2_jindex_free() sets sdp->sd_jdesc to NULL. This pointer (sd_jdesc) is crucial for logging operations in GFS2.

2. The logging system, specifically gfs2_log_flush(), expects sd_jdesc to be valid. Because the two paths did not take a common lock, gfs2_log_flush() could run just as the pointer was being set to NULL.

3. This led to a NULL pointer dereference in gfs2_log_flush(), crashing the kernel (denial of service).

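The crux is mutual exclusion: the code that clears sd_jdesc and the code that dereferences it must be serialized by the same lock, and the reader must check the pointer while holding that lock.

Before the fix, nothing protected sd_jdesc against concurrent access (simplified):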

void gfs2_jindex_free(struct gfs2_sbd *sdp)
{
    // ... other code ...
    sdp->sd_jdesc = NULL; // <-- not protected under proper locking!
    // ... other code ...
}

void gfs2_log_flush(...)
{
    // ... other code ...
    struct gfs2_jdesc *jd = sdp->sd_jdesc;
    do_something(jd); // <-- unsafe if jd is now NULL!
    // ... other code ...
}

If one thread was flushing the log, and another was unmounting, a crash could easily happen.
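The two possible orderings can be replayed deterministically in user space. The following is only a minimal sketch: `struct journal`, `struct superblock`, `teardown()`, `unpatched_flush()`, and `replay()` are hypothetical stand-ins, not kernel code, and the model reports where the kernel would fault instead of actually dereferencing NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct gfs2_jdesc and struct gfs2_sbd. */
struct journal { int jd_id; };
struct superblock { struct journal *sd_jdesc; };

enum flush_result { FLUSH_OK, FLUSH_WOULD_OOPS };

/* Models the teardown path: clear the journal pointer with no locking. */
static void teardown(struct superblock *sb)
{
    assert(sb != NULL);
    sb->sd_jdesc = NULL;
}

/*
 * Models the unpatched flush path. The real code used the pointer
 * unconditionally; this model reports the point where it would fault
 * rather than crashing the demo.
 */
static enum flush_result unpatched_flush(const struct superblock *sb)
{
    const struct journal *jd = sb->sd_jdesc;

    if (jd == NULL)
        return FLUSH_WOULD_OOPS;   /* the kernel would fault here */
    (void)jd->jd_id;               /* the dereference itself */
    return FLUSH_OK;
}

/* Replays one ordering: teardown before the flush, or no teardown at all. */
static enum flush_result replay(int teardown_first)
{
    struct journal jd = { .jd_id = 0 };
    struct superblock sb = { .sd_jdesc = &jd };

    if (teardown_first)
        teardown(&sb);
    return unpatched_flush(&sb);
}
```

Calling `replay(0)` models a flush that wins the race and yields `FLUSH_OK`; `replay(1)` models a flush that loses it and yields `FLUSH_WOULD_OOPS`. In the kernel nothing chooses the ordering, which is why the crash appears only intermittently.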

Patched code (simplified)

void gfs2_jindex_free(struct gfs2_sbd *sdp)
{
    // sd_log_flush_lock is an rw_semaphore, so it is taken with down_write()
    down_write(&sdp->sd_log_flush_lock);
    sdp->sd_jdesc = NULL;
    up_write(&sdp->sd_log_flush_lock);
    // ...rest of cleanup...
}

void gfs2_log_flush(struct gfs2_sbd *sdp, ...)
{
    down_write(&sdp->sd_log_flush_lock);

    // Journal already torn down: nothing to flush (avoids the crash)
    if (!sdp->sd_jdesc)
        goto out;

    // ... safe to use sdp->sd_jdesc while the lock is held ...

out:
    up_write(&sdp->sd_log_flush_lock);
}

The official commit diff is listed under References & Further Reading.

This bug is a classic local denial-of-service candidate:

- Who can exploit? Anyone able to mount and unmount a GFS2 file system (which normally requires root or CAP_SYS_ADMIN), or to force a quick sequence of operations that races a log flush against an unmount.
- Impact: Kernel panic/crash. On clusters or production servers, this means possible unplanned downtime.
- How stable is exploitation? Timing is tricky but automatable (e.g., use two threads: one rapidly mounts/unmounts, one rapidly triggers log flushes).

Exploit scenario (pseudo-code)

# WARNING: Running this on a vulnerable kernel will CRASH your system!

# Pseudocode, do not actually run this on production
while True:
    mount_gfs2_fs()
    unmount_gfs2_fs()

Meanwhile, in another thread, hammer a file on that filesystem to trigger logging.

Note: This is not known to be a privilege escalation vector, but a reliable crash is enough for an attacker whose goal is to take a cluster node offline.

References & Further Reading

- Kernel commit diff: git.kernel.org, commit 634dcfd6ba9e
- CVE page: CVE-2024-42079 at MITRE *(May take time to update with details)*
- GFS2 Documentation: kernel.org GFS2 docs

Summary

CVE-2024-42079 reminds us that even in mature code, small mistakes in concurrent programming can have big consequences. The GFS2 developers responded quickly by introducing proper exclusion and checks for NULL, making your clusters (and the kernel in general) a little bit safer.

If you manage or develop for systems using GFS2, update your kernel as soon as possible!


Timeline

Published on: 07/29/2024 16:15:07 UTC
Last modified on: 08/02/2024 04:54:31 UTC