In early 2021, a vulnerability later identified as CVE-2021-46968 was found and fixed in the Linux kernel’s cryptographic stack for IBM’s s390 architecture. The bug leaked memory each time a crypto adapter was hot-unplugged, which could gradually exhaust kernel memory and destabilize the system. Let’s break down what happened, look at the details of the bug, and see the code behind the fix.

What’s the Vulnerability?

The s390 platform (IBM’s mainframe architecture) uses a subsystem called zcrypt to manage cryptographic cards ("zcards") and cryptographic queues ("zqueues"). These can be added or removed ("hot-plugged" or "hot-unplugged") while the system is running.

With CVE-2021-46968, when a card or queue was hot-unplugged, the related data structures (zcard, zqueue) in the kernel weren’t freed from memory. The cause: a mismatch in handling the reference count (kref), a simple mechanism that lets the kernel know when it’s safe to delete an object.

The Technical Root: Reference Counter Mishap

The reference counting is managed by the kref structure. When a zcard or zqueue is created, its kref is initialized to 1. Every time some code takes a new reference (meaning it’s using the object), it calls kref_get(). When it’s done, it calls kref_put(). The last kref_put() (the one that brings the counter to 0) actually frees the object by invoking its release callback.
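The lifecycle described above (initialize to 1, get, put, release on the final put) can be modeled in a few lines of userspace C. This is only a sketch: `demo_kref`, `demo_obj`, and the `demo_*` functions are hypothetical stand-ins invented for illustration, not the kernel's actual `struct kref` implementation, which uses atomic operations.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's struct kref (illustrative only). */
struct demo_kref {
    int count;
};

/* Hypothetical object embedding the counter, as zcard/zqueue embed a kref. */
struct demo_obj {
    struct demo_kref ref;   /* must stay the first member for the cast below */
    int *freed_flag;        /* set to 1 by the release callback */
};

static void demo_kref_init(struct demo_kref *k)
{
    k->count = 1;           /* the creator holds the initial reference */
}

static void demo_kref_get(struct demo_kref *k)
{
    assert(k->count > 0);   /* taking a reference on a dead object is a bug */
    k->count++;
}

/* Returns 1 if this put released the object, 0 otherwise. */
static int demo_kref_put(struct demo_kref *k,
                         void (*release)(struct demo_kref *))
{
    assert(k->count > 0);
    if (--k->count == 0) {
        release(k);
        return 1;
    }
    return 0;
}

static void demo_obj_release(struct demo_kref *k)
{
    struct demo_obj *obj = (struct demo_obj *)k; /* ref is the first member */

    *obj->freed_flag = 1;
    free(obj);
}

/* Walks init -> get -> put -> final put; returns 1 if the object was
 * released only by the final put, 0 if not, -1 on allocation failure. */
int demo_lifecycle(void)
{
    int freed = 0;
    struct demo_obj *obj = malloc(sizeof(*obj));

    if (!obj)
        return -1;
    obj->freed_flag = &freed;

    demo_kref_init(&obj->ref);                                /* count == 1 */
    demo_kref_get(&obj->ref);                                 /* count == 2 */
    int first = demo_kref_put(&obj->ref, demo_obj_release);   /* count == 1 */
    int freed_after_first = freed;
    int second = demo_kref_put(&obj->ref, demo_obj_release);  /* count == 0 */

    return first == 0 && freed_after_first == 0 && second == 1 && freed == 1;
}
```

Calling `demo_lifecycle()` exercises the whole sequence. The point to notice is that the put matching the initial reference from init is the one that frees the object, so if nobody ever issues it, the object stays allocated forever.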

The bug:
The code failed to do the final kref_put() when *unregistering* (removing) the card or queue. So the reference counter never dropped to zero. As a result, neither zcard nor zqueue structures were freed, causing a slow memory leak—especially nasty on long-lived systems or those with lots of hot-plug activity.

Reproducing the Bug

People running KVM (Kernel-based Virtual Machine) guests on a kernel compiled with memory leak debugging (like kmemdebug) saw the leak when repeatedly hot-unplugging crypto adapters from the virtual machine.

Left unpatched, enough of these cycles eventually leads to OOM (Out-Of-Memory) conditions or performance degradation on s390 virtual or physical hardware.

The Patch: Making the Final Put

Here’s the essence of the fix that went into the upstream Linux kernel, shown as a simplified sketch (function and field names in the actual source may differ):

/* Simplified: called when a zcard is unregistered (hot-unplugged) */
static void zcrypt_zcard_unregister(struct zcrypt_card *zcard)
{
    /* kref_init() at object creation set the refcount to 1 */
    /* ... other cleanup code ... */

    /* Fix: drop the initial reference to trigger object release */
    kref_put(&zcard->refcount, zcrypt_card_release);
}

/* Simplified: called when a zqueue is unregistered (hot-unplugged) */
static void zcrypt_zqueue_unregister(struct zcrypt_queue *zqueue)
{
    /* Fix: drop the initial reference so the queue is released too */
    kref_put(&zqueue->refcount, zcrypt_queue_release);
}

Adding the previously missing kref_put() calls to the unregister routines ensures that the reference taken at initialization is dropped, so the object is actually freed once nothing else is using it.
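To see why the single missing put matters, here is a small userspace simulation of repeated hot-plug/hot-unplug cycles with and without the final put. It is a sketch under stated assumptions: `sim_card`, `sim_unregister_buggy`, `sim_unregister_fixed`, and `sim_hotplug_cycles` are hypothetical names made up for this example, and a plain integer stands in for the kernel's kref.

```c
#include <assert.h>
#include <stdlib.h>

#define SIM_MAX_CYCLES 64

static int live_objects;    /* objects allocated but not yet freed */

struct sim_card {
    int refcount;           /* stand-in for the embedded struct kref */
};

static struct sim_card *sim_card_alloc(void)
{
    struct sim_card *c = malloc(sizeof(*c));

    if (!c)
        return NULL;
    c->refcount = 1;        /* like kref_init(): creator's reference */
    live_objects++;
    return c;
}

static void sim_card_put(struct sim_card *c)
{
    assert(c->refcount > 0);
    if (--c->refcount == 0) {
        live_objects--;
        free(c);
    }
}

/* Pre-fix behaviour: cleanup runs, but the initial reference is kept. */
static void sim_unregister_buggy(struct sim_card *c)
{
    (void)c;                /* no put: refcount stays at 1 */
}

/* Post-fix behaviour: the initial reference is dropped on unregister. */
static void sim_unregister_fixed(struct sim_card *c)
{
    sim_card_put(c);
}

/* Runs up to SIM_MAX_CYCLES plug/unplug rounds and returns how many
 * objects are still allocated afterwards. */
int sim_hotplug_cycles(int cycles, int fixed)
{
    struct sim_card *leaked[SIM_MAX_CYCLES];
    int nleaked = 0;

    if (cycles > SIM_MAX_CYCLES)
        cycles = SIM_MAX_CYCLES;
    live_objects = 0;

    for (int i = 0; i < cycles; i++) {
        struct sim_card *c = sim_card_alloc();

        if (!c)
            break;
        if (fixed) {
            sim_unregister_fixed(c);
        } else {
            sim_unregister_buggy(c);
            leaked[nleaked++] = c;  /* kept only so the demo can clean up */
        }
    }

    int still_allocated = live_objects;

    /* The demo frees leftovers itself; the pre-fix kernel had no such path. */
    for (int i = 0; i < nleaked; i++)
        free(leaked[i]);
    return still_allocated;
}
```

With the buggy unregister, `sim_hotplug_cycles(10, 0)` reports ten objects still allocated after ten cycles, while `sim_hotplug_cycles(10, 1)` reports none. The leak grows linearly with the number of unplug events, which matches the slow-burn behaviour described in this article.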

Why It Matters

Without this, each zcard and zqueue ever unplugged in the life of a server would eat memory and never go away. Over time, that’s a recipe for disaster.

Distribution security advisories:

- SUSE security announcement
- Red Hat Bugzilla

Upstream commits:

- Patch on kernel.org

kref API documentation:

- Kernel Documentation (kref)

Exploitability: Is This Weaponizable?

This vulnerability doesn’t allow traditional “exploitation” (like privilege escalation or code execution). Instead, it’s about causing system instability or Denial of Service (DoS) by intentionally chewing up memory:

1. An attacker scripts repeated hot-unplug/hot-plug of crypto adapters, for example on a test machine or a virtual instance.
2. Eventually, the system slows down or crashes because it runs out of memory.

Practical risk:
The risk falls mostly on cloud or mainframe operators using s390 in environments with dynamic hardware (virtual or physical). If an attacker can control device hot-plugging, they could cause a slow-burn DoS.

Mitigation:

- Monitor hot-plug activity, especially on s390 platforms.
- Monitor kernel memory: use tools like kmemleak, top, or custom monitoring to spot unusual kernel memory growth.
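As one possible form of "custom monitoring", the sketch below reads the `SUnreclaim` field from `/proc/meminfo`, which reports unreclaimable slab memory in kB. It assumes the leaked structures are slab-allocated and therefore show up in that counter; the helper names `meminfo_field_kb` and `read_sunreclaim_kb` are hypothetical and introduced only for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extracts a "<Key>:   <value> kB" field from meminfo-style text.
 * Returns the value in kB, or -1 if the key is absent or malformed. */
long meminfo_field_kb(const char *text, const char *key)
{
    assert(text && key);

    size_t keylen = strlen(key);
    const char *line = text;

    while (line && *line) {
        /* Match the key only at the start of a line, followed by ':' */
        if (strncmp(line, key, keylen) == 0 && line[keylen] == ':') {
            long value;

            if (sscanf(line + keylen + 1, "%ld", &value) == 1)
                return value;
            return -1;
        }
        line = strchr(line, '\n');
        if (line)
            line++;         /* step past the newline to the next line */
    }
    return -1;
}

/* Reads /proc/meminfo and returns SUnreclaim in kB, or -1 on failure. */
long read_sunreclaim_kb(void)
{
    char buf[8192];
    FILE *f = fopen("/proc/meminfo", "r");

    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return meminfo_field_kb(buf, "SUnreclaim");
}
```

Sampling `read_sunreclaim_kb()` periodically and alerting when the value keeps climbing across hot-unplug events would give a rough early warning. It cannot attribute growth to a specific driver, so a tool like kmemleak is still needed to pinpoint the source.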

Summary

CVE-2021-46968 is a classic example of how missing a single reference decrement can have real-world effects on system health—even if it doesn’t directly let an attacker gain control. It highlights how error-prone manual memory and object lifetime management can be, and why test tools like kmemdebug remain critical for kernel development.


*If you rely on s390 and hot-plug cryptography hardware, make sure your kernel is up to date. For any Linux admin, this issue is a reminder to keep an eye on the quiet but dangerous bugs lurking deep in the system.*

Timeline

Published on: 02/27/2024 19:04:07 UTC
Last modified on: 01/08/2025 16:50:33 UTC