Cluster security is a critical part of production infrastructure, and a newly disclosed vulnerability, CVE-2023-44981, puts affected Apache ZooKeeper deployments at severe risk. It allows an attacker to bypass quorum peer authorization and join a ZooKeeper ensemble as a peer, which can give them full read and write access to the data tree.

In this post, I’ll explain the vulnerability in simple terms, show what went wrong with code snippets, and provide advice on how to protect your ZooKeeper deployment.

What is Apache ZooKeeper?

Apache ZooKeeper is a widely used open-source service for maintaining configuration information, naming, distributed synchronization, and group services in large distributed systems.

The Vulnerability: Authorization Bypass Through User-Controlled Key

CVE-2023-44981 affects Apache ZooKeeper when SASL Quorum Peer authentication is enabled (quorum.auth.enableSasl=true). In that mode, ZooKeeper authenticates peers with SASL (Simple Authentication and Security Layer) and then authorizes them by checking that the "instance" part of the SASL authentication ID appears in the zoo.cfg server list. Due to a logic bug, an authentication ID that has no instance part is *not* checked at all.

In ZooKeeper, the authentication ID should look like this:

zookeeper/zk1@EXAMPLE.COM

But the "instance" (e.g., zk1) is optional. If an attacker uses an ID like:

eve@EXAMPLE.COM

ZooKeeper *skips* the check that would compare the instance part against the servers listed in the configuration file (zoo.cfg). As a result, any host that can complete SASL authentication can claim to be a valid cluster member.

Here's what the authorization code tried to do (simplified):

// This is simplified pseudocode, not the actual ZooKeeper source
String authorizedId = getSaslAuthId(); // e.g., "zookeeper/zk1@EXAMPLE.COM" or "eve@EXAMPLE.COM"
String[] parts = authorizedId.split("[/@]"); // primary, optional instance, realm

if (parts.length == 3) {
    // Check instance part (e.g., 'zk1') against zoo.cfg server list
    if (isInServerList(parts[1])) {
        allowJoin();
    } else {
        denyJoin();
    }
} else {
    // No instance part supplied -> authorization check is skipped (the bug)
    allowJoin();
}

So, supplying an ID like eve@EXAMPLE.COM never even triggers the server list check. That means *anyone* who can pass SASL authentication can join, whether or not they are listed in zoo.cfg!
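To make the difference concrete, here is a minimal runnable sketch that puts the flawed logic next to a fail-closed version. The class and method names (QuorumAuthzSketch, vulnerableCheck, strictCheck) and the sample server list are my own illustrations, not ZooKeeper's actual source or API; the fail-closed variant is only an assumption about what a safe check should look like.

```java
import java.util.Set;

/** Illustrative sketch only; names are hypothetical, not ZooKeeper's real code. */
final class QuorumAuthzSketch {

    /** Returns the instance part of "primary/instance@REALM", or null if the ID lacks that shape. */
    static String instanceOf(String authId) {
        String[] parts = authId.split("[/@]");
        return parts.length == 3 ? parts[1] : null;
    }

    /** Flawed logic: an ID without an instance part is never compared to the server list. */
    static boolean vulnerableCheck(String authId, Set<String> serverList) {
        String instance = instanceOf(authId);
        if (instance != null) {
            return serverList.contains(instance);
        }
        return true; // no instance part, so the check is skipped
    }

    /** Fail-closed logic: a missing instance part is treated as a denial. */
    static boolean strictCheck(String authId, Set<String> serverList) {
        String instance = instanceOf(authId);
        return instance != null && serverList.contains(instance);
    }

    public static void main(String[] args) {
        // Hypothetical server list standing in for the hosts in zoo.cfg
        Set<String> servers = Set.of("zk1", "zk2", "zk3");
        String[] ids = {
            "zookeeper/zk1@EXAMPLE.COM",
            "zookeeper/evil@EXAMPLE.COM",
            "eve@EXAMPLE.COM"
        };
        for (String id : ids) {
            System.out.printf("%s vulnerable=%b strict=%b%n",
                    id, vulnerableCheck(id, servers), strictCheck(id, servers));
        }
    }
}
```

The only ID the two checks disagree on is the one without an instance part, which is exactly the case the CVE describes.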

Once the check is bypassed, the consequences are serious:

- Cluster Compromise: Attackers could join your cluster as peers.
- Counterfeit Changes: These rogue peers can send bogus data changes to the leader node.

Exploit Example

Let's assume you control a machine on the network and have Kerberos credentials for EXAMPLE.COM. You could run:

# As a malicious peer with Kerberos credentials
export SERVER_JVMFLAGS="-Djava.security.auth.login.config=jaas.conf"
zkServer.sh start-foreground # Using peer config, not client

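And in your jaas.conf (quorum connections use the QuorumLearner login context by default, not the client-facing Server section):

QuorumLearner {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    keyTab="/etc/security/keytabs/eve.keytab"
    principal="eve@EXAMPLE.COM";
};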

A vulnerable ZooKeeper ensemble will accept you as a peer: because the principal has no instance part, no server-list check is performed.

Most at risk are clusters that enable SASL quorum authentication and are NOT using additional network protections, like firewalls between hosts.

> The vulnerability DOES NOT affect default ZooKeeper setups, as Quorum Peer SASL auth is not enabled by default.
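If you are not sure whether a given ensemble has the feature turned on, a quick check of its zoo.cfg settles it. The sketch below is a minimal example that assumes the file can be read as a standard Java properties file; the class and method names are hypothetical, and it only looks at the quorum.auth.enableSasl key mentioned above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Illustrative helper; not part of ZooKeeper. */
final class QuorumSaslConfigCheck {

    /** Returns true when the given zoo.cfg text sets quorum.auth.enableSasl=true. */
    static boolean quorumSaslEnabled(String zooCfgText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(zooCfgText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // A missing key is treated as "false", matching the default-off behavior.
        String value = props.getProperty("quorum.auth.enableSasl", "false");
        return Boolean.parseBoolean(value.trim());
    }

    public static void main(String[] args) {
        // Sample config text; in practice you would read your real zoo.cfg
        String cfg = String.join("\n",
                "tickTime=2000",
                "clientPort=2181",
                "quorum.auth.enableSasl=true");
        System.out.println("SASL quorum auth enabled: " + quorumSaslEnabled(cfg));
    }
}
```

A result of true on an unpatched release means the ensemble is in scope for this CVE.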

- 3.7.x users: Upgrade to 3.7.2 or later
- 3.8.x users: Upgrade to 3.8.3 or later
- 3.9.x users: Upgrade to 3.9.1 or later

The bug is fixed in these versions. You can find the fix described in the Apache ZooKeeper Security Advisory and the project JIRA ticket.
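For fleets with many ensembles, it can help to check version strings programmatically. The following is a rough sketch based on the fixed releases named in the CVE record (3.7.2, 3.8.3, and 3.9.1). It assumes plain major.minor.patch version strings, treats release lines older than 3.7 as unpatched, and assumes lines newer than 3.9 include the fix; the class and method names are hypothetical.

```java
/** Illustrative helper; thresholds taken from the CVE-2023-44981 record. */
final class PatchLevelCheck {

    /** Returns true if the version is at or above the fixed release for its line. */
    static boolean isPatched(String version) {
        String[] parts = version.trim().split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected major.minor.patch: " + version);
        }
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        int patch = Integer.parseInt(parts[2]);

        if (major != 3) {
            return major > 3; // assumption: any later major line carries the fix
        }
        switch (minor) {
            case 7: return patch >= 2;
            case 8: return patch >= 3;
            case 9: return patch >= 1;
            default: return minor > 9; // older lines are treated as unpatched
        }
    }

    public static void main(String[] args) {
        for (String v : new String[] {"3.6.4", "3.7.1", "3.7.2", "3.8.2", "3.8.3", "3.9.0", "3.9.1"}) {
            System.out.println(v + " patched=" + isPatched(v));
        }
    }
}
```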

If you *cannot upgrade immediately*, do the following:

1. Network-level controls: Protect ensemble communication (TCP ports 2888/3888 by default) with strict firewalls that only allow known cluster members.
2. Monitor for rogue peers: Monitor the cluster logs and membership for unexpected entries or connections.
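
Both mitigations depend on knowing who the legitimate members are. As a minimal sketch, the helper below derives a host allowlist from the server.N entries in zoo.cfg and tests an observed peer host against it. It assumes entries of the form server.N=host:quorumPort:electionPort with plain hostnames or IPv4 addresses (bracketed IPv6 is not handled), and the class and method names are hypothetical. How you collect the observed hosts (logs, connection tables) is left out.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative helper; not part of ZooKeeper. */
final class PeerAllowlist {

    /** Collects the host part of every server.N entry in the given zoo.cfg text. */
    static Set<String> hostsFrom(String zooCfgText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(zooCfgText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Set<String> hosts = new TreeSet<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith("server.")) {
                String value = props.getProperty(key).trim();
                int colon = value.indexOf(':');
                String host = colon < 0 ? value : value.substring(0, colon);
                hosts.add(host.toLowerCase(Locale.ROOT));
            }
        }
        return hosts;
    }

    /** Returns true when the observed host is one of the configured ensemble members. */
    static boolean isExpectedPeer(String observedHost, Set<String> allowed) {
        return allowed.contains(observedHost.trim().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // Sample config text with made-up hostnames
        String cfg = String.join("\n",
                "clientPort=2181",
                "server.1=zk1.example.com:2888:3888",
                "server.2=zk2.example.com:2888:3888",
                "server.3=zk3.example.com:2888:3888");
        Set<String> allowed = hostsFrom(cfg);
        System.out.println("allowlist=" + allowed);
        System.out.println("eve.example.com expected=" + isExpectedPeer("eve.example.com", allowed));
    }
}
```

The same allowlist can feed firewall rules for the quorum and election ports and an alert on any connection from a host outside it.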

More Information

- Official CVE Record for CVE-2023-44981
- ZooKeeper Security Docs
- ZOOKEEPER-4842 JIRA - Fix SASL quorumpeer required instance requirement

Conclusion

CVE-2023-44981 is a serious yet easily fixed bug for users of Apache ZooKeeper. If you rely on SASL Quorum Peer authentication, upgrade to a fixed release or firewall your cluster nodes immediately so that rogue peers cannot join the ensemble. Always check your configs and keep security patches up to date. This case shows how even optional pieces of authentication logic can have big consequences.

Timeline

Published on: 10/11/2023 12:15:11 UTC
Last modified on: 11/01/2023 07:15:09 UTC