Envoy is a widely used, high-performance proxy that has become a core building block in modern cloud-native architectures. Features such as advanced routing and upstream health checking make it extremely popular, but like all software, it is not immune to vulnerabilities. This deep dive explores CVE-2022-29224: a bug that lets an attacker who controls an upstream host and its service discovery crash Envoy through a null pointer dereference in gRPC health checking.

We’ll break down what this vulnerability is and why it matters, walk through a simplified example of the vulnerable code path, describe a potential exploit, and share mitigation strategies, all in plain, accessible language.

What is CVE-2022-29224?

The official advisory for CVE-2022-29224 discloses a segmentation fault that can crash Envoy processes (and the containers running them) when an attacker abuses certain health checking features.

- Vulnerable component: GrpcHealthCheckerImpl
- Attack surface: If an attacker controls both (a) an upstream host and (b) service discovery for that host (such as via DNS, the EDS API, etc.), they can cause Envoy to crash.
- Root cause: Null pointer dereference when an upstream host is removed from service discovery and then fails a gRPC health check.

How Does Envoy Health Checking Work?

Envoy can monitor (health check) the backends it sends traffic to, marking them healthy or unhealthy and removing failed hosts. It supports health checks over HTTP, TCP, and gRPC. The gRPC checker works by repeatedly calling the grpc.health.v1.Health/Check RPC on each upstream host.
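As a rough illustration of how a gRPC health check response turns into a healthy/unhealthy decision, here is a minimal sketch. The enum mirrors the ServingStatus values of the grpc.health.v1 protocol; the helper name isPassing is hypothetical and is not Envoy's API.

```cpp
// Mirrors the ServingStatus values defined by the grpc.health.v1 protocol.
enum class ServingStatus {
    Unknown = 0,
    Serving = 1,
    NotServing = 2,
    ServiceUnknown = 3
};

// Hypothetical helper (not Envoy's API): a check passes only when the
// upstream explicitly reports SERVING; every other status counts as failed.
inline bool isPassing(ServingStatus status) {
    return status == ServingStatus::Serving;
}
```

In this model a NOT_SERVING reply is treated the same as any other non-SERVING status: the check has failed.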

Envoy also has a feature that can "hold" upstream hosts obtained through service discovery (for example, via DNS or the EDS API): a held host is not removed, even after it disappears from discovery, until the configured active health check fails.
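The hold rule can be pictured as a two-condition test. The sketch below is a deliberately simplified model; HostState and shouldRemoveHost are hypothetical names, and Envoy's real removal logic considers more than these two flags.

```cpp
// Hypothetical model of a discovered upstream host (not Envoy's types).
struct HostState {
    bool in_service_discovery;  // still listed by DNS or EDS?
    bool active_check_failed;   // has active health checking failed?
};

// A host that has left service discovery is "held" until its active
// health check fails; only then is it actually removed.
inline bool shouldRemoveHost(const HostState& host) {
    return !host.in_service_discovery && host.active_check_failed;
}
```

The interesting case for this CVE is the last transition: the host is already gone from discovery, and a failing health check is what finally triggers removal.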

How the Bug Happens

The bug occurs when a host, tracked through service discovery, is removed, but the Envoy process still has a gRPC health-check in-flight (or about to fire) against it. If the health check comes back after the host was logically "removed," Envoy's code assumes pointers to the host are still valid. Dereferencing a null pointer then causes a crash.

Exploitation scenario

1. An attacker controls a backend host (e.g., a service or pod) and also controls how it's added/removed from service discovery (via DNS or EDS config).
2. The attacker causes the host to be removed from discovery, but before Envoy removes it from gRPC health-checking, the attacker makes the gRPC service return a failed health check.
3. Envoy's GrpcHealthCheckerImpl tries to process the failed result against a now-invalid pointer, causing a segmentation fault and process crash.

Code Snippet: Null Pointer Dereference

Here’s a simplified pseudocode snippet based on the Envoy health checker code that helps visualize the flaw:

void GrpcHealthCheckerImpl::onCheckComplete(Host* host, HealthCheckResult result) {
    // Illustrative pseudocode, not the actual Envoy source.
    // If the host was removed from service discovery while this check
    // was in flight, host can be nullptr here.
    // BUG: there is no null check before the pointer is used.
    if (host->isHealthy()) { ... }  // segmentation fault when host == nullptr
    // ...existing logic...
}

If the host is torn down during removal, the health checker is left holding a null pointer. The next operation, host->isHealthy(), dereferences that pointer and crashes the Envoy process.
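One common way to defend against this class of bug is to have the in-flight check hold a weak reference and verify that the host still exists before touching it. The sketch below shows that general pattern only; it is not the actual Envoy patch, and Host, CheckOutcome, and onCheckComplete are hypothetical stand-ins.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for an upstream host; not Envoy's real Host class.
struct Host {
    std::string address;
    bool healthy = true;
};

enum class CheckOutcome { Applied, HostGone };

// Hypothetical handler: lock the weak reference first, so a host that was
// removed while the check was in flight is detected instead of dereferenced.
inline CheckOutcome onCheckComplete(const std::weak_ptr<Host>& weak_host,
                                    bool check_passed) {
    std::shared_ptr<Host> host = weak_host.lock();
    if (!host) {
        return CheckOutcome::HostGone;  // nothing left to update; drop the result
    }
    host->healthy = check_passed;
    return CheckOutcome::Applied;
}
```

With this shape, a failed check that arrives after removal yields HostGone rather than a crash: create the host with std::make_shared, hand the check a std::weak_ptr, reset the owning pointer to simulate removal, and then call onCheckComplete.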

Example Exploit Walkthrough

Suppose you run a Kubernetes cluster with Envoy as a sidecar/proxy, and service discovery is managed by DNS.

- You (the attacker) control a pod and orchestrate its appearance/disappearance from DNS.
- Via DNS or the EDS API, you have the pod removed from Envoy’s service discovery just after a gRPC health check is triggered.
- Simultaneously, you make your gRPC service respond with a check status of NOT_SERVING (failed).
- Envoy tries to handle the failed health check, but the code path inside GrpcHealthCheckerImpl reaches for your host’s state through what is now a null pointer, crashing Envoy.

In real-world terms: with control over an upstream host and its service discovery, an attacker can repeatedly take down Envoy instances running vulnerable builds that health-check that host, without authenticating to Envoy itself.

If you use Envoy

- Upgrade immediately to version 1.22.1 or later (see the release notes).
- If you cannot upgrade, disable gRPC health checking in your configuration, or
- Switch to HTTP or TCP health checking if feasible; these do not go through the vulnerable GrpcHealthCheckerImpl code path.
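If you track Envoy versions across a fleet, a small comparison helper can flag builds older than the fixed release. The sketch below is a minimal example under stated assumptions: parseVersion and hasFix are hypothetical helpers, the input is a plain "MAJOR.MINOR.PATCH" string, and the only threshold it knows is the 1.22.1 release named above, so it ignores any patch releases on older branches.

```cpp
#include <array>
#include <cstddef>
#include <exception>
#include <sstream>
#include <string>

// First release named in this article as containing the fix.
const std::array<int, 3> kFirstFixedVersion{{1, 22, 1}};

// Parses "MAJOR.MINOR.PATCH"; missing or non-numeric parts become 0,
// which errs on the side of reporting a build as unpatched.
inline std::array<int, 3> parseVersion(const std::string& text) {
    std::array<int, 3> parts{{0, 0, 0}};
    std::istringstream in(text);
    std::string field;
    for (std::size_t i = 0; i < parts.size() && std::getline(in, field, '.'); ++i) {
        try {
            parts[i] = std::stoi(field);
        } catch (const std::exception&) {
            parts[i] = 0;
        }
    }
    return parts;
}

// std::array compares lexicographically, which matches version ordering here.
inline bool hasFix(const std::string& version) {
    return parseVersion(version) >= kFirstFixedVersion;
}
```

For example, hasFix("1.22.0") is false and hasFix("1.22.1") is true.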

Practical configuration fix:

Comment out or remove sections like

health_checks:
  - timeout: 1s
    interval: 10s
    grpc_health_check:
      service_name: "myservice"

Replace with something like

health_checks:
  - timeout: 1s
    interval: 10s
    http_health_check:
      path: "/health"

Or disable altogether if your architecture allows.
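To find configurations that still rely on the gRPC checker, a crude text scan is often enough for a first pass. The sketch below assumes the config is available as YAML text; usesGrpcHealthCheck is a hypothetical helper that looks for the grpc_health_check key line by line and is not a real YAML parser.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Returns true if any non-comment line of the config text begins (after
// indentation and list dashes) with the grpc_health_check key.
inline bool usesGrpcHealthCheck(const std::string& config_text) {
    const std::string key = "grpc_health_check:";
    std::istringstream in(config_text);
    std::string line;
    while (std::getline(in, line)) {
        const std::size_t first = line.find_first_not_of(" \t-");
        if (first == std::string::npos || line[first] == '#') {
            continue;  // blank line or YAML comment
        }
        if (line.compare(first, key.size(), key) == 0) {
            return true;
        }
    }
    return false;
}
```

Because it only matches text, this check can miss configurations delivered dynamically (for example over xDS) or written in JSON, so treat a negative result as a hint rather than proof.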

Extra Resources and References

- Official CVE record (CVE-2022-29224)
- Envoy advisory: GHSA-3665-6f3c-f8wm
- Envoy v1.22.1 Release Notes
- Envoy GrpcHealthCheckerImpl Code (GitHub)

Conclusion

CVE-2022-29224 is a classic example of how subtle pointer-bookkeeping bugs can create catastrophic outcomes in complex, multi-threaded cloud-native systems. If you depend on Envoy with gRPC health checks, upgrade at once. If you can’t, disable or change your health checking for now.

Stay patched, audit your service discovery, and don’t let an attacker crash your mesh!

Timeline

Published on: 06/09/2022 19:15:00 UTC
Last modified on: 06/16/2022 17:46:00 UTC