TensorFlow is one of the most widely used open-source machine learning libraries. With millions of downloads and a broad user base, security in TensorFlow is a big deal. In late 2022, researchers disclosed a denial-of-service vulnerability tracked as CVE-2022-41890, in which certain functions crash when passed very large input values. In this article, we’ll break down what went wrong, how the bug can be triggered (with code samples), and what you should do to keep your machine learning workloads safe.
What is CVE-2022-41890?
Simply put, this is a bug in TensorFlow’s internal shape-handling code (BCast::ToShape), which is used when “broadcasting” shapes. The function is supposed to support 64-bit (int64) values, but it crashes when given an input larger than a standard 32-bit integer (int32) can hold. Attackers can exploit this to cause a denial of service (DoS) on TensorFlow-powered servers or applications.
One real-world function affected is tf.experimental.numpy.outer, which calculates the outer product of two inputs. By passing a very large input as its b argument, you can crash the TensorFlow process.
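To make the int32 versus int64 distinction concrete, here is a minimal sketch in plain Python. It only illustrates the general hazard of squeezing a 64-bit size into 32 bits; it is not TensorFlow's actual code, and the helper names `fits_in_int32` and `narrow_to_int32` are hypothetical.

```python
import ctypes

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold
INT32_MIN = -(2**31)


def fits_in_int32(value: int) -> bool:
    """Return True if the value can be stored in a signed 32-bit integer."""
    return INT32_MIN <= value <= INT32_MAX


def narrow_to_int32(value: int) -> int:
    """Mimic an unchecked C-style cast to int32 (ctypes does no overflow checking)."""
    return ctypes.c_int32(value).value


# A dimension size that is perfectly valid as int64 but one past the int32 limit.
dim = INT32_MAX + 1

print(fits_in_int32(dim))    # False
print(narrow_to_int32(dim))  # -2147483648: the size wraps to a negative number
```

A size that silently turns negative (or otherwise fails an internal sanity check) is the sort of condition that can abort a process instead of raising a clean error, which is why code that accepts 64-bit sizes needs to handle them as 64-bit values end to end.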
Here’s the official GitHub fix commit.
Let’s dive into how an attacker could trigger this in Python with TensorFlow:
import tensorflow as tf
import numpy as np
# One element more than a signed 32-bit integer can represent
large_size = np.iinfo(np.int32).max + 1
# 'a' can be small; 'b' is huge, which triggers the bug
a = np.array([1])
try:
    # Allocating roughly 2 GiB may itself fail with MemoryError
    b = np.zeros(large_size, dtype=np.int8)
    # On an unpatched build, the call below may crash the whole process,
    # in which case the except clause never gets a chance to run
    result = tf.experimental.numpy.outer(a, b)
except Exception as e:
    print("Caught exception:", e)
Be aware: actually allocating such a large array will probably exhaust memory and may not be allowed by your operating system, so this is for educational/illustrative purposes. In practice, attackers may find tricks to trigger the bug without requiring an actual allocation.
Affected Versions
- The fix ships in TensorFlow 2.11 and was cherry-picked into 2.10.1, 2.9.3, and 2.8.4.
- Earlier releases in the actively supported 2.8.x, 2.9.x, and 2.10.x branches are vulnerable.
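The patched releases named in this article (2.8.4, 2.9.3, 2.10.1, and 2.11) can be turned into a quick version check. The sketch below is an illustration, not an official tool: `is_patched` and `MIN_PATCHED` are hypothetical names, pre-release suffixes such as "-rc1" are ignored, and branches older than 2.8 are assumed to be unpatched.

```python
import re

# Minimum patch level per release branch, as listed in this article.
MIN_PATCHED = {(2, 8): 4, (2, 9): 3, (2, 10): 1}


def is_patched(version: str) -> bool:
    """Return True if a TensorFlow version string is at or above a patched release."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())

    # 2.11 and everything after it already contains the fix.
    if (major, minor) >= (2, 11):
        return True

    # Older branches are safe only from their cherry-picked patch release onward.
    min_patch = MIN_PATCHED.get((major, minor))
    return min_patch is not None and patch >= min_patch


for candidate in ("2.8.3", "2.9.3", "2.10.0", "2.11.0"):
    print(candidate, is_patched(candidate))
```

In a real environment you would pass in `tf.__version__` rather than a hard-coded string.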
The Fix
The official fix landed in this GitHub commit:
- What changed: The shape-broadcasting logic now properly checks for 32-bit overflows and safely supports up to 64-bit (int64) inputs.
- Where: Updated in the BCast::ToShape() function and related operations that depend on shape size.
Here’s an illustrative snippet, loosely adapted from the patch, showing the kind of additional validation involved:
// Before:
for (int i = 0; i < shape.size(); ++i) {
  result.push_back(static_cast<int64>(shape[i]));  // Might overflow!
}
// After:
for (int i = 0; i < shape.size(); ++i) {
  int64 val = static_cast<int64>(shape[i]);
  if (val > std::numeric_limits<int32>::max())
    return errors::InvalidArgument("Shape value too large");
  result.push_back(val);
}
(See TensorFlow commit 8310bf8 for the actual change.)
Mitigation & Recommendations
- Upgrade TensorFlow: Move to 2.11 or the latest 2.10.x/2.9.x/2.8.x patch (at least 2.10.1, 2.9.3, 2.8.4).
- Restrict Inputs: If you allow users to submit data to your TensorFlow models via APIs, validate array sizes before passing them to TensorFlow functions.
- Monitor for Crashes: Watch your logs for sudden process exits—if you see them after someone submits suspicious data, you could be under attack.
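For the "Restrict Inputs" recommendation, a guard along these lines could run before any user-supplied array reaches TensorFlow. It is a minimal sketch under stated assumptions: `check_shape` is a hypothetical helper, and the `MAX_ELEMENTS` cap is an arbitrary example policy that you would tune to your own service.

```python
import math
from typing import Sequence

INT32_MAX = 2**31 - 1
MAX_ELEMENTS = 10_000_000  # hypothetical per-request limit; adjust to your workload


def check_shape(shape: Sequence[int], max_elements: int = MAX_ELEMENTS) -> None:
    """Raise ValueError if a shape is too large to hand to the model safely."""
    for dim in shape:
        if dim < 0 or dim > INT32_MAX:
            raise ValueError(f"Dimension {dim} is outside the accepted range")
    total = math.prod(shape)  # total number of elements in the array
    if total > max_elements:
        raise ValueError(f"Input has {total} elements; limit is {max_elements}")


# A small input passes silently; an oversized one is rejected up front.
check_shape((1, 1000))
try:
    check_shape((2**31,))
except ValueError as err:
    print("Rejected:", err)
```

Calling `check_shape(array.shape)` on each incoming array, and returning an error response when it raises, keeps oversized inputs from ever reaching functions such as tf.experimental.numpy.outer.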
Original References
- TensorFlow GitHub Security Advisory
- Patch commit on GitHub
- CVE-2022-41890 on NIST NVD
Conclusion
CVE-2022-41890 serves as a reminder that even mature libraries like TensorFlow can have pitfalls when handling edge cases in data sizing. Make sure to keep your ML infrastructure updated, and always be mindful of the input data you accept—especially in an internet-facing context.
Stay safe, and keep your models running!
Timeline
Published on: 11/18/2022 22:15:00 UTC
Last modified on: 11/22/2022 21:30:00 UTC