OVERVIEW

A Regular Expression Denial of Service (ReDoS) vulnerability, tracked as CVE-2024-12720, was identified in the huggingface/transformers library. It resides in the tokenization_nougat_fast.py file and affects version 4.46.3, the latest release at the time of the report. When specially crafted input reaches the post_process_single() function, a regular expression used during post-processing can exhibit exponential time complexity because of excessive backtracking. The result is sustained high CPU usage and potential application downtime, that is, a Denial of Service (DoS) condition.

TECHNICAL DETAILS

The advisory locates the problematic regular expression in the post_process_single() function of tokenization_nougat_fast.py. The snippet below shows, in simplified form, the kind of regex substitution involved:

import re

def post_process_single(self, text: str):
    if self.newlines_behaviour == NewlinesBehaviour.StripAndRebuild:
        # [\x00-\x19]+ matches runs of code points U+0000 through U+0019,
        # i.e. most (but not all) ASCII control characters
        text = re.sub(r"[\x00-\x19]+", " ", text)
    return text

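When input that a pattern can match in many overlapping ways is fed into this function, the regex engine may try an exponentially growing number of match attempts before giving up (catastrophic backtracking), which keeps a CPU core busy for the duration of the call. Patterns with nested or overlapping quantifiers are the usual cause; a single character class followed by +, as in the excerpt above, is matched in linear time on its own.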
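To make the difference between linear matching and catastrophic backtracking concrete, the following minimal sketch times two patterns on inputs of growing length. Both patterns and all function names are hypothetical choices for illustration; NESTED is a textbook ReDoS shape, not the expression used in transformers, and the input sizes are kept small so the script finishes quickly.

```python
import re
import time

# Hypothetical patterns for illustration; neither is taken from transformers.
LINEAR = re.compile(r"[\x00-\x19]+")  # one character class with one quantifier
NESTED = re.compile(r"(a+)+$")        # nested quantifiers, a classic ReDoS shape


def time_search(pattern: re.Pattern, text: str) -> float:
    """Return the wall-clock seconds taken by one pattern.search(text) call."""
    start = time.perf_counter()
    pattern.search(text)
    return time.perf_counter() - start


def backtracking_profile(sizes=(14, 16, 18, 20)):
    """Time both patterns on inputs of n matching characters plus one '!'."""
    rows = []
    for n in sizes:
        # The trailing '!' makes NESTED fail only after it has tried
        # many ways of splitting the run of 'a' characters.
        nested_input = "a" * n + "!"
        linear_input = "\x01" * n + "!"
        rows.append(
            (n, time_search(LINEAR, linear_input), time_search(NESTED, nested_input))
        )
    return rows


if __name__ == "__main__":
    for n, linear_s, nested_s in backtracking_profile():
        print(f"n={n:2d}  linear={linear_s:.6f}s  nested={nested_s:.6f}s")
```

On a backtracking engine such as Python's re module, the nested column would be expected to grow rapidly as n increases while the linear column stays roughly flat. Exact timings depend on the machine and interpreter version.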

IMPACT

The ReDoS vulnerability primarily affects the availability of applications that use the affected version of the huggingface/transformers library. In the worst case, it can make an application unresponsive or take it down. Applications that use the tokenization functionality are at higher risk, especially if they process untrusted user input. An adversary can exploit the flaw to create a Denial of Service (DoS) condition and can sustain it for as long as malicious input keeps reaching the vulnerable function.

MITIGATION

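At the time of writing, this advisory lists no official patch; check the project's release notes for a fixed version. In the meantime, users can reduce their exposure as follows:

1. Sanitize untrusted input and cap its length before it reaches the tokenizer's post-processing step.
2. Apply rate limiting to endpoints that accept text for processing.
3. Monitor CPU usage and request latency so that abnormally slow calls are detected early.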
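One way to bound the size of untrusted input and flag slow calls can be sketched as follows. This is a minimal sketch under stated assumptions, not part of transformers: guarded_post_process, InputTooLargeError, MAX_CHARS, and SLOW_CALL_SECONDS are hypothetical names and values, and the post_process argument stands for whatever callable performs the post-processing in a given application.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger("postprocess_guard")

# Hypothetical limits for illustration; tune them for the actual workload.
MAX_CHARS = 50_000
SLOW_CALL_SECONDS = 1.0


class InputTooLargeError(ValueError):
    """Raised when untrusted text exceeds the configured size limit."""


def guarded_post_process(
    text: str,
    post_process: Callable[[str], str],
    max_chars: int = MAX_CHARS,
    slow_call_seconds: float = SLOW_CALL_SECONDS,
) -> str:
    """Reject oversized input, then run post_process and log slow calls."""
    if len(text) > max_chars:
        raise InputTooLargeError(
            f"input has {len(text)} characters, limit is {max_chars}"
        )

    start = time.perf_counter()
    result = post_process(text)
    elapsed = time.perf_counter() - start

    # Monitoring hook: a slow call may indicate a backtracking-heavy input.
    if elapsed > slow_call_seconds:
        logger.warning(
            "post-processing took %.2fs for %d characters", elapsed, len(text)
        )
    return result
```

A caller might pass a tokenizer's post_process_single method as the post_process argument. A length cap only reduces exposure: an exponential pattern can become expensive on fairly short inputs, and the timing check reports a slow call only after it has finished, so this guard complements rather than replaces upgrading to a fixed release.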

REFERENCES

1. Hugging Face Transformers GitHub Repository: https://github.com/huggingface/transformers
2. Tokenization Nougat Fast Source Code: https://github.com/huggingface/transformers/blob/v4.46.3/src/transformers/models/nougat/tokenization_nougat_fast.py
3. CVE-2024-12720: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2024-12720

In conclusion, developers and users should remain alert to the ReDoS vulnerability (CVE-2024-12720) in tokenization_nougat_fast.py of the huggingface/transformers library. Input sanitization, rate limiting, and continuous monitoring can reduce the risk until a patched release is installed.

TIMELINE

Published on: 03/20/2025 10:15:29 UTC
Last modified on: 03/20/2025 14:15:18 UTC