CVE-2023-24329 - How Python’s `urllib.parse` Bug Lets Attackers Sneak Past Blocklists

Python is known for its simplicity and wide usefulness, but even Python can have sneaky bugs. One such issue was found in its urllib.parse module, which many libraries rely on to break down and process URLs. Tracked as CVE-2023-24329, this bug let attackers slip URLs past blocklist defenses, sometimes with nothing more than a leading space.

Let’s break down what happened, why it’s a big deal, and how you can protect yourself or your applications.

The Basics: What Did CVE-2023-24329 Affect?

Python’s urllib.parse helps you split, build, and inspect URLs. It’s used in many popular tools and libraries that check whether a URL points somewhere dangerous. Blocking risky schemes (like file:// or ftp://), domains, or IP addresses? Chances are you use urllib.parse in the process.

In Python versions before 3.11.4 (and older branches that had not yet received the backported fix), though, this module had an unexpected behavior: if a URL started with a space or another blank character, urlparse did not strip it. The leading blank stopped urlparse from recognizing a scheme at all, so urllib.parse.urlparse(' https://evil.com') returned an empty scheme and an empty netloc and treated the whole string as the path. A blocklist that compares netloc against “evil.com” never sees a match.

Suppose you want to block any redirect to “evil.com”. In Python, you might write:

import urllib.parse

def is_url_blocked(url):
    parsed = urllib.parse.urlparse(url)
    # Block if netloc is evil.com
    return parsed.netloc == "evil.com"

url1 = "https://evil.com"
url2 = " https://evil.com"       # Notice the leading space

print(is_url_blocked(url1))  # True, as expected
print(is_url_blocked(url2))  # False on unpatched Python: should have been blocked

Output on an unpatched interpreter:

True
False

With the blank space, url2 flies under the radar!
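One way to harden this check is to trim the characters a browser would discard before parsing, and to compare the parsed hostname rather than the raw netloc. The sketch below assumes a hypothetical BLOCKED_HOSTS set and uses the WHATWG “C0 control or space” range (U+0000 through U+0020) as the set of characters to trim; it is an illustration, not a complete URL validator.

```python
import urllib.parse

# Hypothetical blocklist, for illustration only
BLOCKED_HOSTS = {"evil.com"}

# WHATWG "C0 control or space": U+0000 through U+0020
C0_CONTROL_OR_SPACE = "".join(chr(c) for c in range(0x21))


def is_url_blocked(url: str) -> bool:
    # Trim the characters a browser would discard before parsing,
    # so the check sees the same URL the client will eventually use.
    cleaned = url.strip(C0_CONTROL_OR_SPACE)
    parsed = urllib.parse.urlsplit(cleaned)
    # .hostname is lowercased and excludes any port or user:password@ prefix
    return parsed.hostname in BLOCKED_HOSTS


print(is_url_blocked("https://evil.com"))        # True
print(is_url_blocked(" https://evil.com"))       # True
print(is_url_blocked("https://EVIL.com:8443/"))  # True
print(is_url_blocked("https://example.org"))     # False
```

Note that a subdomain such as www.evil.com would still pass this check, which is one reason blocklists are fragile.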

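Now suppose you’re a Flask developer guarding a redirect endpoint with the same kind of blocklist:

import urllib.parse
from flask import Flask, redirect, request

app = Flask(__name__)

BLOCKED_HOSTS = {"evil.com"}

@app.route('/go')
def go():
    url = request.args.get('url', '')
    parsed = urllib.parse.urlparse(url)
    # On unpatched Python, " https://evil.com" parses with netloc == ''
    if parsed.netloc in BLOCKED_HOSTS:
        return "Blocked", 400
    return redirect(url)

An attacker sends a request with a URL-encoded leading space:

/go?url=%20https://evil.com

Boom! On an unpatched interpreter, urlparse reports an empty netloc, so the blocklist check passes and the app issues the redirect. Browsers strip leading whitespace from URLs, as the WHATWG URL Standard requires, so the user may still land on evil.com even though the check never saw it.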
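A sturdier pattern is an allowlist: accept a redirect only when the parser positively identifies an http or https URL on a trusted host, and refuse everything else. Because an unpatched interpreter reports an empty scheme and host for “ https://evil.com”, such a check fails closed. A minimal, framework-independent sketch; ALLOWED_HOSTS and is_safe_redirect are hypothetical names, and this version deliberately rejects relative paths, which a real app might want to permit.

```python
import urllib.parse

# Hypothetical allowlists, for illustration only
ALLOWED_HOSTS = {"trusted.com"}
ALLOWED_SCHEMES = {"http", "https"}


def is_safe_redirect(url: str) -> bool:
    parsed = urllib.parse.urlsplit(url)
    # Require an explicit http(s) scheme and a host on the allowlist.
    # On a vulnerable interpreter, " https://evil.com" parses with an
    # empty scheme and host, so it is refused rather than waved through.
    return parsed.scheme in ALLOWED_SCHEMES and parsed.hostname in ALLOWED_HOSTS


print(is_safe_redirect("https://trusted.com/welcome"))  # True
print(is_safe_redirect(" https://evil.com"))            # False
```

A Flask view could call is_safe_redirect(url) and return a 400 response whenever it is False.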

Here’s what urlparse returns for that input on an unpatched interpreter, where no stripping happens:

>>> from urllib.parse import urlparse
>>> urlparse(' https://evil.com')
ParseResult(scheme='', netloc='', path=' https://evil.com', ...)

The leading space stops urlparse from recognizing “https” as a scheme. Without a scheme, the remaining string does not start with “//”, so no netloc is extracted either, and the whole string ends up in path.
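The scheme-detection rule behind this can be modeled in a few lines. Before the fix, urlsplit treats the text before the first colon as a scheme only when every character in it is a legal scheme character (letters, digits, “+”, “-”, “.”), and a space is not one. The function below is a simplified re-implementation of that rule for illustration, not CPython’s actual code; split_scheme is a hypothetical name.

```python
import string

# Characters allowed in a URL scheme: letters, digits, "+", "-", "."
SCHEME_CHARS = string.ascii_letters + string.digits + "+-."


def split_scheme(url: str) -> tuple[str, str]:
    """Simplified model of pre-fix scheme detection (not CPython's code)."""
    i = url.find(":")
    # A scheme is only recognized if every character before ":" is legal
    if i > 0 and all(c in SCHEME_CHARS for c in url[:i]):
        return url[:i].lower(), url[i + 1:]
    # Otherwise there is no scheme and the whole string is left for the path
    return "", url


print(split_scheme("https://evil.com"))   # ('https', '//evil.com')
print(split_scheme(" https://evil.com"))  # ('', ' https://evil.com')
```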

The danger lies in the mismatch. The security check sees no scheme and no host, while the browser that later follows the URL strips the leading blank and connects to evil.com. Attackers exploit these parser differentials to get past validation logic.
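A defensive check against this kind of parser differential is to parse both the raw input and a copy trimmed the way a browser would trim it, and to refuse the URL when the two parses disagree about its scheme or host. A minimal sketch under that assumption; views_agree is a hypothetical helper. It also rejects URLs with trailing blanks, which is conservative but usually harmless.

```python
import urllib.parse

# WHATWG "C0 control or space": U+0000 through U+0020
C0_CONTROL_OR_SPACE = "".join(chr(c) for c in range(0x21))


def views_agree(url: str) -> bool:
    """Return True if the raw parse and a browser-style trimmed parse match."""
    raw = urllib.parse.urlsplit(url)
    trimmed = urllib.parse.urlsplit(url.strip(C0_CONTROL_OR_SPACE))
    # Compare what the security check sees with what a client would likely use
    return (raw.scheme, raw.hostname) == (trimmed.scheme, trimmed.hostname)


for candidate in ["https://evil.com/path", " https://evil.com", "https://evil.com "]:
    print(repr(candidate), views_agree(candidate))
```

On a patched interpreter, a leading space no longer causes disagreement because urlsplit already trims it, so this check mainly matters on older deployments.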

How Was It Fixed? (Upgrade!)

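Python maintainers fixed this by making urllib.parse.urlsplit, which urlparse calls internally, strip leading C0 control and space characters (U+0000 through U+0020) before parsing, as the WHATWG URL Standard specifies. The fix first shipped in Python 3.11.4 and was backported to older supported branches.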
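Because Linux distributions and other vendors may backport security fixes without changing the Python version number, probing the behavior directly can be more reliable than comparing version strings. A minimal sketch; strips_leading_blanks is a hypothetical helper, and it checks only the leading-space case.

```python
from urllib.parse import urlsplit


def strips_leading_blanks() -> bool:
    """Probe whether this interpreter's urlsplit has the leading-blank fix."""
    parsed = urlsplit(" https://example.com/")
    # Patched interpreters drop the leading space and recover the host;
    # vulnerable ones treat the whole string as a path with an empty netloc.
    return parsed.netloc == "example.com"


print("patched" if strips_leading_blanks() else "vulnerable")
```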

Bottom line:
If you use or deploy code that relies on URL parsing for security checks, upgrade Python!

If you must stay with older versions:

Always strip and normalize user input before passing it to urlparse:

url = url.strip()
parsed = urllib.parse.urlparse(url)
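Be aware that str.strip() with no arguments removes whitespace but not every C0 control character: a leading NUL (U+0000), for instance, survives. A sketch that trims the full WHATWG “C0 control or space” range (U+0000 through U+0020) from both ends before parsing; parse_user_url is a hypothetical helper name.

```python
import urllib.parse

# WHATWG "C0 control or space": U+0000 through U+0020
C0_CONTROL_OR_SPACE = "".join(chr(c) for c in range(0x21))


def parse_user_url(url: str) -> urllib.parse.SplitResult:
    """Trim C0 controls and spaces from both ends, then parse."""
    return urllib.parse.urlsplit(url.strip(C0_CONTROL_OR_SPACE))


sample = "\x00 https://evil.com"
print(repr(sample.strip()))             # '\x00 https://evil.com'
print(parse_user_url(sample).hostname)  # evil.com
```

Trimming both ends goes slightly further than the upstream fix, which strips only leading characters.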

But the safest approach is to upgrade to a patched release (3.11.4 or later, or 3.10.12, 3.9.17, 3.8.17, or 3.7.17 on the older branches) and get the fix baked in.

Wrap-up

CVE-2023-24329 is a classic example of why even common standard libraries need careful attention. Small inconsistencies, like how a leading space is treated, can open big holes that attackers love. Keep your tools up to date, check your blocklist logic, and scrub user input with care. Don’t let a blank character catch you off guard!

Stay safe, patch fast!

If you write web apps that handle redirects or user-supplied URLs, review that logic today, and prefer allowlists over blocklists whenever you parse, check, or redirect URLs.

Timeline

Published on: 02/17/2023 15:15:00 UTC
Last modified on: 03/30/2023 04:15:00 UTC