CVE-2025-0938 - Python’s urllib.parse Flaw with Square Brackets in Domain Names (Exclusive Post)

Date: January 2025
Status: Public
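Affected Python versions: CPython before 3.12.9 and 3.13.2 (older security-only branches were patched in later point releases; check the official advisory for your branch)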
CWE: CWE-20 (Improper Input Validation)

Summary

A newly disclosed security vulnerability, CVE-2025-0938, affects the urllib.parse module in the Python standard library. The urlsplit and urlparse functions accept hostnames containing square brackets, even though RFC 3986 does not permit them in domain names. As a result, Python can parse a URL differently from stricter, standards-compliant tools, which can have security consequences for authentication, access control, and other URL-based checks.

Why Does This Matter?

According to RFC 3986, square brackets [ and ] are only allowed as delimiters for IPv6 and IPvFuture addresses in the "host" portion of a URI, and never in normal domain names.

Bad example (should be invalid):

https://ex[ample].com/path

Valid IPv6 example:

https://[2001:db8::1]:808/path

Python's URL parsing functions (urllib.parse.urlsplit and urllib.parse.urlparse) did not enforce this rule and would accept URLs such as https://ex[ample].com/. Other parsers, such as those in browsers and Node.js (and, per the examples in this post, Go's standard library), reject such URLs. Software that trusts Python’s parsing result can therefore disagree with the rest of the stack about what a URL means, and that mismatch is where security holes appear.
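The RFC 3986 rule described above can be sketched as a small predicate. The helper name `brackets_are_valid` is hypothetical, it expects the "host[:port]" part of a URL with any userinfo already removed, and its IPvFuture pattern is only a loose approximation of the grammar. It illustrates the rule and is not the check that CPython ships.

```python
import ipaddress
import re

# Loose approximation of the RFC 3986 IPvFuture form: "v" HEXDIG+ "." rest
_IPVFUTURE = re.compile(r"\Av[0-9A-Fa-f]+\..+\Z")


def brackets_are_valid(hostport: str) -> bool:
    """Return True if square brackets in "host[:port]" only wrap an IP literal."""
    if "[" not in hostport and "]" not in hostport:
        return True  # ordinary domain name or IPv4 host, nothing to check
    if not hostport.startswith("["):
        return False  # text before the opening bracket, e.g. "ex[ample].com"
    literal, closing, rest = hostport[1:].partition("]")
    if not closing:
        return False  # opening bracket without a closing one
    if rest and not rest.startswith(":"):
        return False  # text after the bracket that is not a ":port" suffix
    if "[" in rest or "]" in rest:
        return False  # stray brackets after the literal
    if _IPVFUTURE.match(literal):
        return True
    try:
        # Only IPv6 literals may be bracketed; a bracketed IPv4 address is invalid.
        return isinstance(ipaddress.ip_address(literal), ipaddress.IPv6Address)
    except ValueError:
        return False
```

With this sketch, `brackets_are_valid("[2001:db8::1]:808")` is accepted, while `brackets_are_valid("ex[ample].com")` is refused because text precedes the opening bracket.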

If you rely on Python’s parsing output for things like

- URL filtering (blacklist/whitelist by domain)
- Redirect validations
- Any operation that passes URLs between different systems/languages

...then your program may treat strangely-formed URLs as safe, when in fact *other parts of your system (or peer systems) might not*.

This opens the door for attacks that bypass filters, gain unauthorized access, or otherwise break trust boundaries.

Python before the fix

from urllib.parse import urlparse

url = "https://ex[ample].com/path"
parsed = urlparse(url)  # on releases that accept this URL, no exception is raised

print("Scheme:", parsed.scheme)      # 'https'
print("Netloc:", parsed.netloc)      # 'ex[ample].com'
print("Hostname:", parsed.hostname)  # no error! (the accessor keeps only the bracketed text, 'ample')

Expected according to RFC 3986:
That domain is illegal and should be rejected.
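Because the outcome depends on the interpreter in use, one way to see how a given runtime behaves is to probe it directly. The sketch below assumes only that `urlsplit` signals a rejected URL by raising `ValueError`; the helper name `accepts_bracketed_domain` is made up for this example.

```python
import sys
from urllib.parse import urlsplit


def accepts_bracketed_domain(url: str = "https://ex[ample].com/path") -> bool:
    """Return True if this interpreter parses `url` instead of rejecting it."""
    try:
        urlsplit(url)
    except ValueError:
        return False  # the parser rejected the URL
    return True  # the parser returned a result


if __name__ == "__main__":
    verdict = "accepts" if accepts_bracketed_domain() else "rejects"
    print(f"Python {sys.version.split()[0]} {verdict} the bracketed domain")
```

Running this on each interpreter in a deployment (application servers, build images, worker containers) shows quickly whether any of them still accepts the malformed host.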

How Other Parsers React

Node.js:

try {
  new URL("https://ex[ample].com/");
} catch (e) {
  console.log(e.message); // Invalid URL
}

Go:

u, err := url.Parse("https://ex[ample].com/")
fmt.Println(err) // reported to be a non-nil host error; confirm against your Go version

Result: Modern web technologies block it, but Python lets it slip!

Imagine an app with a whitelist like this:

from urllib.parse import urlparse

allowed_domains = {"example.com"}

def is_allowed(url):
    # Exact match of the parsed hostname against the whitelist
    domain = urlparse(url).hostname
    return domain in allowed_domains

print(is_allowed('https://example.com/'))    # True
print(is_allowed('https://ex[ample].com/'))  # False where the URL parses; patched releases raise ValueError

What if the check is more naive?

Suppose the check relies on looser string comparison (for example a substring or suffix match) and no other step sanitizes the input. An attacker could submit ex[ample].com, which Python accepts, while a browser or proxy further along might reject it or treat it as example.com. Your backend and the rest of the system then disagree about which host the URL names.
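A stricter variant of the whitelist check can be sketched as follows. It treats a parser error as a refusal and rejects any bracketed host outright, on the assumption that the whitelist only ever contains domain names and never IP literals. `ALLOWED_DOMAINS` and `is_allowed_strict` are hypothetical names for this example.

```python
from urllib.parse import urlsplit

ALLOWED_DOMAINS = {"example.com"}  # hypothetical whitelist of exact hostnames


def is_allowed_strict(url: str) -> bool:
    """Allow a URL only if its host is an exact, bracket-free whitelist match."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # the parser itself refused the URL
    # Whitelisted names are domain names, so any bracket in the netloc is refused,
    # regardless of how the running interpreter would interpret it.
    if "[" in parts.netloc or "]" in parts.netloc:
        return False
    host = parts.hostname  # lower-cased by urllib, None when no host is present
    return host is not None and host in ALLOWED_DOMAINS
```

Because the bracket test runs on the raw netloc, the function returns the same answer for `https://ex[ample].com/` whether or not the interpreter underneath has been patched.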

Reference Links

- CVE-2025-0938 (NVD entry)
- Python security advisory
- RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax (Section 3.2.2)
- Python urllib.parse docs

The Fix

The patch changes urllib.parse so that square brackets are accepted in the host field only when they enclose a (potential) IPv6 or IPvFuture address. With the fix applied, parsing a URL whose domain name contains brackets raises ValueError instead of returning a result.

Example with fixed Python

from urllib.parse import urlparse

try:
    u = urlparse('https://ex[ample].com/')
    # Reached only on an interpreter that still accepts the URL
    print("Parsed without error, hostname:", u.hostname)
except ValueError as exc:
    print("Caught malformed URL!", exc)


- Review Filters: Validate *all* user or third-party input against the host rules in RFC 3986 before trusting it or using it in security-related logic.

Conclusion

Python’s urllib.parse was more forgiving than the standards allow. In security, differences in interpretation like this can be the weak link. Always treat input validation as a defense-in-depth concern, and stay updated with your dependencies!

*For more details, see the official Python and CVE advisories, and patch your code now.*


Timeline

Published on: 01/31/2025 18:15:38 UTC
Last modified on: 11/03/2025 21:18:49 UTC