The org.cyberneko.html library (NekoHTML) is an HTML parser written in Java. It lets programmers parse HTML documents and navigate the resulting DOM tree. The upstream org.cyberneko.html library is no longer maintained, but the popular Ruby gem Nokogiri maintains its own fork (sparklemotion/nekohtml) and uses it in its JRuby implementation. Nokogiri is a flexible, easy-to-use HTML/XML parser and a staple for Ruby developers working with HTML markup.

Vulnerability: CVE-2022-24839

A security vulnerability in this fork has been assigned the identifier CVE-2022-24839. The fork of org.cyberneko.html raises a java.lang.OutOfMemoryError when parsing ill-formed HTML markup. Because this is a JVM error rather than an ordinary parse exception, an application using Nokogiri on JRuby can terminate unexpectedly or be left in an unstable state.

Exploit Details

Suppose an attacker can control the input to a Nokogiri-based parser, for example in a web scraping or data extraction application. By supplying crafted, ill-formed HTML markup, the attacker can force the application to run out of memory, which results in a denial-of-service (DoS) condition.

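The snippet below shows the code path involved: untrusted HTML passed straight to Nokogiri. The markup is a harmless placeholder, not a working payload:

require 'nokogiri'

# Stand-in for attacker-controlled, ill-formed HTML markup.
# This sample is illustrative only and does not itself exhaust memory.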
ill_formed_html = <<-EOHTML
<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <title>CVE-2022-24839 Exploit</title>
</head>
<body>
  <div>
    <!-- Attacker-supplied ill-formed markup (for example, unclosed -->
    <!-- or malformed tags) would appear here -->
  </div>
</body>
</html>
EOHTML

# Parsing the ill-formed HTML with Nokogiri
doc = Nokogiri::HTML(ill_formed_html)

Affected Versions

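This vulnerability affects Nokogiri releases that bundle a version of the org.cyberneko.html fork (sparklemotion/nekohtml) older than 1.9.22.noko2. Note that 1.9.22.noko2 is a version of the fork, not of the Nokogiri gem itself. Only the JRuby implementation of Nokogiri uses this Java parser, so JRuby users should upgrade to a Nokogiri release that ships nekohtml >= 1.9.22.noko2 (Nokogiri 1.13.4 or later).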
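
As a rough illustration of how an application might check its own exposure, the sketch below compares a Nokogiri version string against an assumed fixed release. The helper name affected_by_cve_2022_24839? is hypothetical, and the threshold 1.13.4 is an assumption about which Nokogiri release first bundled nekohtml 1.9.22.noko2; confirm it against the official advisory before relying on it.

```ruby
require 'rubygems' # provides Gem::Version (loaded by default in modern Ruby)

# Hypothetical helper, not part of Nokogiri.
# Returns true when the given Nokogiri version, running on the given Ruby
# engine, is expected to bundle a vulnerable nekohtml fork.
def affected_by_cve_2022_24839?(nokogiri_version, engine)
  # Only the Java (JRuby) implementation of Nokogiri uses nekohtml.
  return false unless engine == 'jruby'

  # Assumption: first Nokogiri release believed to ship nekohtml 1.9.22.noko2.
  assumed_fixed = Gem::Version.new('1.13.4')
  Gem::Version.new(nokogiri_version) < assumed_fixed
end

puts affected_by_cve_2022_24839?('1.13.3', 'jruby') # => true
puts affected_by_cve_2022_24839?('1.13.4', 'jruby') # => false
puts affected_by_cve_2022_24839?('1.13.3', 'ruby')  # => false
```

In a real application the arguments would typically be Nokogiri::VERSION and RUBY_ENGINE, so the check reflects the gem and interpreter actually in use.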

Original References

- CVE-2022-24839
- Nokogiri Project
- Nokogiri on RubyGems.org
- Upstream org.cyberneko.html (no longer maintained)
- Fork of org.cyberneko.html used by Nokogiri (sparklemotion/nekohtml)

Conclusion

CVE-2022-24839 affects the Java-based fork of the org.cyberneko.html library used by the Nokogiri Ruby gem on JRuby. Ill-formed HTML can trigger a java.lang.OutOfMemoryError in the parser, leading to application instability or a crash. Users of Nokogiri on JRuby are urged to upgrade to a release that bundles the fork at version >= 1.9.22.noko2 (Nokogiri 1.13.4 or later) to resolve the issue. Other forks of nekohtml may have similar vulnerabilities, so developers who depend on one should investigate whether their specific implementation is affected.

Timeline

Published on: 04/11/2022 22:15:00 UTC
Last modified on: 07/25/2022 18:22:00 UTC