Segmentation fault (null pointer read and/or write) when reading CDX files #23

Open
the-blank-x opened this issue Jan 13, 2024 · 1 comment · May be fixed by #24

@the-blank-x

Steps to reproduce:

  1. Save the following to crashpoc.cdx:
 CDX a b a m s k r M V g u
http://tillystranstuesdays.com/ 20240113012144 http://tillystranstuesdays.com/ text/html 200 AFQB6VVCWSKWEIAEJADJZAFMXOEGHO57 - - 1358 tillystranstuesdays.warc.gz <urn:uuid:4489ae0e-2e7d-482d-bff6-e86b02a3d719>
  2. Run wget-at --warc-dedup=crashpoc.cdx --warc-file=test https://example.com

Expected behavior: wget-at downloads https://example.com into test.warc.gz

Actual behavior:

> ~/gits/wget-lua/src/wget --warc-dedup=crashpoc.cdx --warc-file=test https://example.com
zsh: segmentation fault (core dumped)  ~/gits/wget-lua/src/wget --warc-dedup=crashpoc.cdx --warc-file=test

Additional information:

> ~/gits/wget-lua/src/wget -V
GNU Wget 1.21.3-at.20231215.01 built on linux-gnu.

-cares +digest -gpgme +https +ipv6 +iri +large-file -metalink +nls 
+ntlm +opie +psl +ssl/gnutls 

Wgetrc: 
    /usr/local/etc/wgetrc (system)
Locale: 
    /usr/local/share/locale 
Compile: 
    gcc -DHAVE_CONFIG_H -DSYSTEM_WGETRC="/usr/local/etc/wgetrc" 
    -DLOCALEDIR="/usr/local/share/locale" -I. -I../lib -I../lib 
    -I/usr/include/luajit-2.1 -I/usr/include/p11-kit-1 -DHAVE_LIBGNUTLS 
    -DNDEBUG -ggdb -O0 
Link: 
    gcc -I/usr/include/p11-kit-1 -DHAVE_LIBGNUTLS -DNDEBUG -ggdb -O0 
    -lpcre2-8 -luuid -lidn2 -lnettle -lgnutls -lzstd -lz -lpsl -lm -ldl 
    -lluajit-5.1 ../lib/libgnu.a -lunistring 

Backtrace from GDB:

#0  __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:76
#1  0x00005555555c0c1a in xstrdup (string=0x0) at xmalloc.c:338
#2  0x00005555555a270b in store_warc_record (uri=0x5555556126b0 "http://tillystranstuesdays.com/", date=0x0, uuid=0x555555612710 "<urn:uuid:4489ae0e-2e7d-482d-bff6-e86b02a3d719>", 
    digest=0x7fffffffe4b0 "\001`\037V\242\264\225b \004H\006\234\200\254\273\210c\273\277\377\177") at warc.c:1415
#3  0x00005555555a2a7c in warc_process_cdx_line (lineptr=0x5555556125b0 "http://tillystranstuesdays.com/", field_num_original_url=0x2, field_num_checksum=0x5, field_num_record_id=0xa) at warc.c:1520
#4  0x00005555555a2c9e in warc_load_cdx_dedup_file () at warc.c:1591
#5  0x00005555555a2e70 in warc_init () at warc.c:1658
#6  0x0000555555591d4d in main (argc=0x4, argv=0x7fffffffe488) at main.c:2088

store_warc_record is called with a null pointer as its second parameter:

store_warc_record(original_url, NULL, record_id, digest);

store_warc_record doesn't check for null pointers and passes date straight to xstrdup, which calls strlen on it, hence the segfault:

wget-lua/src/warc.c

Lines 1405 to 1422 in c1fe609

/* Store the WARC record in the warc_dedup_table using a warc_dedup_key and a
   warc_dedup_record for the data. Copies all variables to the hash table. */
static void
store_warc_record (const char *uri, const char *date, const char *uuid,
                   const char *digest)
{
  struct warc_dedup_record *rec = xmalloc (sizeof (struct warc_dedup_record));
  struct warc_dedup_key *key = xmalloc (sizeof (struct warc_dedup_key));
  rec->uri = xstrdup (uri);
  rec->date = xstrdup (date);
  rec->uuid = xstrdup (uuid);
  key->uri = xstrdup (uri);
  memcpy (rec->digest, digest, SHA1_DIGEST_SIZE);
  memcpy (key->digest, digest, SHA1_DIGEST_SIZE);
  hash_table_put (warc_dedup_table, key, rec);
}
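
One possible way to guard against the null date, sketched here outside the wget-lua tree and not necessarily what the linked pull request does: duplicate a string only when it is non-null, so a missing optional field stays NULL in the record instead of reaching strlen. The names dup_or_null, dedup_record_sketch and fill_record_sketch are hypothetical and do not exist in warc.c, and plain malloc stands in for wget's xmalloc/xstrdup.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct warc_dedup_record; only the string
   fields mentioned in the report are modelled here. */
struct dedup_record_sketch
{
  char *uri;
  char *date;
  char *uuid;
};

/* Hypothetical helper: duplicate S, or return NULL when S is NULL (or when
   allocation fails) instead of handing a null pointer to strlen. */
char *
dup_or_null (const char *s)
{
  if (s == NULL)
    return NULL;
  size_t len = strlen (s) + 1;
  char *copy = malloc (len);
  if (copy != NULL)
    memcpy (copy, s, len);
  return copy;
}

/* Fill REC from the given fields. URI and UUID are treated as required,
   DATE as optional (an assumption made for this sketch). Returns 0 on
   success and -1 if a required field is missing. Allocation failures are
   not handled beyond leaving the affected field NULL. */
int
fill_record_sketch (struct dedup_record_sketch *rec, const char *uri,
                    const char *date, const char *uuid)
{
  if (uri == NULL || uuid == NULL)
    return -1;
  rec->uri = dup_or_null (uri);
  rec->date = dup_or_null (date);
  rec->uuid = dup_or_null (uuid);
  return 0;
}
```

With this shape, a call such as fill_record_sketch (&rec, original_url, NULL, record_id) leaves rec.date as NULL rather than crashing. Any code that later reads the date would then need its own null check.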

When I was initially diagnosing this issue, I got a segfault from another area:

wget-lua/src/warc.c

Lines 1511 to 1519 in c1fe609

char *digest;
base32_decode_alloc (checksum, strlen (checksum), &checksum_v,
                     &checksum_l);
xfree (checksum);
if (checksum_v != NULL && checksum_l == SHA1_DIGEST_SIZE)
  {
    /* This is a valid line with a valid checksum. */
    memcpy (digest, checksum_v, SHA1_DIGEST_SIZE);

digest is an uninitialised pointer when memcpy writes through it, which causes a segfault and/or memory corruption. (In my case, digest was 0x0; after recompiling with -ggdb -O0 it became some random writable pointer.)
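
A minimal sketch of one way to avoid the uninitialised write, assuming the digest only needs to live for the duration of the call: make the destination a fixed-size array so memcpy always has real storage behind it. The function names copy_digest_sketch and digest_roundtrip_sketch are hypothetical, and SHA1_DIGEST_SIZE is defined locally as 20 (the size of a SHA-1 digest) instead of coming from wget's headers.

```c
#include <stddef.h>
#include <string.h>

#define SHA1_DIGEST_SIZE 20 /* A SHA-1 digest is 20 bytes. */

/* Hypothetical helper: copy a decoded checksum into OUT only when it is
   non-null and exactly SHA1_DIGEST_SIZE bytes long. Returns 1 on success,
   0 otherwise. */
int
copy_digest_sketch (char out[SHA1_DIGEST_SIZE], const char *decoded,
                    size_t decoded_len)
{
  if (decoded == NULL || decoded_len != SHA1_DIGEST_SIZE)
    return 0;
  memcpy (out, decoded, SHA1_DIGEST_SIZE);
  return 1;
}

/* Hypothetical caller: the destination is an array with automatic storage,
   not an uninitialised pointer. Returns 1 if the copy succeeded and the
   buffer matches the input, 0 otherwise. */
int
digest_roundtrip_sketch (const char *decoded, size_t decoded_len)
{
  char digest[SHA1_DIGEST_SIZE];
  if (!copy_digest_sketch (digest, decoded, decoded_len))
    return 0;
  return memcmp (digest, decoded, SHA1_DIGEST_SIZE) == 0;
}
```

Since store_warc_record copies the digest into the record and key with memcpy, a stack buffer like this would probably be enough at the call site, though that is an inference from the code quoted above and not a confirmed fix.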

@the-blank-x the-blank-x changed the title from "Segmentation fault (null pointer read and/or write) under certain conditions" to "Segmentation fault (null pointer read and/or write) when reading CDX files" on Jan 13, 2024
@Arkiver2 Arkiver2 added the bug Something isn't working label Jan 17, 2024
@Arkiver2 Arkiver2 self-assigned this Jan 17, 2024
@Arkiver2
Member

Thank you for looking into this and the detailed report! I'll push a fix today or tomorrow for this and do a check if there are other cases like this.

@the-blank-x the-blank-x linked a pull request Jan 26, 2024 that will close this issue