Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OSS-Fuzz targets improvements #267

Merged
merged 12 commits into from
May 8, 2024
Merged

Conversation

renatahodovan
Copy link
Contributor

I was experimenting with quickjs fuzzing through libFuzzer, when I recognized some inefficiencies. I've uploaded a patch set to OSS-Fuzz to fix them, where I was advised to ask a review from the quickjs maintainers and possibly move the implementation of the fuzz targets into the main codebase (here). So, I'm doing it in this PR.

The patches should be straightforward: the first one is a copy of the current version of the fuzz targets from oss-fuzz, while the others incrementally apply independent fixes/improvements.

If and when you find these commits acceptable, then I'll adapt the infra in oss-fuzz to use the new code. Plus, if requested, I can enable to mirror the found bugs to the GitHub issue tracker as well.

@saghul
Copy link
Contributor

saghul commented Apr 8, 2024

WAT

@renatahodovan
Copy link
Contributor Author

May I ask for a review from someone?

@renatahodovan
Copy link
Contributor Author

Gentle ping

@saghul
Copy link
Contributor

saghul commented Apr 22, 2024

@chqrlie Do you want these here or in NG? I might be able to help with either.

@renatahodovan
Copy link
Contributor Author

@saghul Since only this original repository is registered into oss-fuzz (allowing to publicly keep track of the performance improvement), my first priority is to merge these changes here. But, since libFuzzer can be executed locally too, NG could also gain profit from these changes, so I'm not against merging there, too.

Copy link
Contributor

@saghul saghul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments, please take a look! How is this supposed to work? Those are 3 files, none of them use the dictionary, and there are no instructions on how one can run the fuzzer. IMHO a README in the fuzz directory is necessary here.

@@ -0,0 +1,257 @@
"__loadScript"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this dict created? If automated, is it possible to check-in the script that does so?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The names were collected with a JS script, I'll upload it in the next round.

JS_SetInterruptHandler(JS_GetRuntime(ctx), interrupt_handler, NULL);
js_std_add_helpers(ctx, 0, NULL);

uint8_t *NullTerminatedData = (uint8_t *)malloc(Size + 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary cast

memcpy(NullTerminatedData, Data, Size);
NullTerminatedData[Size] = 0;
JSValue obj;
obj = JS_Eval(ctx, (const char *)NullTerminatedData, Size, "<none>", JS_EVAL_FLAG_COMPILE_ONLY | JS_EVAL_TYPE_GLOBAL | JS_EVAL_TYPE_MODULE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing both JS_EVAL_TYPE_GLOBAL and JS_EVAL_TYPE_MODULE doesn't make sense. It's either a global eval or a module eval.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a legacy code from oss-fuzz, but I think the current version is equivalent with JS_EVAL_FLAG_COMPILE_ONLY | JS_EVAL_TYPE_MODULE, so I'll update it.

obj = JS_Eval(ctx, (const char *)NullTerminatedData, Size, "<none>", JS_EVAL_FLAG_COMPILE_ONLY | JS_EVAL_TYPE_GLOBAL | JS_EVAL_TYPE_MODULE);
free(NullTerminatedData);
//TODO target with JS_ParseJSON
if (JS_IsException(obj)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is awaits in there the returned object will be a promise, you'll want to js_std_await it.

free(NullTerminatedData);
//TODO target with JS_ParseJSON
if (JS_IsException(obj)) {
JS_FreeContext(ctx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JS_FreeValue(ctx, obj); is missing here, for completeness.


#define CAPTURE_COUNT_MAX 255

FILE *outfile=NULL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can go inside the function.

#include <stdint.h>
#include <stdio.h>

#define CAPTURE_COUNT_MAX 255
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used in a single place and multiplied by 2. Why not just write the number literal?

return 0;
}
bc = lre_compile(&len, error_msg, sizeof(error_msg), (const char *) Data,
Size1, 0, ctx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing the context as the opaque accomplishes nothing here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should be passed as opaque then? (Using NULL instead of ctx causes a segfault.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird, the main test in libregexp.c even passes it as NULL. What is the backtrace?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--- a/fuzz/fuzz_regexp.c
+++ b/fuzz/fuzz_regexp.c
@@ -56,7 +56,7 @@ int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
         return 0;
     }
     bc = lre_compile(&len, error_msg, sizeof(error_msg), (const char *) data,
-                     size1, 0, ctx);
+                     size1, 0, NULL/*ctx*/);
     if (!bc) {
         return 0;
     }
==1848896==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x000000000018 (pc 0x555555672e50 bp 0x7fffffffc3c0 sp 0x7fffffffc380 T1848896)
==1848896==The signal is caused by a READ memory access.
==1848896==Hint: address points to the zero page.
    #0 0x555555672e50 in lre_realloc quickjs/quickjs.c:43870:31
    #1 0x55555577e5c5 in dbuf_realloc quickjs/cutils.c:114:19
    #2 0x55555577e861 in dbuf_put quickjs/cutils.c:140:13
    #3 0x55555577eb39 in dbuf_putc quickjs/cutils.c:161:12
    #4 0x55555576ad83 in lre_compile quickjs/libregexp.c:1771:5
    #5 0x55555562ec7d in LLVMFuzzerTestOneInput quickjs/fuzz/fuzz_regexp.c:58:10
    #6 0x5555555ecc30 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) (quickjs/fuzz_regexp+0x98c30) (BuildId: 383173e4c29297490e72ea7182a0eff6c735fb39)
    #7 0x5555555ec3a5 in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool, bool*) (quickjs/fuzz_regexp+0x983a5) (BuildId: 383173e4c29297490e72ea7182a0eff6c735fb39)
    #8 0x5555555edb85 in fuzzer::Fuzzer::MutateAndTestOne() (quickjs/fuzz_regexp+0x99b85) (BuildId: 383173e4c29297490e72ea7182a0eff6c735fb39)
    #9 0x5555555ee795 in fuzzer::Fuzzer::Loop(std::vector<fuzzer::SizedFile, std::allocator<fuzzer::SizedFile>>&) (quickjs/fuzz_regexp+0x9a795) (BuildId: 383173e4c29297490e72ea7182a0eff6c735fb39)
    #10 0x5555555dc8fb in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) (quickjs/fuzz_regexp+0x888fb) (BuildId: 383173e4c29297490e72ea7182a0eff6c735fb39)
    #11 0x555555605932 in main (quickjs/fuzz_regexp+0xb1932) (BuildId: 383173e4c29297490e72ea7182a0eff6c735fb39)
    #12 0x7ffff7a52d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #13 0x7ffff7a52e3f in __libc_start_main csu/../csu/libc-start.c:392:3
    #14 0x5555555d1d64 in _start (quickjs/fuzz_regexp+0x7dd64) (BuildId: 383173e4c29297490e72ea7182a0eff6c735fb39)

UndefinedBehaviorSanitizer can not provide additional info.
SUMMARY: UndefinedBehaviorSanitizer: SEGV quickjs/quickjs.c:43870:31 in lre_realloc
==1848896==ABORTING

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. quickjs.c provides a lre_realloc implementation which uses the context. Since the text uses just libregexp, it should compile in quickjs.c

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to provide these 2 in the fuzzer script:

/* must be provided by the user */
LRE_BOOL lre_check_stack_overflow(void *opaque, size_t alloca_size);
void *lre_realloc(void *opaque, void *ptr, size_t size);

And avoid compiling quickjs.c for that particular test.

return 0;
}
input = Data+Size1+1;
ret = lre_exec(capture, bc, input, 0, Size-(Size1+1), 0, ctx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto about ctx.

JSContext *ctx;

int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
if (outfile == NULL) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no way for outfile to not be null, what's the deal here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I only used and improved the eval and compile targets, while the regexp target was simply copied from oss-fuzz. But, I think the purpose of outfile is similar as of the initialized variable in the other two targets. If I simplify libFuzzer, then it's nothing more than a fuzzing harness with an infinite loop which calls the LLVMFuzzerTestOneInput function linked to it until it crashes. Since in a fuzzing session everything is about performance, the one-time initializations are preferred to be executed only once. So, I think the NULL-check of outfile also serves this purpose: after the first test run, outfile won't be NULL (most probably), hence the initialization won't run again either.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand, I don't see the purpose of writing the result of lre_exec into this outfile, since it's never read back and file IO (especially unnecessary IO) and fuzzing are not the best friends (performance reasons). Could it be reasoned with some compiler hack to avoid optimizing out lre_* calls if their result is not used?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea TBH.

The use of JS_NewContext instead of JS_NewContextRaw spares to call
JS_AddIntrinsic<XYZ> functions from the fuzz target, since the public
JS_NewContext API does exactly the same.
fuzz_regexp doesn't need to be dependant on libquickjs since the
runtime and the context - that were provided by libquickjs - were
only created to call two simple functions implemented in libquickjs
which could be mimicked by the fuzzer.
The removal of runtime and context objects implicated further
simplifications, like the omission of their one-time creation.
Finally, writing the result of the regexp operations into a file
is also superfluous, since it's not used by anybody.
…pile targets

Before this patch, the test executions were not independent,
since all the executed tests used the same JavaScript runtime and
context, causing irreproducible failure reports.
Big numbers are used by the input corpus, but the targets were not
able to interpret them since they were not compiled into them.
This change improved the inital coverage of the fuzz_eval target with
21% and the coverage of the fuzz_compile target with 25% when using
the official corpus.
Furthermore, added a JS script that collects all the builtin
names from the executing engine.
@renatahodovan
Copy link
Contributor Author

I hope that I addressed or answered all the requests and questions.

Copy link
Contributor

@saghul saghul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as a first check-in! @chqrlie PTAL!

@chqrlie
Copy link
Collaborator

chqrlie commented May 7, 2024

LGTM as a first check-in! @chqrlie PTAL!

Very interesting feature. I am waiting for Fabrice's approval to merge this in. Good work!

@chqrlie chqrlie merged commit 01454ca into bellard:master May 8, 2024
13 checks passed
@chqrlie
Copy link
Collaborator

chqrlie commented May 8, 2024

LGTM as a first check-in! @chqrlie PTAL!

Very interesting feature. I am waiting for Fabrice's approval to merge this in. Good work!

Thank you @renatahodovan for your contribution. Fuzzing is an enticing path to find software flaws. You seem to have an advanced understanding of the domain, do you participate in conferences such as FOSdem in Brussels where I followed several sessions on the subject? I would be interested to follow the progress and discuss research ideas on this.

Best regards

Chqrlie

@renatahodovan renatahodovan deleted the fuzz-dict branch May 9, 2024 18:48
@renatahodovan
Copy link
Contributor Author

Hi @chqrlie !
It's always a pleasure to meet people with similar interests! 😊 I haven't attended FOSDEM in person yet, but some of my colleagues have been a few times. Perhaps next time... My next conference target is ISSTA in September. It's a bit further away, but that shouldn't stop us from discussing research ideas. I'm available via email, Gitter, or any other suitable platform.

GerHobbelt pushed a commit to GerHobbelt/quickjs that referenced this pull request Jun 21, 2024
36911f0 regexp: fix non greedy quantizers with zero length matches
d86aaf0 updated test262.patch
adec734 fixed test of test262 directory
d378a9f Improve `js_os_exec` (bellard#295)
97be5a3 Add `js_resolve_proxy` (bellard#293)
f3f2f42 Add `JS_StrictEq()`, `JS_SameValue()`, and `JS_SameValueZero()` (bellard#264)
6f9d05f Expose `JS_SetUncatchableError()` (bellard#262)
d53aafe Add the missing fuzz_common.c (bellard#292)
db9dbd0 Add `JS_HasException()` (bellard#265)
6c43013 Add `JS_NewTypedArray()` (bellard#272)
01454ca OSS-Fuzz targets improvements (bellard#267)
0c8feca Improve class parser (bellard#289)
d9c699f fix class method with name get (bellard#258)
7a2c6f4 Improve libunicode and libregexp headers (bellard#288)
1402478 Improve unicode table handling (bellard#286)
3b45d15 Fix endianness handling in `js_dataview_getValue` / `js_dataview_setValue`
653b227 Improve error handling
203fe2d Improve `JSON.stringify`
ce6b6dc Use more explicit magic values for array methods
c0e67c4 Simplify redundant initializers for `JS_NewBool()`
0665131 Fix compilation with -DCONFIG_BIGNUM
65ecb0b Improve Date.parse, small fixes
6a89d7c Add CI targets, fix test_std.js (bellard#247)
ebe7496 Fix build: use LRE_BOOL in libunicode.h (bellard#244)
1a5333b prevent 0 length allocation in `js_worker_postMessage`
e17cb9f Add github CI tests
06c100c Prevent UB on memcpy and floating point conversions
3dd93eb fix microbench when microbench.txt is missing (bellard#246)
35b7b3c Improve Date.parse
8d64731 Improve Number.prototype.toString for radix other than 10
a78d2cb Improve repl regexp handling
8180d3d Improve microbench.js
78db49c Improve Date.parse
6428ce0 show readable representation of Date objects in repl
27928ce Fix Map hash bug
b70e764 Rewrite `set_date_fields` to match the ECMA specification
b91a2ae Add C API function JS_GetClassID()
12c91df Improve surrogate handling readability
8d932de Rename regex flag and field utf16 -> unicode
97ae6f3 Add benchmarks target
c24a865 Improve run-test262
bbf36d5 Fix big endian serialization
530ba6a handle missing test262 gracefully
0a361b7 handle missing test262 gracefully
74bdb49 Improve tests
85fb2ca Fix UB signed integer overflow in js_math_imul
8df4327 Fix UB left shift of negative number
3bb2ca3 Remove unnecessary ssize_t posix-ism
c06af87 Improve string concatenation hack
8e21b96 pass node-js command line arguments to microbench
95e0aa0 Reverse e140122
1fe0414 Fix test262 error
ef4e7b2 Fix compiler warnings
92e339d Simplify and clarify URL quoting js_std_urlGet
636c946 FreeBSD QuickJS Patch (bellard#203)
ae6fa8d Fix shell injection bug in std.urlGet (bellard#61)
693449e add gitignore for build objects (bellard#84)
e140122 Fix sloppy mode arguments uninitialized value use
6dbf01b Remove unsafe sprintf() and strcat() calls
6535064 Fix undefined behavior (UBSAN)
e53d622 Fix UB in js_dtoa1
fd6e039 Add UndefinedBehaviorSanitizer support
325ca19 Add MemorySanitizer support

git-subtree-dir: vendor/quickjs
git-subtree-split: 36911f0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants