Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Commit fuzzer source code. #2

Open
wants to merge 19 commits into
base: master
Choose a base branch
from
Open

Commit fuzzer source code. #2

wants to merge 19 commits into from

Conversation

viniul
Copy link

@viniul viniul commented Jan 22, 2020

As discussed on the mailing list already, this PR adds the fuzzing harnesses written by @bshastry and me to the xerces upstream. An initial integration into already uncovered some bugs. Integrating the fuzzing harnesses into upstream provides a cleaner way to test the xerces code.

The purpose of this PR and the integration of xerces into oss-fuzz is to allow parts of xerces' code to be continuously fuzzed, which would probably result in the detection of security bugs early on in the development process.

CC @bshastry

@viniul
Copy link
Author

viniul commented Apr 4, 2020

Just following up - is there still interest in merging this PR? If so, is there anything on our side to help get this PR merged?

@scantor
Copy link
Contributor

scantor commented Apr 6, 2020

It appears this is code that isn't part of the actual build (no makefile changes), so what's the significance of this being in the distribution? Seems like maintaining it as a patch would be more agile since there are no committers with any familiarity or involvement in it.

That said, I don't have any personal objection, and it appears to be Apache-licensed, so I can check it in if there's value. I can't accept the PR (no access and I don't use GitHub for commits) but I can apply a patch via GitBox.

@viniul
Copy link
Author

viniul commented Apr 6, 2020

You are right, the current PR does not make any changes to the Makefile - that was a mistake on our part, thank you for pointing that out. We will add another commit to this PR that will introduce the necessary changes to the Makefile/configure file. I would suggest that the changes introduce a new flag --build-fuzzers to to the configure file, so that the fuzzing binaries will only be build once the flag is enabled. This would have the advantage that developers could immediately fuzz their code prior to commiting if they are interested.

@scantor
Copy link
Contributor

scantor commented Apr 6, 2020

That clarifies things, thank you. I'm not opposed, but you can save me some time by attaching the patch to a JIRA ticket (there may be one already for your original request? I don't recall) rather than using GitHub, which I cannot make use of.

@viniul
Copy link
Author

viniul commented Apr 6, 2020

Absolutely! Anything else we can do on our side to make this integration as easy as possible?

@scantor
Copy link
Contributor

scantor commented Apr 6, 2020

Just minimizing the impact on the configure script, so I can avoid having to do too much integration testing/review, but otherwise since all the code you're adding is separate I really don't have other concerns.

@viniul
Copy link
Author

viniul commented Apr 6, 2020

Got it, will keep that in mind. Thank you very much - I will work on the changes to the configure script and post a link to the jira ticket here as soon as I am done.

@scantor
Copy link
Contributor

scantor commented Apr 7, 2020

This will probably have to miss the next patch because I need to get this release done but we can commit for the next.

@viniul
Copy link
Author

viniul commented May 8, 2020

Hey @scantor one last question: Would you also be fine with changes to the cmake file? I would prefer this as opposed to adapting the configure script, if it would be fine for you. From what I can see, adapting the cmake file can be done with very little impact, whereas I am not so sure I can say the same for the configure script

@scantor
Copy link
Contributor

scantor commented May 8, 2020

It depends what your goal is. The CMake script is not the supported means of building anywhere but Windows, but if the changes you're adding aren't formally part of the build process for a normal release on the other platforms then I don't care that much if they're included.

@viniul
Copy link
Author

viniul commented May 22, 2020

I opened a jira ticket here:
https://issues.apache.org/jira/browse/XERCESC-2199

@viniul
Copy link
Author

viniul commented Jun 20, 2020

@scantor Anything we can do to improve the patch?

@scantor
Copy link
Contributor

scantor commented Jun 20, 2020

I haven't looked at it. I can't spend time until/unless something arises to force me to. You'd be best off trying to get @rleigh-codelibre to include it in the discussions about the master branch and doing a 4.0 release.

Once you missed the last patch window, that was really it for me until something else comes up.

@viniul
Copy link
Author

viniul commented Jun 20, 2020

CC @rleigh-codelibre As described above, this PR adds some fuzzing harnesses for xerces, which we (@bshastry and me) think would be nice to have in the master branch, because it allows continuous fuzzing via oss-fuzz and also makes it very easy for developers to fuzz-test their code prior to the release. Should I post this to the mailing list?

@rleigh-codelibre
Copy link
Contributor

@viniul The addition of a test looks fairly low-risk in terms of wider impact on the project.

Has the necessary Apache ICLA and/or company CLA been done?

On the integration side, it would be nice to have this added to the CMakeLists.txt as for the other tests, and likewise to Makefile.am. If the fuzzing is long-running, I would suggest not adding to CTest so that it isn't part of the regular integration testing.

@viniul
Copy link
Author

viniul commented Jun 20, 2020

@rleigh-codelibre I updated the PR to also includes the changes for CMake. Is this sufficient to get the PR merge dfor now? I f so, let me create a new clean PR based on the current master branch of xeres-c

@@ -105,3 +105,6 @@ tests/scripts/*.log
tests/scripts/*.trs
tests/test-suite.log
Testing/

# Vincent
build/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change should be kept private. Please could you remove it from the PR?

@@ -177,8 +177,14 @@ install(
# Process subdirectories
add_subdirectory(doc)
add_subdirectory(src)
add_subdirectory(tests)
add_subdirectory(samples)
if (NOT (DEFINED XERCES_BUILD_FUZZERS))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these specifically excluded? It might be nicer to only use the fuzzer-specific conditions on the fuzzer code itself.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fuzzing requires some compile-time instrumentation of the xerces-library code and certain libraries that need to be linked against the instrumented code later on. Since this instrumentation is not needed for the other tests, because I thought this way it would be cleaner. Do you still want to me to change?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am afraid I misunderstand your question though: Do you want just the if condition to change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer that the fuzzer keep its conditionals self-contained if possible. That is, rather than disabling other unrelated features, it should only enable or disable the fuzzing functionality. If you want to disable building of the tests and/or samples, I think they should have separate options to allow them to be enabled or disabled independently.

You could then do -Dtests=OFF -Dsamples-OFF -Dfuzzer=ON to achieve the same effect.

Also, the extra compile-time stuff using ExternalProject looks separate from building the fuzzer itself. Maybe always build the fuzzer but have that additional stuff gated on the fuzzer option. Or gate all of it. But I would put that conditional stuff inside the fuzzer CMakeLists.txt rather than having any conditionals around include statements.

add_subdirectory(samples)
endif()

if ((DEFINED XERCES_BUILD_FUZZERS) OR (DEFINED XERCES_BUILD_FOR_OSS_FUZZ))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please always include all subdirectories, and place the above conditionals around the logic inside the file. This ensures the logic is evaluated whatever the cmake options, and helps to prevent regressions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, rather than use XERCES_BUILD_FUZZERS, why not define an option fuzzers with option() defaulting to off, then you can just use -Dfuzzers=ON?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do that, thanks!

@@ -0,0 +1,8 @@
#!/bin/bash
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this file being used anywhere. Does it really belong in the source tree?

If it does belong, could it go into the scripts subdirectory alongside the CI scripts?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily needed, but I thought it might be helpful if someone does not want to figure out the build options by themselves. Should it go into the scripts sub-directory or be completely removed?

@@ -58,14 +58,14 @@ set(HAVE_OFF_T ${SIZEOF_OFF_T})
set(HAVE_SIZE_T ${SIZEOF_SIZE_T})
set(HAVE_SSIZE_T ${SSIZEOF_SSIZE_T})
set(HAVE_WCHAR_T ${WCHAROF_WCHAR_T})
if(SIZEOF_SIZE_T)
if(HAVE_SIZEOF_SIZE_T)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure where these changes have come from. Is this a reversion of an earlier change?

If the history is messy, it might be easiest to rebase all your changes onto the current master and start a new PR.


# Definitions required for building
add_definitions(
-DHAVE_CONFIG_H=1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is necessary, I think target_compile_definitions would be more appropriate, rather than setting it globally.

@@ -8837,7 +8837,7 @@ XMLByte XMLChar1_1::fgCharCharsTable1_1[0x10000] =

#include <stdio.h>

static XMLCh gTmpCharTable[0xFFFF];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure about these any other code changes, above. Are they intentional?

If they are, please could you open separate PRs for them so we can keep code fixes separate from the fuzzer addition.

@viniul
Copy link
Author

viniul commented Jun 20, 2020

@rleigh-codelibre Thank you very much for reviewing this PR. I will implement the suggested changes and open a new PR based on the current master.

@rleigh-codelibre
Copy link
Contributor

@viniul I have done a preliminary review, please see the above comments.

Please could you also answer the question regarding the licensing. I see Google copyrights on some of the changes. And presumably there are your own and/or your employer's copyrights in addition. The necessary corporate and individual copyright licensing agreements must be submitted before we are able to merge your work.

@viniul
Copy link
Author

viniul commented Jun 20, 2020

@rleigh-codelibre We have not yet submitted the Apache licensing agreement - this is individual work by me and @bshastry, so I think if we both submit the ICLA this should be sufficient, correct?

The two Google copyrights came because we started those two files based on a Google template. The final file that will be committed was written by us though (in fact this was written by @bshastry, could you please confirm?) -- So I think we can just replace the Google copyright.

@rleigh-codelibre
Copy link
Contributor

If all the work is individual and there is no corporate ownership of any of it, then if you both submit the needed ICLA that should be sufficient. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants