Update lightgrep scanner for bulk_extractor 2.0 #421

Draft: wants to merge 31 commits into base: main

juliapaluch

This PR has the following functionality changes:

  • scan_lightgrep searches for user-specified keywords supplied via the -f and -F options. By default it searches for both the UTF-8 and UTF-16LE encodings of each keyword, and matching is case-sensitive
  • The following scanners have been deleted:
    • scan_accts_lg
    • scan_base16_lg
    • scan_email_lg
    • scan_gps_lg
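
To illustrate the default behaviour described in the first bullet, here is a minimal Python sketch of what "search for both UTF-8 and UTF-16LE versions, case-sensitively" means at the byte level. It is not the scan_lightgrep implementation and does not use the lightgrep API; the helper names `keyword_patterns` and `find_hits` are hypothetical, and a plain byte search stands in for the real pattern engine.

```python
def keyword_patterns(keyword: str) -> dict:
    """Return the byte patterns for one keyword in both default encodings."""
    return {
        "UTF-8": keyword.encode("utf-8"),
        "UTF-16LE": keyword.encode("utf-16-le"),
    }


def find_hits(buf: bytes, keyword: str) -> list:
    """Case-sensitive byte search; returns sorted (offset, encoding) pairs.

    Hypothetical stand-in for the scanner: a naive bytes.find loop,
    not lightgrep's multi-pattern search.
    """
    hits = []
    for encoding, pattern in keyword_patterns(keyword).items():
        if not pattern:
            continue  # an empty keyword would match everywhere; skip it
        start = buf.find(pattern)
        while start != -1:
            hits.append((start, encoding))
            start = buf.find(pattern, start + 1)
    return sorted(hits)


# Sample buffer: one UTF-8 copy, one UTF-16LE copy, one lowercase copy.
sample = b"xxSecret" + "Secret".encode("utf-16-le") + b"secret"
hits = find_hits(sample, "Secret")
```

In this sketch the lowercase `secret` at the end of the buffer is not reported, because the comparison is on exact bytes.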

With the deletion of other lightgrep-based scanners, we were able to delete a lot of scaffolding code.

This PR is not yet ready, but we're opening it for comment. The following remains to be done:

  • Write build documentation
  • Specify a lightgrep release
  • Write scan_lightgrep usage documentation
  • Test the Windows build

Please let us know if you have any questions or comments.

juliapal and others added 30 commits May 2, 2023 14:39
…rmance regression. The timings below are from the following command:

./src/bulk_extractor -F ../lightgrep/pytest/keys/shuf10.txt -Z -o ~/be_timed_output_without_thread_local_`printf %04d $i` -E scan_lightgrep ~/ev/terry-2009-12-11-002.E01

Thread_local?  Clocktime (Min.)  Clocktime (Max.)  Clocktime (Average)  Scan Lightgrep Time (Min.)  Scan Lightgrep Time (Max.)  Scan Lightgrep Time (Average)
FALSE          162.965479        168.628229        164.2545712          494.810946                  528.368114                  504.1799554
TRUE           163.681386        173.587754        167.233617           499.815450                  532.324335                  516.4901762

This reverts commit 0ca43ec.
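
For readers comparing the two rows of the timing table, the relative difference between the averages can be worked out as below. This is only arithmetic on the figures quoted in the commit message; the function name `relative_change` is hypothetical, the units are whatever the original measurements used, and nothing here says whether the difference is statistically meaningful (the min/max ranges of the two rows overlap).

```python
def relative_change(baseline: float, candidate: float) -> float:
    """Percentage change of candidate relative to baseline."""
    return (candidate - baseline) / baseline * 100.0


# Averages copied from the table (Thread_local? FALSE vs. TRUE).
clock_change = relative_change(164.2545712, 167.233617)
scan_change = relative_change(504.1799554, 516.4901762)

print(f"Clocktime average:           {clock_change:+.2f}%")
print(f"Scan Lightgrep Time average: {scan_change:+.2f}%")
```

With the quoted averages, the thread_local rows come out roughly 1.8% (clock time) and 2.4% (scan_lightgrep time) higher than the rows without it.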
@simsong
Owner

simsong commented May 30, 2023

I'm going to close this and re-open it as a draft PR.

@simsong simsong closed this May 30, 2023
@simsong simsong reopened this May 30, 2023
@simsong simsong marked this pull request as draft May 30, 2023 23:47
@simsong
Owner

simsong commented May 30, 2023

Apparently that's not how you did it. I found instructions here. It's a draft now.

@codecov

codecov bot commented May 31, 2023

Codecov Report

Merging #421 (16e8eeb) into main (7935c41) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main     #421   +/-   ##
=======================================
  Coverage   47.94%   47.94%           
=======================================
  Files         112      112           
  Lines       13224    13224           
=======================================
  Hits         6339     6339           
  Misses       6885     6885           

@jonstewart
Collaborator

I didn't know about draft PRs, TIL.

@simsong
Owner

simsong commented Nov 9, 2023

Is this PR ready to go?

@jonstewart
Collaborator

jonstewart commented Nov 9, 2023 via email

@simsong
Owner

simsong commented Jan 7, 2024

Hi. What's the status on this?

@jonstewart
Collaborator

We're getting ready to make a new lightgrep release for this to target. Can you review scan_lightgrep.cpp?
