Merge pull request #450 from simsong/rel-2.10
Rel 2.10
simsong authored Jan 25, 2024
2 parents a20dfe5 + 77c5224 commit b1e500f
Showing 22 changed files with 140 additions and 205 deletions.
35 changes: 16 additions & 19 deletions .github/workflows/continuous-integration-pip.yml
@@ -58,13 +58,20 @@ jobs:
run: |
echo "" | bash etc/CONFIGURE_UBUNTU22LTS.bash
- name: Make configure script
- name: Version Numbers
run: |
autoconf --version
automake --version
aclocal --version
gcc --version
g++ --version
- name: Files
run: |
find . -print
- name: Make configure script
run: |
bash bootstrap.sh
- name: Dump configure script
@@ -84,7 +91,7 @@ jobs:
- name: C++ checks optimization with address-sanitizer (Mac and Linux)
run: |
echo === Try Address Sanitizer Optimization ===
./configure --enable-address-sanitizer
./configure --enable-address-sanitizer --enable-silent-rules --quiet
make $MAKE_OPTS all
pushd src
make $MAKE_OPTS bulk_extractor
@@ -96,7 +103,7 @@ jobs:
if: startsWith(matrix.os, 'ubuntu-DISABLED')
run: |
bash bootstrap.sh
./configure --enable-silent-rules --enable-thread-sanitizer
./configure --enable-thread-sanitizer --enable-silent-rules --quiet
make clean
make $MAKE_OPTS all
pushd src
@@ -109,7 +116,7 @@ jobs:
if: startsWith(matrix.os, 'ubuntu')
run: |
bash bootstrap.sh
./configure --disable-opt CFLAGS='-g -O0 -fprofile-arcs -ftest-coverage' CXXFLAGS='-g -O0 -fprofile-arcs -ftest-coverage' LIBS='-lgcov -lre2'
./configure --disable-opt CFLAGS='-g -O0 -fprofile-arcs -ftest-coverage' CXXFLAGS='-g -O0 -fprofile-arcs -ftest-coverage' --enable-silent-rules --quiet
make clean
make $MAKE_OPTS check || (echo ==error== ; cat test-suite.log; exit 1)
@@ -123,21 +130,11 @@ jobs:
bash <(curl -s https://codecov.io/bash)
popd
- uses: ammaraskar/[email protected]
name: GCC Problem Matcher

- name: distcheck
run: |
echo we are doing this because 'make distcheck' makes the source read-only and builds in a _build directory and currently that is broken
bash bootstrap.sh
./configure
make clean
make dist
mv bulk_extractor*.tar.gz /tmp
pushd /tmp
tar xfvz bulk_extractor*.tar.gz
pushd $(basename bulk_extractor*.tar.gz .tar.gz)
ls -l
./configure
make check
popd
./configure -q
make distcheck
- uses: ammaraskar/[email protected]
name: GCC Problem Matcher
4 changes: 4 additions & 0 deletions .gitignore
@@ -101,3 +101,7 @@ tests/Makefile
win32
win64
README
*.auto_defs
*.so*
*.pdf
*.zip
16 changes: 7 additions & 9 deletions Makefile.am
@@ -7,7 +7,7 @@

# We need to list *all* of the subdirs here that have Makefiles in them that are executed by
# doing a chdir into the directory and typing 'make'. So we do not need src/be20_api
SUBDIRS = doc doc/latex_manuals man src src/tests python specfiles tests
SUBDIRS = doc doc/latex_manuals man src src/tests specfiles tests

# Include things explicitly
# Note - autoconf seems to automatically include the m4 rules that are used by configure.ac - no need to list them here
@@ -18,15 +18,13 @@ SUBDIRS = doc doc/latex_manuals man src src/tests python specfiles tests
# note - we are allowed to use wildcards below! Who knew.
# and using $wildcard prevents it from running on mac, which doesn't use gnu make

EXTRA_DIST = \
$(SRC_WIN_DIST) \
$(srcdir)/*.md \
include Makefile.auto_defs

EXTRA_DIST = $(SRC_WIN_DIST) $(AUTO_DOC_FILES) $(AUTO_ETC_FILES) $(AUTO_LICENSES) \
$(srcdir)/.gitignore \
$(srcdir)/bootstrap.sh \
$(srcdir)/etc/*.bash \
$(srcdir)/licenses/LICENSE.* \
$(srcdir)/*.txt \
$(srcdir)/src/be20_api/dfxml_cpp/src/Makefile.defs
$(srcdir)/CODING_STANDARDS.md \
$(srcdir)/LICENSE.md \
$(srcdir)/README.md
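
The new `EXTRA_DIST` line relies on variables that the included `Makefile.auto_defs` is expected to define. A minimal Python sketch, not part of this commit, that lists which `AUTO_*` variables a Makefile fragment references; the helper name `referenced_auto_vars` is hypothetical:

```python
import re

def referenced_auto_vars(makefile_text):
    """Return the sorted, de-duplicated AUTO_* names referenced as $(NAME)."""
    return sorted(set(re.findall(r"\$\((AUTO_[A-Z0-9_]+)\)", makefile_text)))

# Fragment copied from the new EXTRA_DIST line in Makefile.am.
fragment = "EXTRA_DIST = $(SRC_WIN_DIST) $(AUTO_DOC_FILES) $(AUTO_ETC_FILES) $(AUTO_LICENSES)"
print(referenced_auto_vars(fragment))
```

Comparing this list with the names the generator writes would flag a reference that nothing defines, which matters because make expands an undefined variable to an empty string without complaint.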


# ACLOCAL_AMFLAGS = ${ACLOCAL_FLAGS} -I m4
46 changes: 16 additions & 30 deletions README.md
@@ -1,46 +1,32 @@
[![codecov](https://codecov.io/gh/simsong/bulk_extractor/branch/main/graph/badge.svg?token=3w691sdgLu)](https://codecov.io/gh/simsong/bulk_extractor)

`bulk_extractor` is a high-performance digital forensics exploitation tool.
It is a "get evidence" button that rapidly
scans any kind of input (disk images, files, directories of files, etc)
and extracts structured information such as email addresses, credit card numbers,
JPEGs and JSON snippets without parsing the file
system or file system structures. The results are stored in text files that are easily
inspected, searched, or used as inputs for other forensic processing. bulk_extractor also creates
histograms of certain kinds of features that it finds, such as Google search terms and email addresses,
as previous research has shown that such histograms are especially useful in investigative and law enforcement applications.
`bulk_extractor` is a high-performance digital forensics exploitation
tool. It is a "get evidence" button that rapidly scans any kind of
input (disk images, files, directories of files, etc) and extracts
structured information such as email addresses, credit card numbers,
JPEGs and JSON snippets without parsing the file system or file system
structures. The results are stored in text files that are easily
inspected, searched, or used as inputs for other forensic
processing. bulk_extractor also creates histograms of certain kinds of
features that it finds, such as Google search terms and email
addresses, as previous research has shown that such histograms are
especially useful in investigative and law enforcement applications.

Unlike other digital forensics tools, `bulk_extractor` probes every byte of data to see if it is the start of a
sequence that can be decompressed or otherwise decoded. If so, the
decoded data are recursively re-examined. As a result, `bulk_extractor` can find things like BASE64-encoded JPEGs and
compressed JSON objects that traditional carving tools miss.

This is the `bulk_extractor` 2.0 development branch! For information
about the `bulk_extractor` update, please see [Release 2.0 roadmap](https://github.com/simsong/bulk_extractor/blob/main/doc/ROADMAP_2.0.md).
This is the `bulk_extractor` 2.1 development branch! It is reliable, but if you want to have a well-tested production quality release, download a release from https://github.com/simsong/bulk_extractor/releases.

Building `bulk_extractor`
=========================
To build bulk_extractor in Linux or Mac OS:
We recommend building from sources. We provide a number of `bash` scripts in the `etc/` directory that will configure a clean virtual machine.

1. Start with a clean virtual machine. We recommend the current version of Fedora, as it can build both the Linux and the Windows executables. The Windows executable currently does not build under Ubuntu because of deficiencies in the mingw compiler libraries on that platform.

2. Then run these commands:

```
$ git clone --recurse-submodules https://github.com/simsong/bulk_extractor.git
$ cd bulk_extractor/etc
$ bash CONFIGURE_FEDORA36.bash
$ cd ..
$ ./bootstrap.sh
$ ./configure
$ make
$ sudo make install
```

3. To compile the Windows executable, try:
If you wish to build for Windows, you should cross-compile from a Fedora system. Start with a clean VM and use these commands:

```
$ git clone --recurse-submodules https://github.com/simsong/bulk_extractor.git
$ git clone --recurse-submodules https://github.com/simsong/bulk_extractor.git
$ cd bulk_extractor/etc
$ bash CONFIGURE_FEDORA36_win64.bash
$ cd ..
@@ -58,7 +44,7 @@ This release of bulk_extractor requires C++17 and has been tested to compile on

* Amazon Linux as of 2023-05-25
* Fedora 36 (most recently)
* Ubuntu 20.04LTS
* Ubuntu 20.04LTS
* MacOS 13.2.1

You should *always* start with a fresh VM and prepare the system using the appropriate prep script in the `etc/` directory.
3 changes: 3 additions & 0 deletions bootstrap.sh
@@ -12,6 +12,9 @@ do
fi
done

# Make sure files are in src/Makefile.auto_defs
python3 etc/makefile_builder.py

# have automake do an initial population if necessary
autoheader -f
touch NEWS README AUTHORS ChangeLog
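
`etc/makefile_builder.py`, added in this commit, writes each variable as a backslash-continued list. A sketch of that output format in isolation, mirroring the write loop in `build()`; the function name `render_auto_defs` and the file names are made up for illustration:

```python
def render_auto_defs(variables):
    """Render {name: [paths]} as Makefile assignments, skipping empty lists."""
    out = []
    for name, paths in variables.items():
        if not paths:
            continue  # the builder also omits variables with no matches
        out.append(f"\n{name} = ")
        for p in sorted(paths):
            # one path per line, each introduced by a line continuation
            out.append(f" \\\n\t{p}")
        out.append("\n")
    return "".join(out)

text = render_auto_defs({"AUTO_ETC_FILES": ["etc/b.bash", "etc/a.bash"],
                         "AUTO_EMPTY": []})
print(text)
```

Because empty variables are skipped, a `$(AUTO_...)` reference in `Makefile.am` with no matching files simply expands to nothing.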
2 changes: 1 addition & 1 deletion configure.ac
@@ -16,7 +16,7 @@ AC_MSG_NOTICE([at start CPPFLAGS are $CPPFLAGS])
## Includes

AC_CONFIG_FILES([Makefile doc/Makefile doc/latex_manuals/Makefile src/Makefile src/tests/Makefile man/Makefile \
python/Makefile specfiles/Makefile specfiles/bulk_extractor.spec.m4 tests/Makefile ])
specfiles/Makefile specfiles/bulk_extractor.spec.m4 tests/Makefile ])

AC_CONFIG_HEADERS([config.h])
AC_CONFIG_AUX_DIR([build-aux])
9 changes: 0 additions & 9 deletions doc/1page.txt

This file was deleted.

18 changes: 0 additions & 18 deletions doc/Makefile.am
@@ -5,21 +5,3 @@ AUTOMAKE_OPTIONS = subdir-objects

DOXYGEN_FILES = doxygen/Doxyfile doxygen/Makefile doxygen/programmer_manual.doxygen doxygen/README \
doxygen/images/BEViewer_blank.eps doxygen/images/BEViewer_blank.png

EXTRA_DIST = \
*.txt \
*.pdf \
*.md \
latex_manuals/viewerPics/*.png \
announce/*.txt \
announce/*.md \
bulk_extractor.html \
Diagnostic_Notes/be_crash_diagnostics.md \
Diagnostic_Notes/bulk_extractor.md
Diagnostic_Notes/crash_diagnosis1.txt \
Diagnostic_Notes/crash_diagnosis2.txt\
Diagnostic_Notes/debugging_performance_problems.html \
Diagnostic_Notes/open_source_help \
Diagnostic_Notes/using_gdb \
performance.txt \
$(DOXYGEN_FILES)
File renamed without changes.
2 changes: 0 additions & 2 deletions etc/.gitignore

This file was deleted.

5 changes: 4 additions & 1 deletion etc/CONFIGURE_MACOS.bash
@@ -1,5 +1,8 @@
#!/bin/bash
source paths.bash

MYDIR=$(dirname $(readlink -f $0))

source $MYDIR/paths.bash
if [ -r /usr/local/bin/brew ]; then
WHICH=/usr/local/bin/brew
elif [ -r /opt/homebrew/bin/brew ]; then
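
The change above sources `paths.bash` relative to the script's own location instead of the caller's working directory. The same idea as a Python analogue (not part of this commit; `sibling_path` is a hypothetical helper):

```python
from pathlib import Path

def sibling_path(script, name):
    """Return the path of `name` in the same directory as `script`, symlinks resolved."""
    return Path(script).resolve().parent / name

# Resolves against the script's directory, whatever the current directory is.
print(sibling_path("etc/CONFIGURE_MACOS.bash", "paths.bash"))
```

Resolving the path first means the lookup still works when the script is invoked through a symlink or from another directory.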
44 changes: 0 additions & 44 deletions etc/install_autotools.sh

This file was deleted.

77 changes: 77 additions & 0 deletions etc/makefile_builder.py
@@ -0,0 +1,77 @@
#!/usr/bin/env python3

import os
import fnmatch
import os.path


config_root = {
    'root':'.',
    'outfile':'Makefile.auto_defs',
    'rules':[['doc', 'AUTO_DOC_FILES', ['*.pdf', '*.html', '*.txt', '*.md', '*.tex', '*.gitignore']],
             ['etc', 'AUTO_ETC_FILES', ['*.bash', '*.py', '*.gitignore']],
             ['python', 'AUTO_PYTHON_FILES', ['*.py']],
             ['licenses', 'AUTO_LICENSES', ['*']]],
    'ignore_fnames':[],
    'ignore_paths':[]
}

config_src = {
    'root':'src',
    'outfile':'Makefile.auto_defs',
    'rules':[['be20_api/', 'AUTO_CPP_FILES', ['*.cpp']],
             ['be20_api/', 'AUTO_H_FILES', ['*.h']],
             ['tests/', 'AUTO_TESTS_DIST', ['*']],
             ['rar/', 'AUTO_RAR_FILES', ['*.cpp','*.hpp']],
             ['.', 'AUTO_EXTRA_DIST', ['*.md','*.am','*.txt','*.py','*.bash','.gitignore']]],
    'ignore_fnames':set(['Makefile.am','Makefile.in',
                         'dfxml_demo.cpp',
                         'dfxml_version.cpp',
                         'iblkfind.cpp',
                         'test_dfxml.cpp',
                         'smoke.cpp',
                         'test_be20_api.cpp',
                         'test_be20_threadpool.cpp'
                         ]),
    'ignore_paths':set(['be20_api/utfcpp/tests',
                        'be20_api/utfcpp/samples',
                        'be20_api/tests',
                        'be20_api/demos'])}

def build(config):
    cwd = os.getcwd()
    os.chdir(config['root'])
    ignore_fnames = config['ignore_fnames']
    ignore_paths = config['ignore_paths']

    def should_ignore(full_path):
        if os.path.basename(full_path) in ignore_fnames:
            return True
        for ip in ignore_paths:
            if ip in full_path:
                return True
        return False

    vars = dict()
    for (start,name,pats) in config['rules']:
        matches = []
        for (root, dirs, files) in os.walk(start, topdown=False):
            for fn in files:
                for pat in pats:
                    full_path = os.path.relpath(os.path.join(root,fn))
                    if fnmatch.fnmatch(fn, pat) and not should_ignore(full_path):
                        matches.append( full_path )
        vars[name] = sorted(matches)
    with open(config['outfile'],'w') as f:
        for (name,matches) in vars.items():
            if len(matches)==0:
                continue
            f.write(f"\n{name} = ")
            for fname in matches:
                f.write(f" \\\n\t{fname}")
            f.write("\n")
    os.chdir(cwd)

if __name__=="__main__":
    build(config_root)
    build(config_src)
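
`should_ignore()` treats each entry of `ignore_paths` as a plain substring of the relative path. The sketch below shows that behavior next to a stricter check by path component. `ignored_by_substring`, `ignored_by_prefix`, and the `tests_data` path are hypothetical, and the second function is an alternative for comparison, not what the commit does:

```python
from pathlib import PurePosixPath

def ignored_by_substring(path, ignore_paths):
    """Same test as the commit's should_ignore(): substring containment."""
    return any(ip in path for ip in ignore_paths)

def ignored_by_prefix(path, ignore_paths):
    """Stricter alternative: the path must lie inside an ignored directory."""
    parents = PurePosixPath(path).parents
    return any(PurePosixPath(ip) in parents for ip in ignore_paths)

ignore = {"be20_api/tests"}
for candidate in ("be20_api/tests/smoke.cpp", "be20_api/tests_data/x.cpp"):
    print(candidate,
          ignored_by_substring(candidate, ignore),
          ignored_by_prefix(candidate, ignore))
```

The substring test is simpler and is enough for the directories listed in `config_src`; it would only over-match if a sibling directory shared a prefix with an ignored one.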
11 changes: 0 additions & 11 deletions make_bench.sh

This file was deleted.

1 change: 0 additions & 1 deletion python/.gitignore

This file was deleted.

1 change: 0 additions & 1 deletion python/Makefile.am

This file was deleted.

2 changes: 0 additions & 2 deletions python/module/.gitignore

This file was deleted.

1 change: 1 addition & 0 deletions src/.gitignore
@@ -13,3 +13,4 @@ x.cpp
domexusers.raw*
be_graph.pdf
*.Tpo
Makefile.auto_defs