- Ensure the `benchmarks` directory is the current working directory. If not, `cd` into it from the repository root:
cd benchmarks
- Main run script:
./simple_infra/infra_run.py
- Run
./simple_infra/infra_run.py -h
for flags and options.
- The scripts described below test and measure aggregator performance.
- An explanation of the files generated by a run is given further below.
First, download all inputs.
./run-all.sh --inputs # Download all input files.
The scripts accept the following configuration flags:
--small : use small input
--inf : input inflation between stages
--all : use both lean and python aggregators
--lean : use lean aggregators (default is python aggregators)
For example, to run all benchmarks with both python and lean aggregators, with input inflation, on the smaller input:
./run-all.sh --small --all --inf
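The `--inf` flag turns on input inflation between stages. As an illustrative assumption of what inflation might mean (the actual logic lives in the infrastructure scripts and may differ), the sketch below grows a stage's intermediate output by repeating its lines a fixed number of times:

```python
# Hypothetical sketch of input inflation: repeat an intermediate
# file's lines to multiply its size before the next stage runs.
# The real implementation in this repository may differ.
def inflate_lines(lines, factor=3):
    """Return the input lines repeated `factor` times."""
    return lines * factor

stage_output = ["apple\n", "banana\n"]
inflated = inflate_lines(stage_output, factor=3)
assert len(inflated) == 3 * len(stage_output)
```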
Clean up all intermediate files:
./run-all.sh --clean
Below, we show how to run only the oneliners benchmark. First, cd into the benchmark set and download its input:
cd oneliners
./inputs.sh # Download input files.
Example configuration for running the suite:
./run.sh --small # Run with default python on 1M input without input inflation.
./cleanup.sh # Remove all intermediate files.
Other configurations are listed below. Be sure to save your results and run the cleanup script before trying a new configuration.
./run.sh --small --all # Run with both lean and python aggregators on 1M input without input inflation.
./run.sh --small --all --inf # Run with both lean and python aggregators on 1M input with input inflation.
Check for and report any incorrect aggregators:
./run.sh --check
Running from a single directory keeps all intermediate files organized. Here, we create a run directory and run from it.
# Use python aggregator without input inflation.
mkdir run
cd run
../simple_infra/infra_run.py -n 2 -i ../oneliners/inputs/1M.txt -s ../oneliners/scripts/sort.sh -id 1 -agg python -o out.txt
# Use lean aggregator with input inflation.
mkdir run
cd run
../simple_infra/infra_run.py -n 2 -i ../oneliners/inputs/1M.txt -s ../oneliners/scripts/sort.sh -inflate -id 1 -agg lean -o out.txt
# Use a user-specified aggregator script without input inflation.
mkdir run
cd run
../simple_infra/infra_run.py -n 2 -i ../oneliners/inputs/1M.txt -s ../oneliners/scripts/sort.sh -id 1 -agg ../../py-2/s_sort.py -o out.txt
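For reference, a custom aggregator passed via `-agg` can be very small. The sketch below is hypothetical (it is not the actual `py-2/s_sort.py`): it merges already-sorted partial outputs, which is the correct aggregation step for a `sort` stage.

```python
import heapq

def aggregate_sorted(partials):
    """Merge already-sorted lists of lines into one sorted list.

    Each element of `partials` is the sorted output of one parallel
    instance of the command; merging preserves global sorted order.
    """
    return list(heapq.merge(*partials))

# Two sorted partial outputs from parallel `sort` instances:
left = ["ant\n", "cat\n"]
right = ["bee\n", "dog\n"]
merged = aggregate_sorted([left, right])
```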
- infra_metrics.csv: CSV file with main metric results; Header is as follows: script,input,input size,adj input size,cmd,agg,agg time,agg correct,cmd seq time
- infra_debug.log: more detailed execution log
- inputs-s-[ID]: `org` holds the split input files; `cmd` holds the files after applying the current command instance (the parallel partials)
- outputs-temp: agg-[ID] holds the parallel output files per command instance; seq-check-[ID] holds the sequential output files per command instance (used to check aggregator correctness)
- <output.txt>: the output file produced after running the entire script with this infrastructure (provided as the last argument to ../infra_run.py)
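Given the infra_metrics.csv header listed above, the metrics can be loaded with Python's standard csv module. A minimal sketch (the column names come from the header above; the sample row itself is made up for illustration):

```python
import csv
import io

HEADER = "script,input,input size,adj input size,cmd,agg,agg time,agg correct,cmd seq time"

# A made-up example row, only to show the parsing shape.
sample = HEADER + "\nsort.sh,1M.txt,1048576,1048576,sort,python,0.42,True,1.10\n"

rows = list(csv.DictReader(io.StringIO(sample)))
for row in rows:
    print(row["script"], row["agg"], row["agg correct"])
```

In a real run, replace `io.StringIO(sample)` with `open("infra_metrics.csv")`.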
Pipeline for Generating Benchmark Inputs
- Given the total bytes desired and the minimum and maximum bytes per line, generate a file with random words
- Given the total bytes, total lines, percentage of non-distinct line lengths, and percentage of non-distinct words, generate a file with random words that best adheres to the given percentages (depending on randomization and the total lines given, there may be more repeats than the desired percentage)
- Given the total bytes, a regex or word, and the percentage of bytes allocated to words matching the pattern, generate a file with random words
- Currently, probability and size settings are changed in the main function of the corresponding Python file
- Example Run:
python3 generation_with_regex.py
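A minimal sketch of the first generator described above (total bytes, min/max bytes per line); the function name and parameters here are illustrative, and the repository's actual generation scripts may differ:

```python
import random
import string

def generate_random_words(total_bytes, min_line, max_line, seed=0):
    """Generate text of at least `total_bytes` bytes, where each line
    is a random lowercase word of min_line..max_line bytes, plus a
    trailing newline."""
    rng = random.Random(seed)
    out = []
    size = 0
    while size < total_bytes:
        n = rng.randint(min_line, max_line)
        word = "".join(rng.choice(string.ascii_lowercase) for _ in range(n))
        out.append(word + "\n")
        size += n + 1  # word bytes plus the newline
    return "".join(out)

text = generate_random_words(100, 3, 8)
```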