This example project will count words in a given text and plot a bar chart of the 10 most common words.
For the dependencies, see environment.yml.
In this example we wish to:
- Analyze word frequencies using code/count.py for 4 books (they are all in the data directory).
- Plot a bar chart of the most common words using code/plot.py.
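To make the counting step concrete, here is a minimal sketch of how the 10 most common words in a text could be found. It is not the code in count.py: the function name `most_common_words` and the tokenisation rule are assumptions made for illustration only.

```python
import re
from collections import Counter


def most_common_words(text, n=10):
    """Return the n most frequent words in text as (word, count) pairs."""
    # Lowercase the text and keep runs of letters and apostrophes.
    # This tokenisation rule is an assumption; count.py may split words differently.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)


# Hypothetical sample text, not taken from any of the books.
sample = "the cat and the hat and the bat"
for word, count in most_common_words(sample, n=2):
    print(word, count)
```

A list of (word, count) pairs like this is the kind of input a bar chart of the most common words needs.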
For one book (isles.txt), use the scripts like this:
$ python code/count.py data/isles.txt > statistics/isles.data
$ python code/plot.py --data-file statistics/isles.data --plot-file plot/isles.png
To run these scripts for all books, you can collect all of these calls in one bash script and run it with bash run_all.sh.
Going one step further, and with less code, you could loop over all known book titles in a bash script and run it with bash run_all_loop.sh.
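For comparison, the same loop can be sketched in Python instead of bash. This is an illustration, not the content of run_all_loop.sh. It assumes that every .txt file in the data directory is a book and that the statistics and plot directories already exist. The function names `build_commands` and `run_all` are hypothetical.

```python
import subprocess
from pathlib import Path


def build_commands(data_dir="data", stats_dir="statistics", plot_dir="plot"):
    """Return (count_cmd, data_file, plot_cmd) for every .txt file in data_dir."""
    jobs = []
    # Assumption: each .txt file in data_dir is one book to process.
    for book in sorted(Path(data_dir).glob("*.txt")):
        data_file = Path(stats_dir) / f"{book.stem}.data"
        plot_file = Path(plot_dir) / f"{book.stem}.png"
        count_cmd = ["python", "code/count.py", str(book)]
        plot_cmd = ["python", "code/plot.py",
                    "--data-file", str(data_file),
                    "--plot-file", str(plot_file)]
        jobs.append((count_cmd, data_file, plot_cmd))
    return jobs


def run_all(**dirs):
    """Run the count and plot steps for each book, stopping on the first error."""
    for count_cmd, data_file, plot_cmd in build_commands(**dirs):
        # Writing stdout to a file mirrors the shell redirection "> statistics/<book>.data".
        with open(data_file, "w") as out:
            subprocess.run(count_cmd, stdout=out, check=True)
        subprocess.run(plot_cmd, check=True)
```

Calling `run_all()` from the project root would then run both steps for each book in turn.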
The workflow is also implemented using Snakemake in Snakefile.
End-to-end tests are provided in the test directory.
Inspired by and derived from https://hpc-carpentry.github.io/hpc-python/ which is distributed under Creative Commons Attribution license (CC-BY 4.0).
We use this example in the CodeRefinery workshop in this lesson: