A command line FASTQ file toolbox
Find and summarize the presence of given primers (or any subsequence) within a FASTQ file
python3 primerFind.py -s <sequenceFile> -p <primerFile>
-v/--verbose - gives debug output
-r/--reporting - creates report files based on hits found
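The kind of summary described above can be sketched in a few lines of plain Python. This is only an illustration, not the code in primerFind.py: the function names are hypothetical, the primers are passed in as a name-to-sequence dict because the primer file format is not described here, only exact forward-strand matches are counted, and the FASTQ is assumed to use the usual unwrapped four-lines-per-record layout.

```python
from collections import Counter
from itertools import islice


def read_sequences(fastq_path):
    """Yield the sequence line of each 4-line FASTQ record (assumes unwrapped records)."""
    with open(fastq_path) as handle:
        # The sequence is the 2nd line of every 4-line record: indexes 1, 5, 9, ...
        for line in islice(handle, 1, None, 4):
            yield line.strip().upper()


def count_primer_hits(fastq_path, primers):
    """Count how many reads contain each primer as an exact substring.

    `primers` is a hypothetical {name: sequence} mapping.
    """
    hits = Counter({name: 0 for name in primers})
    for seq in read_sequences(fastq_path):
        for name, primer in primers.items():
            if primer.upper() in seq:
                hits[name] += 1
    return hits
```

Calling `count_primer_hits("Sample01.fastq", {"fwd": "ATCGGCTA"})` would return a counter of reads containing each primer. Reverse complements and mismatches are deliberately left out of this sketch.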
A straightforward program that takes a list of FASTQ files, concatenates them, and runs FastQC on them in serial or scalable parallel mode
Python - built on 3.10.x but should be fine for the foreseeable future
FastQC - the underlying tool to perform QC on your read files
Bash Environment - current mechanics rely on system calls to the bash command line
File names should follow the convention SampleID_Lane#_Read#.fastq. For example:
Exp001_S1_L002_R1_001.fastq
FastQC Pipe expects a tab-delimited text file with the following structure (lines led with '#' are ignored):
Path/of/directory | Sample_Name | R1/2/Both
Example:
/home/RPINerd/M01234/Fastq_Generation Exp001_S1 2
/home/RPINerd/M01234/Fastq_Generation Exp001_S2 1
/home/RPINerd/M01234/Fastq_Generation Exp001_S3 Both
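As a rough sketch of how a file list in this format could be read, the snippet below splits each line on tabs and skips lines led with '#'. The function name `parse_file_list` is hypothetical and this is not necessarily how fastqc_pipe.py does it; skipping blank lines is also an assumption.

```python
def parse_file_list(lines):
    """Parse file-list lines into (directory, sample_name, reads) tuples."""
    entries = []
    for raw in lines:
        line = raw.strip()
        # Lines led with '#' are ignored; blank lines are skipped too (assumption).
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(f"expected 3 tab-separated columns, got: {raw!r}")
        directory, sample, reads = fields
        entries.append((directory, sample, reads))
    return entries


# Usage with rows from the example above (columns separated by real tabs)
example = [
    "# Path/of/directory\tSample_Name\tR1/2/Both",
    "/home/RPINerd/M01234/Fastq_Generation\tExp001_S1\t2",
    "/home/RPINerd/M01234/Fastq_Generation\tExp001_S3\tBoth",
]
for directory, sample, reads in parse_file_list(example):
    print(directory, sample, reads)
```

Note that the columns must be separated by actual tab characters; spaces will not be treated as delimiters in this sketch.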
The general format for a run is python3 fastqc_pipe.py -f file_list.tsv
The input file is the only required argument; however, there are some additional options:
--threads or -t # specifies how many threads to allocate to the fastqc algorithm, currently capped at 12
--merge or -m <desired/path/to/> allows for a preferred location to be specified for the merged lanes to be written to
--verbose or -v pumps out a ton of extra info and saves it to the file fastqc_pipe.log
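For illustration, the options above could be wired up with argparse roughly like this. It is a sketch, not the actual parser in fastqc_pipe.py: the default of 1 thread is an assumption, and so is the choice to silently clamp values above 12 rather than reject them.

```python
import argparse

MAX_THREADS = 12  # cap stated above


def parse_args(argv=None):
    """Parse the options described above (hypothetical re-creation)."""
    parser = argparse.ArgumentParser(description="Run FastQC on a list of FASTQ files")
    parser.add_argument("-f", dest="file_list", required=True,
                        help="tab-delimited file list")
    parser.add_argument("-t", "--threads", type=int, default=1,
                        help="threads to allocate (default of 1 is an assumption)")
    parser.add_argument("-m", "--merge", default=None,
                        help="location to write the merged lanes to")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="extra logging")
    args = parser.parse_args(argv)
    # Clamp into the range 1..12; clamping instead of erroring is an assumption.
    args.threads = max(1, min(args.threads, MAX_THREADS))
    return args
```

With this sketch, `parse_args(["-f", "file_list.tsv", "-t", "32"])` yields `threads == 12`.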
Given a single regular expression, or a list of them, filter out all reads from a given FASTQ file which match the expression(s).
One use case could be dropping reads with a large amount of poly-A sequences:
python fqfilter.py -f Sample01.fastq -d "[A]{25}"
Or perhaps you have a specific adapter at the start of the read that you'd like to isolate:
python fqfilter.py -f Sample01.fastq -k "^ATCGGCTA"
And you can also do both at once!
python fqfilter.py -f Sample01.fastq -k "^ATCGGCTA" -d "[A]{25}"
-f/--fastq | Input fastq file to filter
-p/--pair | Input pair of fastqs, R1 then R2. Files will be synchronized in the output for safety of downstream processing!
-k/--keep | Regular expression(s) to keep
-d/--drop | Regular expression(s) to drop
-s/--sync |
-v/--verbose | Debug output
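The keep/drop logic can be pictured with a small helper built on Python's re module. This is a hedged sketch rather than the code in fqfilter.py: the function name `passes_filters` is hypothetical, and it assumes that a read matching any drop expression is removed and that, when keep expressions are given, matching any one of them is enough to keep the read.

```python
import re


def passes_filters(sequence, keep=(), drop=()):
    """Return True if a read sequence should be written to the output."""
    # A match on any drop expression removes the read (assumption).
    if any(re.search(pattern, sequence) for pattern in drop):
        return False
    # If keep expressions are given, at least one must match (assumption).
    if keep and not any(re.search(pattern, sequence) for pattern in keep):
        return False
    return True


# Usage mirroring the combined example above
reads = ["ATCGGCTA" + "C" * 10, "ATCGGCTA" + "A" * 25, "GGGG" + "C" * 10]
kept = [r for r in reads if passes_filters(r, keep=["^ATCGGCTA"], drop=["[A]{25}"])]
print(len(kept))
```

Only the first read survives here: the second carries a 25-base poly-A run, and the third lacks the adapter at its start.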
Built on Python 3.11, but may be fine back as far as 3.7
Needs the BioPython package