This repository grew primarily out of ad hoc scripts in Python, Bash/sh, and R that I wrote for tasks encountered during processing, plotting, statistical analysis, clustering/machine learning exploration, and database-facilitated analysis of next-generation sequencing datasets, including:
- ChIP-seq
- RNA-seq
- ChEC-seq
- MNase-seq
- SpLiT-ChEC-seq
- Differential analysis of nucleosome positions
Documentation for all the scripts will be expanded as time allows to make them more accessible.
SEAPE and SpLiT-ChEC analysis with R (SCAR) are intended to support rigorous, reproducible, well-documented bioinformatics analysis. They are designed to make tasks such as exploring novel datasets for trends or extractable information, or assessing relationships within and between datasets and experiment types, more efficient. Current development is targeted towards:
- Characterizing chromatin organization patterns in SpLiT-ChEC-seq data
- Defining relationships between identifiable features of chromatin organization and various measurable outputs (RNA expression, protein-DNA binding site occupancy, DNA shape, proximity to functional DNA sequence features, among others)
- Relating DNA sequence to protein binding information and chromatin structure using clustering, machine learning, and/or deep learning techniques
Ultimately, I hope to apply the skills gained from building SpLiT-ChEC-seq, SCAR, and SEAPE to help design, build, and implement technologies that improve or expand therapeutic and diagnostic tools for challenging diseases in modern society.