Create data from scratch

Dave Lawrence edited this page Mar 2, 2023 · 9 revisions

Install Dependencies

Most users of cdot will use the REST API or download existing JSON files, so these extra dependencies are not installed by default with the package.

sudo apt -y install postgresql-client-12 # Need psql to extract UTA transcripts
python3 -m pip install --upgrade htseq # Need version after 2.0.2 to handle 109.20211119 GFF3
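
Before continuing, it can help to confirm that both dependencies are actually visible. The following is a minimal sketch, not part of cdot: the helper name `check_deps` is hypothetical, and it assumes `psql` is on the `PATH` and that the HTSeq module exposes `__version__`.

```shell
# Hypothetical helper (not part of cdot): report whether the extra dependencies are available.
check_deps() {
    missing=0
    # psql is needed to extract UTA transcripts
    if command -v psql >/dev/null 2>&1; then
        echo "psql: $(psql --version)"
    else
        echo "psql: not found"
        missing=1
    fi
    # Print the installed HTSeq version so it can be compared against 2.0.2 by eye
    if ! python3 -c 'import HTSeq; print("htseq:", HTSeq.__version__)' 2>/dev/null; then
        echo "htseq: not importable"
        missing=1
    fi
    return "$missing"
}

check_deps || echo "Install the missing dependencies before continuing"
```

The function only reports what it finds; it does not compare version numbers, so check the printed HTSeq version against the requirement above.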

Download and generate all transcripts

This requires 3 GB of disk space and 4 GB of RAM.
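
The disk and memory requirements can be checked up front. This is a rough sketch rather than anything shipped with cdot: the function names are hypothetical, it assumes Linux (`/proc/meminfo`) and a POSIX `df`, and it falls back to the current directory if `CDOT_DATA_DIR` is not yet set.

```shell
# Hypothetical pre-flight check (not part of cdot); thresholds are the figures quoted above.
avail_disk_kb() {
    # -P forces one output line per filesystem; column 4 is available space in KiB
    df -Pk "$1" | awk 'NR==2 {print $4}'
}

total_ram_kb() {
    # MemTotal is usually slightly below the installed RAM, so treat the result as a guide
    awk '/^MemTotal:/ {print $2}' /proc/meminfo
}

target_dir="${CDOT_DATA_DIR:-.}"   # assumption: check the current directory if unset
disk_kb=$(avail_disk_kb "$target_dir")
ram_kb=$(total_ram_kb)

[ "$disk_kb" -ge $((3 * 1024 * 1024)) ] || echo "Warning: less than 3 GB free in $target_dir"
[ "$ram_kb" -ge $((4 * 1024 * 1024)) ] || echo "Warning: less than 4 GB of RAM detected"
```

The script only prints warnings and never aborts, so a machine that sits marginally under either figure can still attempt the run.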

export [email protected]  # change to your email - used for NCBI API calls
export CDOT_DATA_DIR=/data/gene_annotation # change to the directory where the data should be stored

cd ${CDOT_DATA_DIR}
git clone https://github.com/SACGF/cdot  # generation scripts are not copied into the path by a PyPI install
python3 -m pip install ./cdot  # install the package from the cloned repository

${CDOT_DATA_DIR}/cdot/generate_transcript_data/all_transcripts.sh