This Python script uses the Biopython library to analyze genomic data retrieved from the National Center for Biotechnology Information (NCBI) through its E-utilities. It covers fetching genomic data, reverse complementing sequences, calculating GC skew, visualizing genomic features, and generating files compatible with the UCSC Genome Browser.
- Python 3
- Biopython
- Matplotlib
- Install the required libraries:
pip install biopython matplotlib
- Update the Entrez.email variable in the script with a valid email address. This is required for using NCBI's E-utilities.
Replace the placeholder accession number ("JX573431.1") in the script with the accession number you want to analyze.
# Replace "JX573431.1" with the accession number you want to analyze
accession_number = "JX573431.1"
Run the script:
python genomic_analysis_script.py
The script fetches genomic data from NCBI using the provided accession number.
It computes the reverse complement of the retrieved genomic sequence.
The script calculates and visualizes the GC skew of the reverse-complemented sequence.
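As a rough illustration of the two steps above, the following standard-library sketch reverse complements a DNA string and computes GC skew, (G - C) / (G + C), over consecutive windows. The function names, the window size, and the toy sequence are assumptions made for this example and are not taken from the script, which may rely on Biopython's `Seq.reverse_complement()` instead.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A<->T, C<->G)."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def gc_skew(seq: str, window: int = 100) -> list[float]:
    """Return (G - C) / (G + C) for consecutive non-overlapping windows."""
    values = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window].upper()
        g, c = chunk.count("G"), chunk.count("C")
        # Windows without any G or C get a skew of 0.0 to avoid dividing by zero.
        values.append((g - c) / (g + c) if g + c else 0.0)
    return values


# Toy sequence (hypothetical); a real run would use the sequence fetched from NCBI.
rc = reverse_complement("GGGCAT")
skew = gc_skew(rc, window=3)
```

The resulting list of per-window values can be passed to Matplotlib's `plt.plot` to draw the skew along the sequence.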
It extracts and prints information about genes, including their locations and descriptions.
The script extracts gene sequences and stores them for further analysis.
It generates a BED file (gene_locations.bed) containing information about gene locations.
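A minimal sketch of how gene locations might be written as six-column BED records (chrom, start, end, name, score, strand). The `write_bed` helper, the gene tuples, and the use of the accession as the chromosome name are assumptions for illustration; the script's actual column choices may differ. BED coordinates are 0-based and half-open, which matches Biopython feature locations.

```python
import io


def write_bed(handle, chrom, genes):
    """Write one BED line per gene: chrom, start, end, name, score, strand."""
    for name, start, end, strand in genes:
        # Biopython reports strand as 1 or -1; BED expects "+" or "-".
        strand_symbol = "+" if strand >= 0 else "-"
        handle.write(f"{chrom}\t{start}\t{end}\t{name}\t0\t{strand_symbol}\n")


# Hypothetical gene records: (name, start, end, strand).
genes = [("geneA", 0, 300, 1), ("geneB", 450, 900, -1)]

buffer = io.StringIO()
write_bed(buffer, "JX573431.1", genes)
bed_text = buffer.getvalue()
```

To produce a file on disk, the same helper can be called with `open("gene_locations.bed", "w")` in place of the in-memory buffer.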
A track file (trackDb.txt) is created for use with the UCSC Genome Browser, providing a custom track description.
The script calculates coverage based on genomic features.
It generates a coverage plot and saves it to a file.
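One possible reading of "coverage" here is the number of annotated features overlapping each base. The sketch below computes that per-position depth from a list of intervals; the `feature_coverage` name and the sample intervals are hypothetical, and the script may define coverage differently.

```python
def feature_coverage(length, intervals):
    """Count how many (start, end) intervals cover each position (0-based, half-open)."""
    coverage = [0] * length
    for start, end in intervals:
        # Clamp each interval to the sequence bounds before counting.
        for pos in range(max(start, 0), min(end, length)):
            coverage[pos] += 1
    return coverage


# Hypothetical 10-base sequence with two overlapping features.
coverage = feature_coverage(10, [(0, 4), (2, 6)])
covered_fraction = sum(1 for depth in coverage if depth > 0) / len(coverage)
```

The `coverage` list can then be plotted with Matplotlib's `plt.plot` and written to disk with `plt.savefig`.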
The script visualizes gene locations on the genomic sequence and saves the plot to a file.