Update README
kimrutherford committed Oct 17, 2024
1 parent a7ffa52 commit b9aa00f
Showing 1 changed file with 8 additions and 16 deletions.
24 changes: 8 additions & 16 deletions README.md
@@ -1,26 +1,18 @@
# [PomBase](/pombase) code for processing domains
# PomBase code for processing domains from InterProScan

This program processes the `match_complete.xml.gz` from InterPro
and also runs [TMHMM](https://services.healthtech.dtu.dk/services/TMHMM-2.0/)
to generate a JSON of domain information.
This program processes the JSON format output of `InterProScan` to
generate a simplified JSON file of domain information.

The latest InterPro file is available from: https://ftp.ebi.ac.uk/pub/databases/interpro/current_release/

UniProt IDs for pombe proteins are queried from PostgreSQL. Those IDs are
used to filter the InterPro file.

Protein sequences are queried from PostgreSQL and are passed to TMHMM.
We run TMHMM in a separate thread while the InterPro XML is parsed and
processed.
It also runs [TMHMM](https://services.healthtech.dtu.dk/services/TMHMM-2.0/)
and `segmasker` and includes the results in the JSON output.

## Running

Run with:

PATH=$PATH_TO_TMHMM_EXE:$PATH /var/pomcur/bin/pombase-interpro \
-p "postgres://<username>:<password>@localhost/<dbname>" \
-i <(gzip -d < match_complete.xml.gz) -o pombe_domain_results.json

PATH=$PATH_TO_TMHMM_EXE:$PATH /var/pomcur/bin/pombase-domain-process \
-p pombe_peptide.fa -i interproscan_output.json \
-o pombe_domain_results.json
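
Because the command above prepends the TMHMM directory to `PATH`, it can be useful to confirm that the external tools are visible before starting a run. The following is a minimal sketch, not part of the repository: the helper name `missing_tools` is hypothetical, and the executable names `tmhmm` and `segmasker` are assumptions about how the tools are installed on your system.

```shell
# Hypothetical helper (not part of pombase-domain-process):
# print the names of any tools that cannot be found on PATH,
# separated by spaces; print an empty line if all are found.
missing_tools() {
    missing=""
    for tool in "$@"; do
        # command -v is the POSIX way to look up an executable
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Strip the leading space left by the concatenation above
    printf '%s\n' "${missing# }"
}

# Assumed executable names; adjust to match your installation.
not_found=$(missing_tools tmhmm segmasker)
if [ -n "$not_found" ]; then
    echo "warning: not found on PATH: $not_found" >&2
fi
```

The check only warns rather than exiting, so it can be pasted at the top of a wrapper script without changing how the run itself behaves.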

## Status
