Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Checkin scripts from slack #3

Open
wants to merge 17 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 4 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -1,4 +1,7 @@
node_modules/
package-lock.json

data/
data/

*ipynb_checkpoints
.snakemake
17 changes: 15 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -34,14 +34,15 @@ Most of these mutations have appeared on the ACE2-binding region on Spike, calle
##
## Installation

Assuming you have cloned the git repository and are in the main repository folder. These steps are also found in the setup.md file.
Assuming you have cloned the git repository and are in the main repository folder.
These steps are also found in the setup.md file.

### Setup a working node js version

One option to do this is to create a conda environment and install nodejs into a new environment

```cmd
conda create -n isbm2021hack nodejs
conda env create -n isbm2021hack -f environment
conda activate isbm2021hack
```

Expand All @@ -58,6 +59,8 @@ npm install three jquery axios querystring
npm install icn3d
```

## Single Scripts

### DelPhi calculation

Calculate the DelPhi potential map using a nodejs script.
Expand All @@ -68,3 +71,13 @@ node bin/delphipot.js [PDB ID] [comma-separated Chain IDs] > data/[PBID]

The DelPhi potential map is now located in the data folder.

## Snakemake Worflow

The [Snakemake](https://snakemake.readthedocs.io) workflow executes all necessary scripts
(in development, tbc)

```
snakemake -c1 -n # dry-run
#snakemake -c1 -n -p # dry-run with rule execution preview
snakemake --cores 3 # execute three jobs in parallel
```
50 changes: 50 additions & 0 deletions Snakefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
"""
Snakemake workflow file
https://snakemake.readthedocs.io/

Run all scripts to produce final results
"""

from pathlib import Path
configfile: 'config.yaml' # access using config['key']


config['DATADIR'] = Path(config['DATADIR'])
config['SCRIPTDIR'] = Path(config['SCRIPTDIR'])
with open(config['DATADIR'] / config['INFILE']) as f:
# format: 2dd8_Ag_S_rbd.pdb
PDB_IDS = []
CHAINS = []
for line in f:
line = line.split('_')
PDB_IDS.append(line[0])
CHAINS.append(line[2])


rule target:
input:
expand(config['DATADIR'] / 'potentials' / "{pdb_id}_{chain}_out.txt",
zip,
pdb_id=PDB_IDS,
chain=CHAINS),
expand(config['DATADIR'] / 'potentials_surface' / "{pdb_id}_{chain}_surface.txt",
zip,
pdb_id=PDB_IDS,
chain=CHAINS)


rule calculate_potential:
output:
config['DATADIR'] / 'potentials' / "{pdb_id}_{chain}_out.txt"
params:
config['SCRIPTDIR'] / 'delphipot.js'
shell:
"node {params} {wildcards.pdb_id} {wildcards.chain} res > {output}"

rule calculate_surface_potential:
output:
config['DATADIR'] / 'potentials_surface' / "{pdb_id}_{chain}_surface.txt"
params:
config['SCRIPTDIR'] / 'delphipot.js'
shell:
"node {params} {wildcards.pdb_id} {wildcards.chain} surface > {output}"
38 changes: 38 additions & 0 deletions bin/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@


## Workflow

0. Extract a single Ag chain from Ab-Ag complex using custom workflow
- not provided is an automated way (biopython script/some other tools) to create a
list of `PDB_ID` and `Chain` pairs to store.
```
# current format
# <pdbid>_Ag_<chain>_rbd.pdb # pdbid and chain need to be extracted
2dd8_Ag_S_rbd.pdb # example
```
1. Create a currated list of all PDB files with their side-chain informatin
- see [`data/list_pdb_files`](../data/list_pdb_files)
2. Run `automate_delphical.sh` in the icn3d folder (for potential calculation) on all these files
- after commenting out the code used for surface area calculation in the `delphipot_surface.js` script
> [`delphipot.js`](delphipot.js) needs adaption to accomodate several options
> script has to be adapted. potentially change to snakemake
3. Run `automate_delphical.sh` (for surface area/residue calculation) on all these files (in the icn3d folder)
- after commenting out the code used for electrostatic potential calculation in the `delphipot_surface.js` script
4. Make list of output files generated in step 2 (`input_list_out`)
- having suffix `*_out`
- `out_files`: run script on files in folder `potentials`
5. Make list of output files generated in step 3 (`list_surface_files`)
- having suffix `*_surface`
- `surface_files`: run script on files in folder `potentials_surface`
6. Run `remove_undefined.py` on list from step 4, i.e. `input_list_out`
- i.e. on files in folder `potentials`
7. Make list of output files generated in step 6 (`list_clean_pot`)
- put all into a new cleaned folder
8. Run `map_atom_to_res.py` using lists from step 7
- after keeping output files and pdb files in the same folder
11. make list of csv files generated in step 8
12. keep surface files (filename format `<PDBID>_surface`) and .csv files in a single folder
13. run `group_into_regions.py`
- creates an output file `all_spike_strs_regions_pot.csv`
- only keeps residues 320 to 530
> potentially previous processing could be restricted to region of interest
19 changes: 19 additions & 0 deletions bin/automate_delphical.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
#!/usr/bin/bash

for i in $(cat list_pdb_files);
do

#${A}= "$(cut -d'_' -f1 <<<"$i")"
#echo "$A"


#${B}= "$(cut -d'_' -f4 <<<"$i")"
#echo "$B"

A=$(awk -F_ '{print $1}' <<< ${i})
B=$(awk -F_ '{print $3}' <<< ${i})

echo "delphipot_surface.js ${A} ${B} > ${A}_out"
node delphipot_surface.js ${A} ${B} > ${A}_surface

done
35 changes: 25 additions & 10 deletions bin/delphipot.js
Original file line number Diff line number Diff line change
Expand Up @@ -17,12 +17,21 @@ let axios = require('axios');
let qs = require('querystring');
let utils = require('./utils.js');
let myArgs = process.argv.slice(2);
if(myArgs.length != 2) {
console.log("Usage: node delphipot.js [PDB ID] [comma-separated Chain IDs]");
if(myArgs.length != 3) {
console.error(
`Usage: node delphipot.js [PDB ID] [comma-separated Chain IDs mode]

Console script to calcualte residual or surface potential

PDB ID: pdb id
chain: selected side-chain(s) comma seperated
mode: res or surface`
);
return;
}
let pdbid = myArgs[0].toUpperCase(); //'6jxr'; //myArgs[0];
let chainArray = myArgs[1].split(',');
let mode = myArgs[2].toLowerCase(); // res or surface
let baseUrlMmdb = "https://www.ncbi.nlm.nih.gov/Structure/mmdb/mmdb_strview.cgi?v=2&program=icn3d&b=1&s=1&ft=1&complexity=2&uid=";
let urlMmdb = baseUrlMmdb + pdbid;
https.get(urlMmdb, function(res1) {
Expand Down Expand Up @@ -77,14 +86,20 @@ https.get(urlMmdb, function(res1) {
resid2pot[resid] += atom.pot;
}
}
console.log("Electrostatic potential: (kt/e)");
for (var resid in resid2pot){
console.log(resid + " : " + resid2pot[resid]);
}
console.log("Solvent accessible surface area: (angstrom square)");
for(var resid in ic.resid2area) {
console.log("resid: " + resid + ' area: ' + ic.resid2area[resid]);
}
if (mode === 'res') {
console.log("Electrostatic potential: (kt/e)");
for (var resid in resid2pot){
console.log(resid + " : " + resid2pot[resid]);
}
} else if (mode === 'surface') {
console.log("Solvent accessible surface area: (angstrom square)");
for(var resid in ic.resid2area) {
console.log("resid: " + resid + ' area: ' + ic.resid2area[resid]);
}
} else {
console.error("Choose one of the two modes: res or surface");
return;
}
})
.catch(function(err) {
utils.dumpError(err);
Expand Down
57 changes: 57 additions & 0 deletions bin/map_atom_to_res.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,57 @@
#!/usr/bin/env python

"""
Extract residual information from PDB files

Map atoms to residual.
"""
__author__ = 'Mahita Jarapu'

IFH1 = open('list_of_pdb_files', 'r')
lines1 = IFH1.readlines()
for line1 in lines1:
line1 = line1.strip("\n")
pdb_name = line1.split("_")[0].lower()
IFH2 = open(line1, 'r')
res_dict = {}
lines2 = IFH2.readlines()
for i, line2 in enumerate(lines2):
print(line2[0:4])
print(line2[16:20].strip())
print(line2[22:28].strip())
res_dict[i] = line2[22:28].strip()+"_"+line2[16:20].strip()
print(res_dict)
IFH3 = open(pdb_name+"_clean", 'r')
OFH3 = open(pdb_name+"_res_pot.csv", 'w')
lines3 = IFH3.readlines()
residue_pot = 0
residue_pot_dict = {}
for j, line3 in enumerate(lines3):
if j == 0 and float(line3) == 0:

for k in res_dict.keys():
if j == k + 1:
line3 = line3.strip("\n")
residue_pot = residue_pot + float(line3)
# if k < len(res_dict.keys())-1:
if res_dict[k] != res_dict[k + 1]:
res_id = res_dict[k]
if res_id not in residue_pot_dict:
residue_pot_dict[res_id] = residue_pot
residue_pot = 0
else:
for k in res_dict.keys():
if j == k + 1:
line3 = line3.strip("\n")
residue_pot = residue_pot + float(line3)
# if k < len(res_dict.keys())-1:
if res_dict[k] != res_dict[k + 1]:
res_id = res_dict[k]
if res_id not in residue_pot_dict:
residue_pot_dict[res_id] = residue_pot
residue_pot = 0

print(residue_pot_dict)
for res_id in residue_pot_dict.keys():
OFH3.write("%s,%s\n" % (res_id, residue_pot_dict[res_id]))
OFH3.close()
59 changes: 59 additions & 0 deletions bin/remove_undefined.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
#!/usr/bin/env python

"""
Extract residual information from PDB files

Map atoms to residual.
"""
from pathlib import Path
import argparse
__author__ = 'Mahita Jarapu'

OUTFILE_TEMPLATE = '{}_clean'


def get_args():
parser = argparse.ArgumentParser(
'remove undefined potentials from residuals files.')
parser.add_argument(
'-f', '--folder', help='Folder with potential files to clean.')
args = parser.parse_args()
return args


args = get_args()

folder = Path(args.folder)

for _file in folder.iterdir():
delphi_value_dict = {}
outfile = OUTFILE_TEMPLATE.format(_file)
with open(_file) as f, open(outfile, 'w') as OFH2:
header = next(f)
# add check that header is as expected?
for i, line in enumerate(f, start=1):
# example: 7CDJ_E_333 : -35.06
line = line.strip("\n").split(":")
delphi_value = line[1]
delphi_value_dict[i] = delphi_value # change to position in AG?

for line_no in delphi_value_dict.keys():
# print(line_no)
"""What about NaN values

Idea is to
- replace undefined / NaN with 0 if the value before or after is defined?
"""
if delphi_value_dict[line_no] == ' undefined':
if (line_no + 1) < (len(delphi_value_dict.keys())):
if delphi_value_dict[line_no + 1] != ' undefined':
delphi_value_dict[line_no +
1] = delphi_value_dict[line_no + 1].strip()
OFH2.write("%s\n" % (0)) # only write 0 if next is also not undef
elif delphi_value_dict[line_no - 1] != ' undefined':
delphi_value_dict[line_no -
1] = delphi_value_dict[line_no - 1].strip()
OFH2.write("%s\n" % (0))
elif delphi_value_dict[line_no] != ' undefined':
delphi_value_dict[line_no] = delphi_value_dict[line_no].strip()
OFH2.write("%s\n" % (delphi_value_dict[line_no]))
23 changes: 23 additions & 0 deletions bin/utils.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
// https://www.geeksforgeeks.org/how-to-share-code-between-node-js-and-the-browser/
// https://stackoverflow.com/questions/3225251/how-can-i-share-code-between-node-js-and-the-browser
(function(exports) {
// 'use strict';

let dumpError = function(err) {
if (typeof err === 'object') {
if (err.message) {
console.log('\nMessage: ' + err.message)
}
if (err.stack) {
console.log('\nStacktrace:')
console.log('====================')
console.log(err.stack);
}
} else {
console.log('dumpError :: argument is not an object');
}
}

exports.dumpError = dumpError;

})(typeof exports === 'undefined'? this : exports); //this['share']={}: exports);
Loading