Ideal configuration file to check protein protein interaction #1011

Closed
Rohit-Satyam opened this issue Sep 9, 2024 · 4 comments
Labels: community (contributions from people outside the haddock team), question (Further information is requested)

@Rohit-Satyam

Dear Developers,

I have 13 PDB files obtained from the AlphaFold database, and we suspect that these 13 proteins form a complex. We would like to test this by protein-protein docking with haddock3, but I am unsure of the right configuration for blind docking.

I used the following configuration parameters:

# ====================================================================
# Protein-protein docking example with NMR-derived ambiguous interaction restraints

# directory in which the scoring will be done
run_dir = "blind_docking"

# execution mode
mode = "local"
# in which queue the jobs should run, if nothing is defined
#  it will take the system's default
# queue = "short"
# concatenate models inside each job, concat = 5 each .job will produce 5 models
concat = 5
#  Limit the number of concurrent submissions to the queue
#queue_limit = 100
#cns_exec = "path/to/bin/cns" # optional

# molecules to be docked
molecules =  [
    "./AF-Q8I521-F1-model_v4.pdb",
    "./AF-Q8II56-F1-model_v4.pdb"
    ]

# ====================================================================
# Parameters for each stage are defined below, prefer full paths
# ====================================================================
[topoaa]
autohis = true

[rigidbody]
tolerance = 5
cmrest = true
sampling = 1000

[seletop]
select = 200

[flexref]
tolerance = 5
contactairs = true

[mdref]
tolerance = 5

[clustfcc]

[seletopclusts]
top_models = 4

But I got the following error:

Starting HADDOCK 3.0.0 on 2024-09-09 20:02:00

Python 3.9.19 | packaged by conda-forge | (main, Mar 20 2024, 12:50:21) 
[GCC 12.3.0]

[2024-09-09 20:02:56,393 libworkflow INFO] Reading instructions step 0_topoaa
[2024-09-09 20:02:56,393 libworkflow INFO] Reading instructions step 1_rigidbody
[2024-09-09 20:02:56,394 libworkflow INFO] Reading instructions step 2_seletop
[2024-09-09 20:02:56,394 libworkflow INFO] Reading instructions step 3_flexref
[2024-09-09 20:02:56,394 libworkflow INFO] Reading instructions step 4_mdref
[2024-09-09 20:02:56,395 libworkflow INFO] Reading instructions step 5_clustfcc
[2024-09-09 20:02:56,395 libworkflow INFO] Reading instructions step 6_seletopclusts
[2024-09-09 20:02:56,425 base_cns_module INFO] Running [topoaa] module
[2024-09-09 20:02:56,425 __init__ INFO] [topoaa] Molecule 1: AF-Q8I521-F1-model_v4.pdb
[2024-09-09 20:02:56,443 __init__ INFO] [topoaa] Sanitizing molecule AF-Q8I521-F1-model_v4_1.pdb
[2024-09-09 20:02:56,505 __init__ INFO] [topoaa] Topology CNS input created
[2024-09-09 20:02:56,505 __init__ INFO] [topoaa] Molecule 2: AF-Q8II56-F1-model_v4.pdb
[2024-09-09 20:02:56,508 __init__ INFO] [topoaa] Sanitizing molecule AF-Q8II56-F1-model_v4_1.pdb
[2024-09-09 20:02:56,521 __init__ INFO] [topoaa] Topology CNS input created
[2024-09-09 20:02:56,521 __init__ INFO] [topoaa] Running CNS Jobs n=2
[2024-09-09 20:02:56,521 libutil INFO] Selected 2 cores to process 2 jobs, with 112 maximum available cores.
[2024-09-09 20:02:56,555 libparallel INFO] Using 2 cores
[2024-09-09 20:04:24,361 libparallel INFO] 2 tasks finished
[2024-09-09 20:04:24,362 __init__ INFO] [topoaa] CNS jobs have finished
[2024-09-09 20:04:24,368 base_cns_module INFO] Module [topoaa] finished.
[2024-09-09 20:04:24,368 __init__ INFO] [topoaa] took 1 minute and 28 seconds
[2024-09-09 20:04:25,616 base_cns_module INFO] Running [rigidbody] module
[2024-09-09 20:04:25,617 __init__ INFO] [rigidbody] crossdock=true
[2024-09-09 20:04:25,618 __init__ INFO] [rigidbody] Preparing jobs...
[2024-09-09 20:04:25,621 libutil INFO] Selected 4 cores to process 1000 jobs, with 112 maximum available cores.
[2024-09-09 20:04:25,622 libparallel INFO] Using 4 cores
[2024-09-09 20:04:25,680 libparallel WARNING] Exception in task execution: Chain/seg IDs are not unique for pdbs ([pdb|2024-09-09 20:04:24] /data/foldseek/af-data/docking_res/blind_docking/0_topoaa/AF-Q8I521-F1-model_v4_1_haddock.pdb, [pdb|2024-09-09 20:04:24] /data/foldseek/af-data/docking_res/blind_docking/0_topoaa/AF-Q8II56-F1-model_v4_1_haddock.pdb).

[2024-09-09 20:04:33,502 libparallel INFO] 1000 tasks finished
[2024-09-09 20:04:33,506 __init__ INFO] [rigidbody] Preparation took 7.888738 seconds
[2024-09-09 20:04:33,568 __init__ INFO] [rigidbody] Running CNS Jobs n=1000
[2024-09-09 20:04:33,568 libutil INFO] Selected 4 cores to process 1000 jobs, with 112 maximum available cores.
[2024-09-09 20:04:33,571 libparallel INFO] Using 4 cores
[2024-09-09 20:04:33,577 libparallel WARNING] Exception in task execution: local variable 'error' referenced before assignment
[2024-09-09 20:04:33,577 libparallel WARNING] Exception in task execution: local variable 'error' referenced before assignment
[2024-09-09 20:04:33,578 libparallel WARNING] Exception in task execution: local variable 'error' referenced before assignment

[2024-09-09 20:04:33,621 libparallel INFO] 1000 tasks finished
[2024-09-09 20:04:33,621 __init__ INFO] [rigidbody] CNS jobs have finished
[2024-09-09 20:04:33,709 libutil ERROR] 100.00% of output was not generated for this module and tolerance was set to 5.00%.
Traceback (most recent call last):
  File "/data/foldseek/af-data/haddock3/src/haddock/libs/libutil.py", line 335, in log_error_and_exit
    yield
  File "/data/foldseek/af-data/haddock3/src/haddock/clis/cli.py", line 192, in main
    workflow.run()
  File "/data/foldseek/af-data/haddock3/src/haddock/libs/libworkflow.py", line 43, in run
    step.execute()
  File "/data/foldseek/af-data/haddock3/src/haddock/libs/libworkflow.py", line 162, in execute
    self.module.run()  # type: ignore
  File "/data/foldseek/af-data/haddock3/src/haddock/modules/base_cns_module.py", line 61, in run
    self._run()
  File "/data/foldseek/af-data/haddock3/src/haddock/modules/sampling/rigidbody/__init__.py", line 246, in _run
    self.export_io_models(faulty_tolerance=self.params["tolerance"])
  File "/data/foldseek/af-data/haddock3/src/haddock/modules/__init__.py", line 300, in export_io_models
    self.finish_with_error(_msg)
  File "/data/foldseek/af-data/haddock3/src/haddock/modules/__init__.py", line 308, in finish_with_error
    raise RuntimeError(reason)
RuntimeError: 100.00% of output was not generated for this module and tolerance was set to 5.00%.
[2024-09-09 20:04:33,710 libutil ERROR] 100.00% of output was not generated for this module and tolerance was set to 5.00%.
[2024-09-09 20:04:33,710 libutil ERROR] An error has occurred, see log file. And contact the developers if needed.
[2024-09-09 20:04:33,710 libutil INFO] Finished at 09/09/2024 20:04:33. For any help contact us at https://github.com/haddocking/haddock3/issues. Ciao! 再见! Tot ziens!.
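
The first warning in this log (`Chain/seg IDs are not unique for pdbs`) can be caught before launching a run. Below is a minimal Python sketch of such a pre-flight check, assuming standard fixed-column PDB files in which the chain ID sits in column 22. The function names `chain_ids` and `shared_chain_ids` are illustrative and are not part of haddock3.

```python
from collections import defaultdict


def chain_ids(pdb_text):
    """Return the set of chain IDs found on ATOM/HETATM records."""
    ids = set()
    for line in pdb_text.splitlines():
        # Column 22 (index 21) holds the chain ID in the PDB format.
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            ids.add(line[21])
    return ids


def shared_chain_ids(models):
    """Report chain IDs used by more than one input model.

    `models` maps a model name to the text of its PDB file.
    Returns {chain_id: [names of the models using it]}.
    """
    owners = defaultdict(list)
    for name, text in models.items():
        for cid in sorted(chain_ids(text)):
            owners[cid].append(name)
    return {cid: names for cid, names in owners.items() if len(names) > 1}
```

An empty result means every input has its own chain ID. For the two AlphaFold monomers used here, which both carry chain A, the result would list both files under `"A"`.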

@Rohit-Satyam added the question (Further information is requested) label Sep 9, 2024
@amjjbonvin
Member

amjjbonvin commented Sep 10, 2024 via email

@VGPReys
Contributor

VGPReys commented Sep 10, 2024

To add to @amjjbonvin's answer:

  • You can use pdb-tools to easily modify the chain IDs.
# to change your chain ID to X, run
pdb_chain -X path/to/pdb_file.pdb > pdb_file_X.pdb
  • Be sure to remove disordered regions (the typical AlphaFold2 "spaghetti") around the protein, as they may interfere with the docking through van der Waals clashes and also affect the generation of the center-of-mass restraints.
  • HADDOCK will struggle with large conformational changes, so you must trust that your input conformations are already the bound ones.
  • Building pairwise models first with ColabFold / AlphaPulldown can alleviate this issue.
  • Have a look at protein-protein interaction databases, such as BioGRID, IntAct, or HuRI for binary interactions, or SLiMAn2 for peptide interactions, as they could give you valuable input to generate smarter restraints for HADDOCK.
  • With default parameters, [clustfcc] only reports clusters of at least 4 models (min_population = 4) that share a fraction of common contacts of 0.6, which may never happen given the low sampling and the number of input proteins.
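
For many input files, the chain renaming can also be scripted. The following is a minimal Python sketch that mirrors what `pdb_chain` does on a single file and then hands out one letter per model. It assumes standard fixed-column PDB files; `set_chain` and `assign_unique_chains` are hypothetical helper names, not pdb-tools or haddock3 functions.

```python
import string


def set_chain(pdb_text, chain_id):
    """Return pdb_text with chain_id written into column 22 of coordinate records."""
    if len(chain_id) != 1:
        raise ValueError("chain ID must be a single character")
    records = ("ATOM", "HETATM", "TER", "ANISOU")
    out = []
    for line in pdb_text.splitlines():
        # Short records (for example a bare "TER") have no chain column to edit.
        if line.startswith(records) and len(line) > 21:
            line = line[:21] + chain_id + line[22:]
        out.append(line)
    return "\n".join(out) + "\n"


def assign_unique_chains(models):
    """Give each model its own chain letter (A, B, C, ...) in input order.

    `models` maps a model name to its PDB text.
    Returns {name: (chain_id, renamed_pdb_text)}.
    """
    letters = string.ascii_uppercase
    if len(models) > len(letters):
        raise ValueError("more models than available single-letter chain IDs")
    return {
        name: (letters[i], set_chain(text, letters[i]))
        for i, (name, text) in enumerate(models.items())
    }
```

With 13 proteins, each one receives a letter from A to M, so any pair drawn from the set has distinct chain IDs.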

@Rohit-Satyam
Author

Your two input proteins must have different chain IDs. Also, increase the sampling to 10000 and then select the top 400. But to answer this kind of question, I would first try to model the complex directly with AlphaFold (or use AlphaPulldown).


I checked the chain IDs, and in both PDB files the chain is A. Since these PDBs come from AlphaFold, they are single-chain monomers.

  • Be sure to remove disordered regions (the typical AlphaFold2 "spaghetti") around the protein, as they may interfere with the docking through van der Waals clashes and also affect the generation of the center-of-mass restraints.

How do I remove the disordered regions without manually selecting residues in Chimera or PyMOL? Is there a way to remove such residues from PDB files in batch?

  • HADDOCK will struggle with large conformational changes, so you must trust that your input conformations are already bound ones

Since these are Plasmodium proteins, and modeled ones at that, I don't think they are in the "bound" conformation. My plan was to run all-vs-all docking for the 13 proteins and see whether anything comes out. These 13 proteins are not present in any PPI database, which is exactly what we are trying to investigate. The wet-lab experiments are running as we speak, but I thought we could already generate some insights, such as shortlisting the best-docked pairs among these 13 proteins. Of course, if HADDOCK works, we would like to scale it up to several other proteins that we suspect bind each other (so far we know that they are co-expressed and that some of them are co-localised).
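
An all-vs-all screen of 13 proteins amounts to 78 unordered pairs, so the configuration files are best generated rather than written by hand. The sketch below is one way to do that in Python. It reuses the module layout of the configuration posted above with the sampling (10000) and selection (400) values suggested in this thread, and it assumes the input PDB files already carry distinct chain IDs. The function name `pairwise_configs` and the run naming scheme are illustrative choices.

```python
from itertools import combinations
from pathlib import Path

# Same workflow as the configuration posted in this issue, with the
# sampling and selection values left as placeholders.
CONFIG_TEMPLATE = """\
run_dir = "{run_dir}"
mode = "local"

molecules = [
    "{mol_a}",
    "{mol_b}"
    ]

[topoaa]
autohis = true

[rigidbody]
tolerance = 5
cmrest = true
sampling = {sampling}

[seletop]
select = {select}

[flexref]
tolerance = 5
contactairs = true

[mdref]
tolerance = 5

[clustfcc]

[seletopclusts]
top_models = 4
"""


def pairwise_configs(pdb_paths, sampling=10000, select=400):
    """Yield (run_name, config_text) for every unordered pair of input files."""
    for mol_a, mol_b in combinations(sorted(pdb_paths), 2):
        run_name = f"{Path(mol_a).stem}__{Path(mol_b).stem}"
        yield run_name, CONFIG_TEMPLATE.format(
            run_dir=run_name,
            mol_a=mol_a,
            mol_b=mol_b,
            sampling=sampling,
            select=select,
        )
```

Each yielded text can be written to `<run_name>.cfg` and passed to `haddock3` one run at a time. At 10000 rigid-body models per pair, the 78 runs add up to a substantial amount of computation, so it may be worth testing a single pair first.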

@rvhonorato
Member

How do I remove the disordered regions without manually selecting residues in Chimera or PyMOL? Is there a way to remove such residues from PDB files in batch?

The AlphaCutter tool sounds promising, though I have never used it.
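
A simpler alternative, which is not the AlphaCutter method, is to filter on pLDDT. AlphaFold models store the per-residue pLDDT in the B-factor column of the PDB file, so low-confidence tails can be trimmed with a short script. The Python sketch below assumes a single-chain model with increasing residue numbering and no insertion codes. The cutoff of 70 and the function names are assumptions to adjust, not recommendations from the HADDOCK team.

```python
def residue_plddt(pdb_text):
    """Return an ordered list of (residue_number, pLDDT) read from CA atoms."""
    scores = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            # Columns 23-26: residue number; columns 61-66: B-factor (pLDDT).
            scores.append((int(line[22:26]), float(line[60:66])))
    return scores


def trim_disordered(pdb_text, min_plddt=70.0, termini_only=True):
    """Remove low-confidence residues from an AlphaFold model.

    With termini_only=True only the N- and C-terminal tails below the
    cutoff are removed, which avoids creating chain breaks. With
    termini_only=False every residue below the cutoff is removed.
    """
    scores = residue_plddt(pdb_text)
    confident = [num for num, plddt in scores if plddt >= min_plddt]
    if not confident:
        raise ValueError("no residue reaches the pLDDT cutoff")
    if termini_only:
        first, last = confident[0], confident[-1]
        keep = {num for num, _ in scores if first <= num <= last}
    else:
        keep = set(confident)

    kept_lines = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and int(line[22:26]) not in keep:
            continue  # drop atoms of trimmed residues
        kept_lines.append(line)
    return "\n".join(kept_lines) + "\n"
```

Applied in a loop over all 13 files, this gives a batch clean-up without opening Chimera or PyMOL. Long internal disordered loops remain in place when `termini_only=True`; removing them with `termini_only=False` introduces chain breaks that may need separate handling.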

@rvhonorato added the community (contributions from people outside the haddock team) label Sep 15, 2024