diff --git a/apps/protein_folding/helixfold3/README.md b/apps/protein_folding/helixfold3/README.md
index 12ed7041..8c810ccd 100644
--- a/apps/protein_folding/helixfold3/README.md
+++ b/apps/protein_folding/helixfold3/README.md
@@ -36,7 +36,7 @@ Those settings are recommended as they are the same as we used in our A100 machi
 ### Installation
 
 HelixFold3 depends on [PaddlePaddle](https://github.com/paddlepaddle/paddle). Python dependencies available through `pip`
-is provided in `requirements.txt`. `kalign`, the [`HH-suite`](https://github.com/soedinglab/hh-suite) and `jackhmmer` are
+are declared in `pyproject.toml`. `kalign`, the [`HH-suite`](https://github.com/soedinglab/hh-suite) and `jackhmmer` are
 also needed to produce multiple sequence alignments. The download scripts require `aria2c`.
 
 Change to the directory of `helixfold`, then run:
@@ -44,17 +44,26 @@ Change to the directory of `helixfold`, then run:
 ```bash
 # Install py env
 conda create -n helixfold -c conda-forge python=3.9
-conda install -y -c bioconda aria2 hmmer==3.3.2 kalign2==2.04 hhsuite==3.3.0 -n helixfold
-conda install -y -c conda-forge openbabel -n helixfold
 
 # activate the conda environment
 conda activate helixfold
 
+# adjust these version numbers to match your CUDA setup
+conda install -y cudnn=8.4.1 cudatoolkit=11.7 nccl=2.14.3 -c conda-forge -c nvidia
+conda install -y -c bioconda aria2 hmmer==3.3.2 kalign2==2.04 hhsuite==3.3.0
+conda install -y -c conda-forge openbabel
+
 # install paddlepaddle
-python3 -m pip install paddlepaddle-gpu==2.6.1.post120 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
+pip install paddlepaddle-gpu==2.6.1.post120 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
 # or lower version: https://paddle-wheel.bj.bcebos.com/2.5.1/linux/linux-gpu-cuda11.7-cudnn8.4.1-mkl-gcc8.2-avx/paddlepaddle_gpu-2.5.1.post117-cp39-cp39-linux_x86_64.whl
-python3 -m pip install -r requirements.txt
+# downgrade pip
+pip install --upgrade 'pip<24'
+
+# edit the configuration file at `./helixfold/config/helixfold.yaml` so that your databases and binaries are set correctly
+
+# install HF3 as a Python library
+pip install . --no-cache-dir
 ```
 
 Note: If you have a different version of python3 and cuda, please refer to [here](https://www.paddlepaddle.org.cn/whl/linux/gpu/develop.html) for the compatible PaddlePaddle `dev` package.
@@ -63,14 +72,22 @@ Note: If you have a different version of python3 and cuda, please refer to [here
 
 #### Install Maxit
 The conversion between `.cif` and `.pdb` relies on [Maxit](https://sw-tools.rcsb.org/apps/MAXIT/index.html).
 Download Maxit source code from https://sw-tools.rcsb.org/apps/MAXIT/maxit-v11.100-prod-src.tar.gz. Untar and follow
-its `README` to complete installation.
+its `README` to complete installation. If you encounter an error saying your GCC version is not supported (9.4.0, for example), edit `etc/platform.sh` as shown below and rerun the compilation:
+
+```bash
+# Check if it is a Linux platform
+  Linux)
+# Check if it is GCC version 4.x
+  gcc_ver=`gcc --version | grep -e " 4\."` # edit `4\.` to `9\.`
+  if [[ -z $gcc_ver ]]
+```
 
 ### Usage
 
 In order to run HelixFold3, the genetic databases and model parameters are required.
 
 The parameters of HelixFold3 can be downloaded [here](https://paddlehelix.bd.bcebos.com/HelixFold3/params/HelixFold3-params-240814.zip),
-please place the downloaded checkpoint in ```./init_models/ ```directory.
+please set `weight_path` in the `helixfold/config/helixfold.yaml` configuration file to the downloaded checkpoint path before installing HF3 as a Python module.
 
 The script `scripts/download_all_data.sh` can be used to download and set up all genetic databases with the following configs:
 
@@ -96,10 +113,11 @@ The script `scripts/download_all_data.sh` can be used to download and set up all
 
 There are some demo inputs under `./data/` for your test and reference. Data input is in the form of JSON containing several entities such as
 `protein`, `ligand`, `nucleic acids`, and `iron`. Protein and nucleic acid inputs are their sequences.
-HelixFold3 supports input ligand as SMILES or CCD id, please refer to `/data/demo_6zcy_smiles.json` and `demo_output/demo_6zcy_smiles/`
-for more details about SMILES input. More flexible input will come in soon.
+HelixFold3 supports ligand input as SMILES, CCD id or small-molecule files; please refer to `/data/demo_6zcy_smiles.json` and `data/demo_p450_heme_sdf.json`
+for more details about SMILES and file input. Flexible input from small-molecule files is now supported; run `obabel -L formats | grep -v 'Write-only'` to list the readable formats.
 
 An example of input data is as follows:
+
 ```json
 {
     "entities": [
@@ -117,66 +135,182 @@ An example of input data is as follows:
 }
 ```
 
+Another example of **covalently modified** input:
+
+```json
+{
+    "entities": [
+        {
+            "type": "protein",
+            "sequence": "MDALYKSTVAKFNEVIQLDCSTEFFSIALSSIAGILLLLLLFRSKRHSSLKLPPGKLGIPFIGESFIFLRALRSNSLEQFFDERVKKFGLVFKTSLIGHPTVVLCGPAGNRLILSNEEKLVQMSWPAQFMKLMGENSVATRRGEDHIVMRSALAGFFGPGALQSYIGKMNTEIQSHINEKWKGKDEVNVLPLVRELVFNISAILFFNIYDKQEQDRLHKLLETILVGSFALPIDLPGFGFHRALQGRAKLNKIMLSLIKKRKEDLQSGSATATQDLLSVLLTFRDDKGTPLTNDEILDNFSSLLHASYDTTTSPMALIFKLLSSNPECYQKVVQEQLEILSNKEEGEEITWKDLKAMKYTWQVAQETLRMFPPVFGTFRKAITDIQYDGYTIPKGWKLLWTTYSTHPKDLYFNEPEKFMPSRFDQEGKHVAPYTFLPFGGGQRSCVGWEFSKMEILLFVHHFVKTFSSYTPVDPDEKISGDPLPPLPSKGFSIKLFPRP",
+            "count": 1
+        },
+        {
+            "type": "ligand",
+            "ccd": "HEM",
+            "count": 1
+        },
+        {
+            "type": "ligand",
+            "smiles": "CC1=C2CC[C@@]3(CCCC(=C)[C@H]3C[C@@H](C2(C)C)CC1)C",
+            "count": 1
+        },
+        {
+            "type": "bond",
+            "bond": "A,CYS,445,SG,B,HEM,1,FE,covale,2.3",
+            "_comment": ",,,,,,,,,",
+            "_another_comment": "use semicolon to separate multiple bonds",
+            "_also_comment": "For ccd input, use the CCD key as the residue name; for smiles and file input, use `UNK-<index>`, where `<index>` is the order of the ligand chain in your input, e.g. `UNK-1` for the first ligand chain (or count #1), `UNK-2` for the second (or count #2)."
+        }
+    ]
+}
+```
+
+To list all atom ids of an entry in the CCD database:
+
+```shell
+helixfold_show_ccd \
+    ccd_id=HEM
+```
+
+This command outputs something like:
+
+```text
+# output:
+[2024-08-23 22:44:36,324][absl][INFO] - Started Loading CCD dataset from /mnt/db/ccd/ccd_preprocessed_etkdg.pkl.gz
+[2024-08-23 22:44:43,236][absl][INFO] - Finished Loading CCD dataset from /mnt/db/ccd/ccd_preprocessed_etkdg.pkl.gz in 6.912 seconds
+[2024-08-23 22:44:43,237][absl][INFO] - CCD dataset contains 43488 entries.
+[2024-08-23 22:44:43,237][absl][INFO] - Atoms in HEM: ['CHA', 'CHB', 'CHC', 'CHD', 'C1A', 'C2A', 'C3A', 'C4A', 'CMA', 'CAA', 'CBA', 'CGA', 'O1A', 'O2A', 'C1B', 'C2B', 'C3B', 'C4B', 'CMB', 'CAB', 'CBB', 'C1C', 'C2C', 'C3C', 'C4C', 'CMC', 'CAC', 'CBC', 'C1D', 'C2D', 'C3D', 'C4D', 'CMD', 'CAD', 'CBD', 'CGD', 'O1D', 'O2D', 'NA', 'NB', 'NC', 'ND', 'FE']
+```
+
+For the atom ids of a ligand given as an `sdf`/`mol2` file, the atom list follows the same order as in the file. Take the glycan in `data/7s69_glycan.sdf` as an example; a quick way to inspect that order is sketched below.
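+A minimal sketch for printing that order with the `openbabel` Python bindings installed earlier (illustrative only, not part of the package; the per-element numbering mirrors the HF3-parsed list shown next):
+
+```python
+from collections import defaultdict
+
+from openbabel import pybel
+
+# Atom order in the parsed ligand follows the file order of the SDF.
+mol = next(pybel.readfile('sdf', 'data/7s69_glycan.sdf'))
+
+counters = defaultdict(int)  # per-element counters, yielding names like C1, N1, O1
+names = []
+for atom in mol.atoms:
+    symbol = pybel.ob.GetSymbol(atom.atomicnum)
+    counters[symbol] += 1
+    names.append(f'{symbol}{counters[symbol]}')
+
+print(names)
+```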
+ +HF3 parsed: + +```text +['C1', 'C2', 'C3', 'C4', 'C5', 'C6', 'C7', 'C8', 'N1', 'O1', 'O2', 'O3', 'O4', 'O5', 'C9', 'C10', 'C11', 'C12', 'C13', 'C14', 'C15', 'C16', 'N2', 'O6', 'O7', 'O8', 'O9', 'O10', 'C17', 'C18', 'C19', 'C20', 'C21', 'C22', 'O11', 'O12', 'O13', 'O14', 'O15', 'C23', 'C24', 'C25', 'C26', 'C27', 'C28', 'O16', 'O17', 'O18', 'O19', 'O20', 'C29', 'C30', 'C31', 'C32', 'C33', 'C34', 'O21', 'O22', 'O23', 'O24', 'O25', 'C35', 'C36', 'C37', 'C38', 'C39', 'C40', 'O26', 'O27', 'O28', 'O29', 'O30'] +``` + +while in `SDF`: + +```text + 29.7340 3.2540 76.7430 C 0 0 0 0 0 2 0 0 0 0 0 0 + 29.8160 4.4760 77.6460 C 0 0 1 0 0 3 0 0 0 0 0 0 + 28.5260 5.2840 77.5530 C 0 0 2 0 0 3 0 0 0 0 0 0 + 28.1780 5.5830 76.1020 C 0 0 1 0 0 3 0 0 0 0 0 0 + 28.2350 4.3240 75.2420 C 0 0 1 0 0 3 0 0 0 0 0 0 + 28.1040 4.6170 73.7650 C 0 0 0 0 0 2 0 0 0 0 0 0 + 31.3020 3.8250 79.4830 C 0 0 0 0 0 0 0 0 0 0 0 0 + 31.3910 3.4410 80.9280 C 0 0 0 0 0 1 0 0 0 0 0 0 + 30.0760 4.0880 79.0210 N 0 0 0 0 0 2 0 0 0 0 0 0 + 28.6870 6.5050 78.2670 O 0 0 0 0 0 1 0 0 0 0 0 0 + 26.8490 6.0910 76.0350 O 0 0 0 0 0 0 0 0 0 0 0 0 + 29.4950 3.6650 75.4130 O 0 0 0 0 0 0 0 0 0 0 0 0 + 29.3670 4.5550 73.1150 O 0 0 0 0 0 1 0 0 0 0 0 0 + 32.2950 3.8940 78.7640 O 0 0 0 0 0 0 0 0 0 0 0 0 + 26.7420 7.4140 75.6950 C 0 0 1 0 0 3 0 0 0 0 0 0 + 25.2700 7.7830 75.6110 C 0 0 1 0 0 3 0 0 0 0 0 0 + 25.1290 9.2300 75.1610 C 0 0 2 0 0 3 0 0 0 0 0 0 + 25.9180 10.1440 76.0880 C 0 0 1 0 0 3 0 0 0 0 0 0 + 27.3630 9.6720 76.2210 C 0 0 1 0 0 3 0 0 0 0 0 0 + 28.1310 10.4360 77.2730 C 0 0 0 0 0 2 0 0 0 0 0 0 + 23.8820 5.8170 75.1400 C 0 0 0 0 0 0 0 0 0 0 0 0 + 23.1980 5.0100 74.0810 C 0 0 0 0 0 1 0 0 0 0 0 0 + 24.5530 6.8930 74.7160 N 0 0 0 0 0 2 0 0 0 0 0 0 + 23.7530 9.5950 75.1670 O 0 0 0 0 0 1 0 0 0 0 0 0 + 25.9170 11.4700 75.5730 O 0 0 0 0 0 0 0 0 0 0 0 0 + 27.4050 8.2900 76.6040 O 0 0 0 0 0 0 0 0 0 0 0 0 + 29.5300 10.4030 77.0280 O 0 0 0 0 0 1 0 0 0 0 0 0 + 23.8300 5.5110 76.3290 O 0 0 0 0 0 0 0 0 0 0 0 0 + 25.3940 12.4250 76.4090 C 0 0 1 0 0 3 0 0 0 0 0 0 + 25.9490 13.7680 75.9090 C 0 0 2 0 0 3 0 0 0 0 0 0 + 25.1320 14.9560 76.4900 C 0 0 2 0 0 3 0 0 0 0 0 0 + 23.6130 14.6900 76.6390 C 0 0 1 0 0 3 0 0 0 0 0 0 + 23.3700 13.3000 77.2280 C 0 0 1 0 0 3 0 0 0 0 0 0 + 21.9020 12.9360 77.3500 C 0 0 0 0 0 2 0 0 0 0 0 0 + 25.9010 13.8490 74.4810 O 0 0 0 0 0 1 0 0 0 0 0 0 + 25.3420 16.1410 75.7110 O 0 0 0 0 0 0 0 0 0 0 0 0 + 23.0420 15.6520 77.5170 O 0 0 0 0 0 1 0 0 0 0 0 0 + 23.9910 12.3690 76.3570 O 0 0 0 0 0 0 0 0 0 0 0 0 + 21.3660 12.8480 76.0500 O 0 0 0 0 0 0 0 0 0 0 0 0 + 20.8090 11.6500 75.6780 C 0 0 2 0 0 3 0 0 0 0 0 0 + 20.6800 11.6410 74.1740 C 0 0 2 0 0 3 0 0 0 0 0 0 + 19.5510 12.5850 73.8180 C 0 0 2 0 0 3 0 0 0 0 0 0 + 18.2370 12.0940 74.4540 C 0 0 1 0 0 3 0 0 0 0 0 0 + 18.4030 11.9240 75.9810 C 0 0 1 0 0 3 0 0 0 0 0 0 + 17.2710 11.1260 76.6120 C 0 0 0 0 0 2 0 0 0 0 0 0 + 20.2900 10.3510 73.7080 O 0 0 0 0 0 1 0 0 0 0 0 0 + 19.4280 12.7380 72.4110 O 0 0 0 0 0 0 0 0 0 0 0 0 + 17.2120 13.0460 74.2030 O 0 0 0 0 0 1 0 0 0 0 0 0 + 19.6260 11.2000 76.3010 O 0 0 0 0 0 0 0 0 0 0 0 0 + 16.0670 11.4490 75.9360 O 0 0 0 0 0 1 0 0 0 0 0 0 + 20.2190 13.6280 71.7260 C 0 0 2 0 0 3 0 0 0 0 0 0 + 19.6090 14.0000 70.3810 C 0 0 2 0 0 3 0 0 0 0 0 0 + 19.6360 12.7820 69.4880 C 0 0 2 0 0 3 0 0 0 0 0 0 + 21.0860 12.3100 69.3240 C 0 0 1 0 0 3 0 0 0 0 0 0 + 21.7030 12.0240 70.7120 C 0 0 1 0 0 3 0 0 0 0 0 0 + 23.1940 11.7460 70.6620 C 0 0 0 0 0 2 0 0 0 0 0 0 + 20.4080 14.9810 69.7000 O 0 0 0 0 0 1 0 0 0 0 0 0 + 19.0310 13.0500 68.2340 O 0 0 0 0 0 1 0 0 0 0 0 0 + 21.1060 
11.1280   68.5380 O   0  0  0  0  0  1  0  0  0  0  0  0
+   21.5380   13.1700   71.5840 O   0  0  0  0  0  0  0  0  0  0  0  0
+   23.8240   12.5210   71.6820 O   0  0  0  0  0  1  0  0  0  0  0  0
+   26.0070   17.3020   76.0200 C   0  0  2  0  0  3  0  0  0  0  0  0
+   27.0750   17.5250   74.9350 C   0  0  2  0  0  3  0  0  0  0  0  0
+   28.3660   16.8320   75.3290 C   0  0  2  0  0  3  0  0  0  0  0  0
+   28.7820   17.2470   76.7510 C   0  0  1  0  0  3  0  0  0  0  0  0
+   27.6930   16.8120   77.7320 C   0  0  1  0  0  3  0  0  0  0  0  0
+   27.9770   17.2020   79.1710 C   0  0  0  0  0  2  0  0  0  0  0  0
+   27.3990   18.9140   74.8010 O   0  0  0  0  0  1  0  0  0  0  0  0
+   29.4060   17.0990   74.3950 O   0  0  0  0  0  1  0  0  0  0  0  0
+   30.0160   16.6410   77.0930 O   0  0  0  0  0  1  0  0  0  0  0  0
+   26.4610   17.4820   77.3520 O   0  0  0  0  0  0  0  0  0  0  0  0
+   27.3660   18.4620   79.4040 O   0  0  0  0  0  1  0  0  0  0  0  0
+```
+
 #### Running HelixFold for Inference
+
 To run inference on a sequence or multiple sequences using HelixFold3's pretrained parameters, run e.g.:
-* Inference on single GPU (change the settings in script BEFORE you run it)
+
+##### Run from default config
+
+```shell
+LD_LIBRARY_PATH=$CONDA_PREFIX/lib/:$LD_LIBRARY_PATH \
+helixfold \
+    input=./data/demo_8ecx.json \
+    output=. \
+    CONFIG_DIFFS.preset=allatom_demo
 ```
-sh run_infer.sh
+
+##### Run with a customized configuration dir and file (`./myfold.yaml`, for example)
+
+```shell
+LD_LIBRARY_PATH=$CONDA_PREFIX/lib/:$LD_LIBRARY_PATH \
+helixfold --config-dir=. --config-name=myfold \
+    input=./data/demo_6zcy_smiles.json \
+    output=. \
+    CONFIG_DIFFS.preset=allatom_demo
 ```
-The script is as follows,
-```bash
-#!/bin/bash
-
-PYTHON_BIN="PATH/TO/YOUR/PYTHON"
-ENV_BIN="PATH/TO/YOUR/ENV"
-MAXIT_SRC="PATH/TO/MAXIT/SRC"
-DATA_DIR="PATH/TO/DATA"
-export OBABEL_BIN="PATH/TO/OBABEL/BIN"
-export PATH="$MAXIT_BIN/bin:$PATH"
-
-CUDA_VISIBLE_DEVICES=0 "$PYTHON_BIN" inference.py \
-    --maxit_binary "$MAXIT_SRC/bin/maxit" \
-    --jackhmmer_binary_path "$ENV_BIN/jackhmmer" \
-    --hhblits_binary_path "$ENV_BIN/hhblits" \
-    --hhsearch_binary_path "$ENV_BIN/hhsearch" \
-    --kalign_binary_path "$ENV_BIN/kalign" \
-    --hmmsearch_binary_path "$ENV_BIN/hmmsearch" \
-    --hmmbuild_binary_path "$ENV_BIN/hmmbuild" \
-    --nhmmer_binary_path "$ENV_BIN/nhmmer" \
-    --preset='reduced_dbs' \
-    --bfd_database_path "$DATA_DIR/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt" \
-    --small_bfd_database_path "$DATA_DIR/small_bfd/bfd-first_non_consensus_sequences.fasta" \
-    --bfd_database_path "$DATA_DIR/small_bfd/bfd-first_non_consensus_sequences.fasta" \
-    --uniclust30_database_path "$DATA_DIR/uniclust30/uniclust30_2018_08/uniclust30_2018_08" \
-    --uniprot_database_path "$DATA_DIR/uniprot/uniprot.fasta" \
-    --pdb_seqres_database_path "$DATA_DIR/pdb_seqres/pdb_seqres.txt" \
-    --uniref90_database_path "$DATA_DIR/uniref90/uniref90.fasta" \
-    --mgnify_database_path "$DATA_DIR/mgnify/mgy_clusters_2018_12.fa" \
-    --template_mmcif_dir "$DATA_DIR/pdb_mmcif/mmcif_files" \
-    --obsolete_pdbs_path "$DATA_DIR/pdb_mmcif/obsolete.dat" \
-    --ccd_preprocessed_path "$DATA_DIR/ccd_preprocessed_etkdg.pkl.gz" \
-    --rfam_database_path "$DATA_DIR/Rfam-14.9_rep_seq.fasta" \
-    --max_template_date=2020-05-14 \
-    --input_json data/demo_protein_ligand.json \
-    --output_dir ./output \
-    --model_name allatom_demo \
-    --init_model ./init_models/checkpoints.pdparams \
-    --infer_times 3 \
-    --precision "fp32"
+##### Run with additional configuration terms
+
+```shell
+LD_LIBRARY_PATH=$CONDA_PREFIX/lib/:$LD_LIBRARY_PATH \
+helixfold \
+    input=./data/demo_6zcy.json \
+    output=. \
+    CONFIG_DIFFS.preset=allatom_demo \
+    +CONFIG_DIFFS.model.global_config.subbatch_size=192 \
+    +CONFIG_DIFFS.model.num_recycle=10
 ```
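+To fold several inputs in a row, a minimal batch sketch (an illustration only, assuming the `helixfold` entry point installed above is on `PATH`, `LD_LIBRARY_PATH` is exported as in the commands above, and `./batch_out` is a placeholder output directory):
+
+```python
+import subprocess
+from pathlib import Path
+
+# Run the demo inputs one after another; each result lands under ./batch_out.
+for input_json in sorted(Path('./data').glob('demo_*.json')):
+    subprocess.run(
+        ['helixfold',
+         f'input={input_json}',
+         'output=./batch_out',
+         'CONFIG_DIFFS.preset=allatom_demo'],
+        check=True,  # stop at the first failing input
+    )
+```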
+
 The descriptions of the above options are as follows:
-* Replace `MAXIT_SRC` with your installed `maxit`'s root path.
-* Replace `DATA_DIR` with your downloaded data path.
-* Replace `OBABEL_BIN` with your installed `openbabel` path.
-* Replace `ENV_BIN` with your conda virtual environment or any environment where `hhblits`, `hmmsearch` and other dependencies have been installed.
-* `--preset` - Set `'reduced_dbs'` to use small bfd or `'full_dbs'` to use full bfd.
-* `--*_database_path` - Path to datasets you have downloaded.
-* `--input_json` - Input data in the form of JSON. Input pattern in `./data/demo_*.json` for your reference.
-* `--output_dir` - Model output path. The output will be in a folder named the same as your `--input_json` under this path.
-* `--model_name` - Model name in `./helixfold/model/config.py`. Different model names specify different configurations. Mirro modification to configuration can be specified in `CONFIG_DIFFS` in the `config.py` without change to the full configuration in `CONFIG_ALLATOM`.
-* `--infer_time` - The number of inferences executed by model for single input. In each inference, the model will infer `5` times (`diff_batch_size`) for the same input by default. This hyperparameter can be changed by `model.head.diffusion_module.test_diff_batch_size` within `./helixfold/model/config.py`
-* `--precision` - Either `bf16` or `fp32`. Please check if your machine can support `bf16` or not beforing changing it. For example, `bf16` is supported by A100 and H100 or higher version while V100 only supports `fp32`.
+* `LD_LIBRARY_PATH` - This is required to load the `libcudnn.so` library if you encounter an issue like `RuntimeError: (PreconditionNotMet) Cannot load cudnn shared library. Cannot invoke method cudnnGetVersion.`
+* `config-dir` - The directory that contains the alternative configuration file you would like to use.
+* `config-name` - The name of the configuration file you would like to use.
+* `input` - Input data in the form of JSON. See the input patterns in `./data/demo_*.json` for reference.
+* `output` - Model output path. The output will be in a folder named after your input JSON under this path.
+* `CONFIG_DIFFS.preset` - Name of an adjusted model config preset in `./helixfold/model/config.py:CONFIG_DIFFS`. The preset is merged into the final model configuration, `CONFIG_ALLATOM`.
+* `CONFIG_DIFFS.*` - Override any model configuration in `CONFIG_ALLATOM`.
 
 ### Understanding Model Output
 
@@ -184,7 +318,7 @@ The outputs will be in a subfolder of `output_dir`, including the computed MSAs,
 ranked structures, and evaluation metrics. For a task of inferring twice with diffusion batch size 3,
 assuming your input JSON is named `demo_data.json`, the `output_dir` directory will have the following structure:
 
-```
+```text
 /
 └── demo_data/
     ├── demo_data-pred-1-1/
@@ -208,9 +342,10 @@ assume your input JSON is named `demo_data.json`, the `output_dir` directory wil
     └── ...
 ```
 
+
 The contents of each output file are as follows:
 * `final_features.pkl` – A `pickle` file containing the input feature NumPy arrays
-   used by the models to predict the structures.
+   used by the models to predict the structures. If you need to re-run an inference without re-building the MSAs, delete this file.
 * `msas/` - A directory containing the files describing the various genetic tool hits
   that were used to construct the input MSA.
 * `demo_data-pred-X-Y` - Prediction results of `demo_data.json` from the X-th inference and Y-th diffusion batch,
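+To peek at what actually went into the model, a minimal sketch for reading `final_features.pkl` (assuming, per the description above, it is a plain pickle of a dict of NumPy arrays; the path is a placeholder):
+
+```python
+import pickle
+
+# Print every input feature's name and array shape.
+with open('output/demo_data/final_features.pkl', 'rb') as fh:
+    features = pickle.load(fh)
+
+for name, value in features.items():
+    print(name, getattr(value, 'shape', type(value)))
+```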
@@ -224,8 +359,7 @@
 We suggest that a single GPU used for inference have at least 32G of available memory. The maximum token count for inference on a
 single V100-32G with `fp32` precision is up to 1000. Inferring more tokens, or entities with more atoms per token than
 normal protein residues (nucleic acids, for example), may cost more GPU memory.
-For samples with larger tokens, you can reduce `model.global_config.subbatch_size` in `CONFIG_DIFFS` in `helixfold/model/config.py` to save more GPU memory but suffer from slower inference. `model.global_config.subbatch_size` is set as `96` by default. You can also
-reduce the number of additional recycles by changing `model.num_recycle` in the same place.
+For samples with more tokens, you can override `model.global_config.subbatch_size` in `CONFIG_ALLATOM` by passing `+CONFIG_DIFFS.model.global_config.subbatch_size=X` on the command line, where `X` is smaller than the default of `96`, to save GPU memory at the cost of slower inference. You can also reduce the number of additional recycles by setting `+CONFIG_DIFFS.model.num_recycle=Y`, where `Y` is smaller than the default of `3`.
 
 We are keen to support longer-token inference; it is coming soon.
 
diff --git a/apps/protein_folding/helixfold3/data/7s69_glycan.sdf b/apps/protein_folding/helixfold3/data/7s69_glycan.sdf
new file mode 100644
index 00000000..8ac6a9e0
--- /dev/null
+++ b/apps/protein_folding/helixfold3/data/7s69_glycan.sdf
@@ -0,0 +1,155 @@
+
+ OpenBabel03042416223D
+
+ 72 77  0  0  1  0  0  0  0  0999 V2000
+   29.7340    3.2540   76.7430 C   0  0  0  0  0  2  0  0  0  0  0  0
+   29.8160    4.4760   77.6460 C   0  0  1  0  0  3  0  0  0  0  0  0
+   28.5260    5.2840   77.5530 C   0  0  2  0  0  3  0  0  0  0  0  0
+   28.1780    5.5830   76.1020 C   0  0  1  0  0  3  0  0  0  0  0  0
+   28.2350    4.3240   75.2420 C   0  0  1  0  0  3  0  0  0  0  0  0
+   28.1040    4.6170   73.7650 C   0  0  0  0  0  2  0  0  0  0  0  0
+   31.3020    3.8250   79.4830 C   0  0  0  0  0  0  0  0  0  0  0  0
+   31.3910    3.4410   80.9280 C   0  0  0  0  0  1  0  0  0  0  0  0
+   30.0760    4.0880   79.0210 N   0  0  0  0  0  2  0  0  0  0  0  0
+   28.6870    6.5050   78.2670 O   0  0  0  0  0  1  0  0  0  0  0  0
+   26.8490    6.0910   76.0350 O   0  0  0  0  0  0  0  0  0  0  0  0
+   29.4950    3.6650   75.4130 O   0  0  0  0  0  0  0  0  0  0  0  0
+   29.3670    4.5550   73.1150 O   0  0  0  0  0  1  0  0  0  0  0  0
+   32.2950    3.8940   78.7640 O   0  0  0  0  0  0  0  0  0  0  0  0
+   26.7420    7.4140   75.6950 C   0  0  1  0  0  3  0  0  0  0  0  0
+   25.2700    7.7830   75.6110 C   0  0  1  0  0  3  0  0  0  0  0  0
+   25.1290    9.2300   75.1610 C   0  0  2  0  0  3  0  0  0  0  0  0
+   25.9180   10.1440   76.0880 C   0  0  1  0  0  3  0  0  0  0  0  0
+   27.3630    9.6720   76.2210 C   0  0  1  0  0  3  0  0  0  0  0  0
+   28.1310   10.4360   77.2730 C   0  0  0  0  0  2  0  0  0  0  0  0
+   23.8820    5.8170   75.1400 C   0  0  0  0  0  0  0  0  0  0  0  0
+   23.1980    5.0100   74.0810 C   0  0  0  0  0  1  0  0  0  0  0  0
+   24.5530    6.8930   74.7160 N   0  0  0  0  0  2  0  0  0  0  0  0
+   23.7530    9.5950   75.1670 O   0  0  0  0  0  1  0  0  0  0  0  0
+   25.9170   11.4700   75.5730 O   0  0  0  0  0  0  0  0  0  0  0  0
+   27.4050    8.2900   76.6040 O   0  0  0  0  0  0  0  0  0  0  0  0
+   29.5300   10.4030   77.0280 O   0  0  0  0  0  1  0  0  0  0  0  0
+   23.8300    5.5110   76.3290 O   0  0  0  0  0  0  0  0  0  0  0  0
+   25.3940   12.4250   76.4090 C   0  0  1  0  0  3  0  0  0  0  0  0
+   25.9490   13.7680   75.9090 C   0  0  2  0  0  3  0  0  0  0  0  0
+   25.1320   14.9560   76.4900 C   0  0  2  0  0  3  0  0  0  0  0  0
+   23.6130   14.6900   76.6390 C   0  0  1  0  0  3  0  0  0  0  0  0
+   23.3700   13.3000   77.2280 C   0  0  1  0  0  3  0  0  0  0  0  0
+   21.9020   12.9360   77.3500 C   0  0  0  0  0  2  0  0  0  0  0  0
+   25.9010   13.8490   74.4810 O   0  0  0  0  0  1  0  0  0  0  0  0
+   25.3420   16.1410   75.7110 O   0  0  0  0  0  0  0  0  0  0  0  0
+   23.0420   15.6520   77.5170 O   0  0  0  0  0  1  0  0  0  0  0  0
+   23.9910   12.3690   76.3570 O   0  0  0  0  0  0  0  0  0  0  0  0
+   
21.3660 12.8480 76.0500 O 0 0 0 0 0 0 0 0 0 0 0 0 + 20.8090 11.6500 75.6780 C 0 0 2 0 0 3 0 0 0 0 0 0 + 20.6800 11.6410 74.1740 C 0 0 2 0 0 3 0 0 0 0 0 0 + 19.5510 12.5850 73.8180 C 0 0 2 0 0 3 0 0 0 0 0 0 + 18.2370 12.0940 74.4540 C 0 0 1 0 0 3 0 0 0 0 0 0 + 18.4030 11.9240 75.9810 C 0 0 1 0 0 3 0 0 0 0 0 0 + 17.2710 11.1260 76.6120 C 0 0 0 0 0 2 0 0 0 0 0 0 + 20.2900 10.3510 73.7080 O 0 0 0 0 0 1 0 0 0 0 0 0 + 19.4280 12.7380 72.4110 O 0 0 0 0 0 0 0 0 0 0 0 0 + 17.2120 13.0460 74.2030 O 0 0 0 0 0 1 0 0 0 0 0 0 + 19.6260 11.2000 76.3010 O 0 0 0 0 0 0 0 0 0 0 0 0 + 16.0670 11.4490 75.9360 O 0 0 0 0 0 1 0 0 0 0 0 0 + 20.2190 13.6280 71.7260 C 0 0 2 0 0 3 0 0 0 0 0 0 + 19.6090 14.0000 70.3810 C 0 0 2 0 0 3 0 0 0 0 0 0 + 19.6360 12.7820 69.4880 C 0 0 2 0 0 3 0 0 0 0 0 0 + 21.0860 12.3100 69.3240 C 0 0 1 0 0 3 0 0 0 0 0 0 + 21.7030 12.0240 70.7120 C 0 0 1 0 0 3 0 0 0 0 0 0 + 23.1940 11.7460 70.6620 C 0 0 0 0 0 2 0 0 0 0 0 0 + 20.4080 14.9810 69.7000 O 0 0 0 0 0 1 0 0 0 0 0 0 + 19.0310 13.0500 68.2340 O 0 0 0 0 0 1 0 0 0 0 0 0 + 21.1060 11.1280 68.5380 O 0 0 0 0 0 1 0 0 0 0 0 0 + 21.5380 13.1700 71.5840 O 0 0 0 0 0 0 0 0 0 0 0 0 + 23.8240 12.5210 71.6820 O 0 0 0 0 0 1 0 0 0 0 0 0 + 26.0070 17.3020 76.0200 C 0 0 2 0 0 3 0 0 0 0 0 0 + 27.0750 17.5250 74.9350 C 0 0 2 0 0 3 0 0 0 0 0 0 + 28.3660 16.8320 75.3290 C 0 0 2 0 0 3 0 0 0 0 0 0 + 28.7820 17.2470 76.7510 C 0 0 1 0 0 3 0 0 0 0 0 0 + 27.6930 16.8120 77.7320 C 0 0 1 0 0 3 0 0 0 0 0 0 + 27.9770 17.2020 79.1710 C 0 0 0 0 0 2 0 0 0 0 0 0 + 27.3990 18.9140 74.8010 O 0 0 0 0 0 1 0 0 0 0 0 0 + 29.4060 17.0990 74.3950 O 0 0 0 0 0 1 0 0 0 0 0 0 + 30.0160 16.6410 77.0930 O 0 0 0 0 0 1 0 0 0 0 0 0 + 26.4610 17.4820 77.3520 O 0 0 0 0 0 0 0 0 0 0 0 0 + 27.3660 18.4620 79.4040 O 0 0 0 0 0 1 0 0 0 0 0 0 + 1 2 1 0 0 0 0 + 1 12 1 0 0 0 0 + 2 3 1 0 0 0 0 + 2 9 1 1 0 0 0 + 3 10 1 1 0 0 0 + 3 4 1 0 0 0 0 + 4 5 1 0 0 0 0 + 4 11 1 1 0 0 0 + 5 6 1 6 0 0 0 + 5 12 1 0 0 0 0 + 6 13 1 0 0 0 0 + 7 14 2 0 0 0 0 + 7 8 1 0 0 0 0 + 7 9 1 0 0 0 0 + 15 16 1 0 0 0 0 + 15 11 1 1 0 0 0 + 15 26 1 0 0 0 0 + 16 23 1 6 0 0 0 + 16 17 1 0 0 0 0 + 17 18 1 0 0 0 0 + 17 24 1 1 0 0 0 + 18 25 1 6 0 0 0 + 18 19 1 0 0 0 0 + 19 20 1 1 0 0 0 + 19 26 1 0 0 0 0 + 20 27 1 0 0 0 0 + 21 22 1 0 0 0 0 + 21 23 1 0 0 0 0 + 21 28 2 0 0 0 0 + 29 38 1 0 0 0 0 + 29 25 1 6 0 0 0 + 29 30 1 0 0 0 0 + 30 35 1 6 0 0 0 + 30 31 1 0 0 0 0 + 31 32 1 0 0 0 0 + 31 36 1 6 0 0 0 + 32 33 1 0 0 0 0 + 32 37 1 1 0 0 0 + 33 38 1 0 0 0 0 + 33 34 1 6 0 0 0 + 34 39 1 0 0 0 0 + 40 49 1 0 0 0 0 + 40 41 1 0 0 0 0 + 40 39 1 1 0 0 0 + 41 46 1 1 0 0 0 + 41 42 1 0 0 0 0 + 42 43 1 0 0 0 0 + 42 47 1 6 0 0 0 + 43 48 1 1 0 0 0 + 43 44 1 0 0 0 0 + 44 49 1 0 0 0 0 + 44 45 1 6 0 0 0 + 45 50 1 0 0 0 0 + 51 47 1 6 0 0 0 + 51 60 1 0 0 0 0 + 51 52 1 0 0 0 0 + 52 53 1 0 0 0 0 + 52 57 1 6 0 0 0 + 53 54 1 0 0 0 0 + 53 58 1 6 0 0 0 + 54 59 1 6 0 0 0 + 54 55 1 0 0 0 0 + 55 56 1 6 0 0 0 + 55 60 1 0 0 0 0 + 56 61 1 0 0 0 0 + 62 71 1 0 0 0 0 + 62 36 1 1 0 0 0 + 62 63 1 0 0 0 0 + 63 68 1 1 0 0 0 + 63 64 1 0 0 0 0 + 64 69 1 6 0 0 0 + 64 65 1 0 0 0 0 + 65 70 1 1 0 0 0 + 65 66 1 0 0 0 0 + 66 67 1 1 0 0 0 + 66 71 1 0 0 0 0 + 67 72 1 0 0 0 0 +M END +$$$$ diff --git a/apps/protein_folding/helixfold3/data/demo_3fap_protein_sm.json b/apps/protein_folding/helixfold3/data/demo_3fap_protein_sm.json new file mode 100644 index 00000000..caf27f02 --- /dev/null +++ b/apps/protein_folding/helixfold3/data/demo_3fap_protein_sm.json @@ -0,0 +1,20 @@ +{ + "entities": [ + { + "type": "protein", + "sequence": 
"GVQVETISPGDGRTFPKRGQTCVVHYTGMLEDGKKFDSSRDRNKPFKFMLGKQEVIRGWEEGVAQMSVGQRAKLTISPDYAYGATGHPGIIPPHATLVFDVELLKLE", + "count": 1 + }, + { + "type": "protein", + "sequence": "VAILWHEMWHEGLEEASRLYFGERNVKGMFEVLEPLHAMMERGPQTLKETSFNQAYGRDLMEAQEWCRKYMKSGNVKDLTQAWDLYYHVFRRIS", + "count": 1 + }, + { + "type": "ligand", + "sdf": "/mnt/data/yinying/tests/helixfold/ligands/ARD_ideal.sdf", + "use_3d": false, + "count": 1 + } + ] +} \ No newline at end of file diff --git a/apps/protein_folding/helixfold3/data/demo_4Fe-4S.json b/apps/protein_folding/helixfold3/data/demo_4Fe-4S.json new file mode 100644 index 00000000..4599c4af --- /dev/null +++ b/apps/protein_folding/helixfold3/data/demo_4Fe-4S.json @@ -0,0 +1,45 @@ +{ + "entities": [ + { + "type": "protein", + "sequence": "MKLNVDGLLVYFPYDYIYPEQFSYMRELKRTLDAKGHGVLEMPSGTGKTVSLLALIMAYQRAYPLEVTKLIYCSRTVPEIEKVIEELRKLLNFYEKQEGEKLPFLGLALSSRKNLCIHPEVTPLRFGKDVDGKCHSLTASYVRAQYQHDTSLPHCRFYEEFDAHGREVPLPAGIYNLDDLKALGRRQGWCPYFLARYSILHANVVVYSYHYLLDPKIADLVSKELARKAVVVFDEAHNIDNVCIDSMSVNLTRRTLDRCQGNLETLQKTVLRIKETDEQRLRDEYRRLVEGLREASAARETDAHLANPVLPDEVLQEAVPGSIRTAEHFLGFLRRLLEYVKWRLRVQHVVQESPPAFLSGLAQRVCIQRKPLRFCAERLRSLLHTLEITDLADFSPLTLLANFATLVSTYAKGFTIIIEPFDDRTPTIANPILHFSCMDASLAIKPVFERFQSVIITSGTLSPLDIYPKILDFHPVTMATFTMTLARVCLCPMIIGRGNDQVAISSKFETREDIAVIRNYGNLLLEMSAVVPDGIVAFFTSYQYMESTVASWYEQGILENIQRNKLLFIETQDGAETSVALEKYQEACENGRGAILLSVARGKVSEGIDFVHHYGRAVIMFGVPYVYTQSRILKARLEYLRDQFQIRENDFLTFDAMRHAAQCVGRAIRGKTDYGLMVFADKRFARGDKRGKLPRWIQEHLTDANLNLTVDEGVQVAKYFLRQMAQPFHREDQLGLSLLSLEQLESEETLKRIEQIAQQL", + "count": 1 + }, + { + "type": "ligand", + "ccd": "SF4", + "count": 1, + "_note": "5T5I" + }, + { + "type": "bond", + "bond": "A,CYS,116,SG,B,SF4,1,FE1,metalc,2.2;A,CYS,134,SG,B,SF4,1,FE2,metalc,2.2;A,CYS,155,SG,B,SF4,1,FE3,metalc,2.2;A,CYS,190,SG,B,SF4,1,FE4,metalc,2.2", + "_case_from": "https://www.uniprot.org/uniprotkb/P18074/entry", + "_note":"ALL_CYS-ALL_FE" + }, + { + "type": "bond", + "bond": "B,SF4,1,FE1,B,SF4,1,S2,metalc,2.2;B,SF4,1,FE1,B,SF4,1,S3,metalc,2.2;B,SF4,1,FE1,B,SF4,1,S4,metalc,2.2", + "_case_from": "https://www.uniprot.org/uniprotkb/P18074/entry", + "_note":"FE1-S234" + }, + { + "type": "bond", + "bond": "B,SF4,1,FE2,B,SF4,1,S1,metalc,2.2;B,SF4,1,FE2,B,SF4,1,S3,metalc,2.2;B,SF4,1,FE2,B,SF4,1,S4,metalc,2.2", + "_case_from": "https://www.uniprot.org/uniprotkb/P18074/entry", + "_note":"FE2-S134" + }, + { + "type": "bond", + "bond": "B,SF4,1,FE3,B,SF4,1,S1,metalc,2.2;B,SF4,1,FE3,B,SF4,1,S2,metalc,2.2;B,SF4,1,FE3,B,SF4,1,S4,metalc,2.2", + "_case_from": "https://www.uniprot.org/uniprotkb/P18074/entry", + "_note":"FE3-S124" + }, + { + "type": "bond", + "bond": "B,SF4,1,FE4,B,SF4,1,S1,metalc,2.2;B,SF4,1,FE14,B,SF4,1,S2,metalc,2.2;B,SF4,1,FE4,B,SF4,1,S3,metalc,2.2", + "_case_from": "https://www.uniprot.org/uniprotkb/P18074/entry", + "_note":"FE4-S123" + } + ] +} \ No newline at end of file diff --git a/apps/protein_folding/helixfold3/data/demo_7s69_coval.json b/apps/protein_folding/helixfold3/data/demo_7s69_coval.json new file mode 100644 index 00000000..7b2550ff --- /dev/null +++ b/apps/protein_folding/helixfold3/data/demo_7s69_coval.json @@ -0,0 +1,20 @@ +{ + "entities": [ + { + "type": "protein", + "sequence": "DRHHHHHHKLGKMKIVEEPNSFGLNNPFLSQTNKLQPRVQPSPVSGPSHLFRLAGKCFNLVESTYKYELCPFHNVTQHEQTFRWNAYSGILGIWQEWDIENNTFSGMWMREGDSCGNKNRQTKVLLVCGKANKLSSVSEPSTCLYSLTFETPLVCHPHSLLVYPTLSEGLQEKWNEAEQALYDELITEQGHGKILKEIFREAGYLKTTKPDGEGKETQDKPKEFDSLEKCNKGYTELTSEIQRLKKMLNEHGISYVTNGTSRSEGQPAEVNTTFARGEDKVHLRGDTGIRDGQ", + "count": 1 + }, + { + "type": "ligand", + "sdf": 
"/repo/PaddleHelix/apps/protein_folding/helixfold3/data/7s69_glycan.sdf", + "count": 1 + }, + { + "type": "bond", + "bond": "A,ASN,74,ND2,B,UNK-1,1,C16,covale,2.3", + "_comment": "'A,74,ND2:B,1:CW,null' from RF2AA.", + "_also_comment": "For ccd input, use CCD key as residue name; for smiles and file input, use `UNK-` where index is the chain order you input" + } + ] +} \ No newline at end of file diff --git a/apps/protein_folding/helixfold3/data/demo_7u7w_protein_nucleic.json b/apps/protein_folding/helixfold3/data/demo_7u7w_protein_nucleic.json new file mode 100644 index 00000000..b846bea4 --- /dev/null +++ b/apps/protein_folding/helixfold3/data/demo_7u7w_protein_nucleic.json @@ -0,0 +1,19 @@ +{ + "entities": [ + { + "type": "protein", + "sequence": "GPHMATGQDRVVALVDMDCFFVQVEQRQNPHLRNKPCAVVQYKSWKGGGIIAVSYEARAFGVTRSMWADDAKKLCPDLLLAQVRESRGKANLTKYREASVEVMEIMSRFAVIERASIDEAYVDLTSAVQERLQKLQGQPISADLLPSTYIEGLPQGPTTAEETVQKEGMRKQGLFQWLDSLQIDNLTSPDLQLTVGAVIVEEMRAAIERETGFQCSAGISHNKVLAKLACGLNKPNRQTLVSHGSVPQLFSQMPIRKIRSLGGKLGASVIEILGIEYMGELTQFTESQLQSHFGEKNGSWLYAMCRGIEHDPVKPRQLPKTIGCSKNFPGKTALATREQVQWWLLQLAQELEERLTKDRNDNDRVATQLVVSIRVQGDKRLSSLRRCCALTRYDAHKMSHDAFTVIKNCNTSGIQTEWSPPLTMLFLCATKFSAS", + "count": 1 + }, + { + "type": "dna", + "sequence": "CATTATGACGCT", + "count": 1 + }, + { + "type": "dna", + "sequence": "AGCGTCAT", + "count": 1 + } + ] +} \ No newline at end of file diff --git a/apps/protein_folding/helixfold3/data/demo_7u7w_protein_nucleic_sm.json b/apps/protein_folding/helixfold3/data/demo_7u7w_protein_nucleic_sm.json new file mode 100644 index 00000000..77b4dbd1 --- /dev/null +++ b/apps/protein_folding/helixfold3/data/demo_7u7w_protein_nucleic_sm.json @@ -0,0 +1,24 @@ +{ + "entities": [ + { + "type": "protein", + "sequence": "GPHMATGQDRVVALVDMDCFFVQVEQRQNPHLRNKPCAVVQYKSWKGGGIIAVSYEARAFGVTRSMWADDAKKLCPDLLLAQVRESRGKANLTKYREASVEVMEIMSRFAVIERASIDEAYVDLTSAVQERLQKLQGQPISADLLPSTYIEGLPQGPTTAEETVQKEGMRKQGLFQWLDSLQIDNLTSPDLQLTVGAVIVEEMRAAIERETGFQCSAGISHNKVLAKLACGLNKPNRQTLVSHGSVPQLFSQMPIRKIRSLGGKLGASVIEILGIEYMGELTQFTESQLQSHFGEKNGSWLYAMCRGIEHDPVKPRQLPKTIGCSKNFPGKTALATREQVQWWLLQLAQELEERLTKDRNDNDRVATQLVVSIRVQGDKRLSSLRRCCALTRYDAHKMSHDAFTVIKNCNTSGIQTEWSPPLTMLFLCATKFSAS", + "count": 1 + }, + { + "type": "dna", + "sequence": "CATTATGACGCT", + "count": 1 + }, + { + "type": "dna", + "sequence": "AGCGTCAT", + "count": 1 + }, + { + "type": "ligand", + "ccd": "XG4", + "count": 1 + } + ] +} \ No newline at end of file diff --git a/apps/protein_folding/helixfold3/data/demo_disulf.json b/apps/protein_folding/helixfold3/data/demo_disulf.json new file mode 100644 index 00000000..780b9d7d --- /dev/null +++ b/apps/protein_folding/helixfold3/data/demo_disulf.json @@ -0,0 +1,14 @@ +{ + "entities": [ + { + "type": "protein", + "sequence": "MASVKSSSSSSSSSFISLLLLILLVIVLQSQVIECQPQQSCTASLTGLNVCAPFLVPGSPTASTECCNAVQSINHDCMCNTMRIAAQIPAQCNLPPLSCSAN", + "count": 1 + }, + { + "type": "bond", + "bond": "A,CYS,41,SG,A,CYS,77,SG,disulf,2.2;A,CYS,51,SG,A,CYS,66,SG,disulf,2.2;A,CYS,67,SG,A,CYS,92,SG,disulf,2.2;A,CYS,79,SG,A,CYS,99,SG,disulf,2.2", + "_case_from": "https://www.uniprot.org/uniprotkb/Q43495/entry#ptm_processing" + } + ] +} \ No newline at end of file diff --git a/apps/protein_folding/helixfold3/data/demo_disulf_homodimer.json b/apps/protein_folding/helixfold3/data/demo_disulf_homodimer.json new file mode 100644 index 00000000..fd56fcd4 --- /dev/null +++ b/apps/protein_folding/helixfold3/data/demo_disulf_homodimer.json @@ -0,0 +1,33 @@ +{ + "entities": [ + { + "type": "protein", + "sequence": 
"NSVHPCCDPVICEPREGEHCISGPCCENCYFLNSGTICKRARGDGNQDYCTGITPDCPRNRYNV", + "count": 2 + }, + { + "type": "bond", + "bond": "A,CYS,6,SG,A,CYS,29,SG,disulf,2.3;A,CYS,20,SG,A,CYS,26,SG,disulf,2.3;A,CYS,25,SG,A,CYS,50,SG,disulf,2.3;A,CYS,38,SG,A,CYS,57,SG,disulf,2.3", + "_case_from": "https://www.uniprot.org/uniprotkb/P83658/entry#ptm_processing", + "_note": "Intrachain, A" + }, + { + "type": "bond", + "bond": "B,CYS,6,SG,B,CYS,29,SG,disulf,2.3;B,CYS,20,SG,B,CYS,26,SG,disulf,2.3;B,CYS,25,SG,B,CYS,50,SG,disulf,2.3;B,CYS,38,SG,B,CYS,57,SG,disulf,2.3", + "_case_from": "https://www.uniprot.org/uniprotkb/P83658/entry#ptm_processing", + "_note": "Intrachain, B" + }, + { + "type": "bond", + "bond": "A,CYS,7,SG,B,CYS,12,SG,disulf,2.3", + "_case_from": "https://www.uniprot.org/uniprotkb/P83658/entry#ptm_processing", + "_note": "Interchain, AB" + }, + { + "type": "bond", + "bond": "B,CYS,7,SG,A,CYS,12,SG,disulf,2.3", + "_case_from": "https://www.uniprot.org/uniprotkb/P83658/entry#ptm_processing", + "_note": "Interchain, BA" + } + ] +} \ No newline at end of file diff --git a/apps/protein_folding/helixfold3/data/demo_p450_heme.json b/apps/protein_folding/helixfold3/data/demo_p450_heme.json new file mode 100644 index 00000000..a414ef58 --- /dev/null +++ b/apps/protein_folding/helixfold3/data/demo_p450_heme.json @@ -0,0 +1,19 @@ +{ + "entities": [ + { + "type": "protein", + "sequence": "MDALYKSTVAKFNEVIQLDCSTEFFSIALSSIAGILLLLLLFRSKRHSSLKLPPGKLGIPFIGESFIFLRALRSNSLEQFFDERVKKFGLVFKTSLIGHPTVVLCGPAGNRLILSNEEKLVQMSWPAQFMKLMGENSVATRRGEDHIVMRSALAGFFGPGALQSYIGKMNTEIQSHINEKWKGKDEVNVLPLVRELVFNISAILFFNIYDKQEQDRLHKLLETILVGSFALPIDLPGFGFHRALQGRAKLNKIMLSLIKKRKEDLQSGSATATQDLLSVLLTFRDDKGTPLTNDEILDNFSSLLHASYDTTTSPMALIFKLLSSNPECYQKVVQEQLEILSNKEEGEEITWKDLKAMKYTWQVAQETLRMFPPVFGTFRKAITDIQYDGYTIPKGWKLLWTTYSTHPKDLYFNEPEKFMPSRFDQEGKHVAPYTFLPFGGGQRSCVGWEFSKMEILLFVHHFVKTFSSYTPVDPDEKISGDPLPPLPSKGFSIKLFPRP", + "count": 1 + }, + { + "type": "ligand", + "ccd": "HEM", + "count": 1 + }, + { + "type": "ligand", + "smiles": "CC1=C2CC[C@@]3(CCCC(=C)[C@H]3C[C@@H](C2(C)C)CC1)C", + "count": 1 + } + ] +} \ No newline at end of file diff --git a/apps/protein_folding/helixfold3/data/demo_p450_heme_coval.json b/apps/protein_folding/helixfold3/data/demo_p450_heme_coval.json new file mode 100644 index 00000000..6761a4bc --- /dev/null +++ b/apps/protein_folding/helixfold3/data/demo_p450_heme_coval.json @@ -0,0 +1,25 @@ +{ + "entities": [ + { + "type": "protein", + "sequence": "MDALYKSTVAKFNEVIQLDCSTEFFSIALSSIAGILLLLLLFRSKRHSSLKLPPGKLGIPFIGESFIFLRALRSNSLEQFFDERVKKFGLVFKTSLIGHPTVVLCGPAGNRLILSNEEKLVQMSWPAQFMKLMGENSVATRRGEDHIVMRSALAGFFGPGALQSYIGKMNTEIQSHINEKWKGKDEVNVLPLVRELVFNISAILFFNIYDKQEQDRLHKLLETILVGSFALPIDLPGFGFHRALQGRAKLNKIMLSLIKKRKEDLQSGSATATQDLLSVLLTFRDDKGTPLTNDEILDNFSSLLHASYDTTTSPMALIFKLLSSNPECYQKVVQEQLEILSNKEEGEEITWKDLKAMKYTWQVAQETLRMFPPVFGTFRKAITDIQYDGYTIPKGWKLLWTTYSTHPKDLYFNEPEKFMPSRFDQEGKHVAPYTFLPFGGGQRSCVGWEFSKMEILLFVHHFVKTFSSYTPVDPDEKISGDPLPPLPSKGFSIKLFPRP", + "count": 1 + }, + { + "type": "ligand", + "ccd": "HEM", + "count": 1 + }, + { + "type": "ligand", + "smiles": "CC1=C2CC[C@@]3(CCCC(=C)[C@H]3C[C@@H](C2(C)C)CC1)C", + "count": 1 + }, + { + "type": "bond", + "bond": "A,CYS,445,SG,B,HEM,1,FE,covale,2.3", + "_comment": ",,,,,,,,,", + "_also_comment": "For ccd input, use CCD key as residue name; for smiles and file input, use `UNK-` where index is the chain order you input" + } + ] +} \ No newline at end of file diff --git a/apps/protein_folding/helixfold3/data/demo_p450_heme_sdf.json b/apps/protein_folding/helixfold3/data/demo_p450_heme_sdf.json 
new file mode 100644 index 00000000..72500c6e --- /dev/null +++ b/apps/protein_folding/helixfold3/data/demo_p450_heme_sdf.json @@ -0,0 +1,19 @@ +{ + "entities": [ + { + "type": "protein", + "sequence": "MDALYKSTVAKFNEVIQLDCSTEFFSIALSSIAGILLLLLLFRSKRHSSLKLPPGKLGIPFIGESFIFLRALRSNSLEQFFDERVKKFGLVFKTSLIGHPTVVLCGPAGNRLILSNEEKLVQMSWPAQFMKLMGENSVATRRGEDHIVMRSALAGFFGPGALQSYIGKMNTEIQSHINEKWKGKDEVNVLPLVRELVFNISAILFFNIYDKQEQDRLHKLLETILVGSFALPIDLPGFGFHRALQGRAKLNKIMLSLIKKRKEDLQSGSATATQDLLSVLLTFRDDKGTPLTNDEILDNFSSLLHASYDTTTSPMALIFKLLSSNPECYQKVVQEQLEILSNKEEGEEITWKDLKAMKYTWQVAQETLRMFPPVFGTFRKAITDIQYDGYTIPKGWKLLWTTYSTHPKDLYFNEPEKFMPSRFDQEGKHVAPYTFLPFGGGQRSCVGWEFSKMEILLFVHHFVKTFSSYTPVDPDEKISGDPLPPLPSKGFSIKLFPRP", + "count": 1 + }, + { + "type": "ligand", + "ccd": "HEM", + "count": 1 + }, + { + "type": "ligand", + "sdf": "/mnt/data/yinying/tests/helixfold/ligands/60119277-3d.sdf", + "count": 1 + } + ] +} \ No newline at end of file diff --git a/apps/protein_folding/helixfold3/data/demo_p450_heme_smiles.json b/apps/protein_folding/helixfold3/data/demo_p450_heme_smiles.json new file mode 100644 index 00000000..cb6a22f5 --- /dev/null +++ b/apps/protein_folding/helixfold3/data/demo_p450_heme_smiles.json @@ -0,0 +1,19 @@ +{ + "entities": [ + { + "type": "protein", + "sequence": "MDALYKSTVAKFNEVIQLDCSTEFFSIALSSIAGILLLLLLFRSKRHSSLKLPPGKLGIPFIGESFIFLRALRSNSLEQFFDERVKKFGLVFKTSLIGHPTVVLCGPAGNRLILSNEEKLVQMSWPAQFMKLMGENSVATRRGEDHIVMRSALAGFFGPGALQSYIGKMNTEIQSHINEKWKGKDEVNVLPLVRELVFNISAILFFNIYDKQEQDRLHKLLETILVGSFALPIDLPGFGFHRALQGRAKLNKIMLSLIKKRKEDLQSGSATATQDLLSVLLTFRDDKGTPLTNDEILDNFSSLLHASYDTTTSPMALIFKLLSSNPECYQKVVQEQLEILSNKEEGEEITWKDLKAMKYTWQVAQETLRMFPPVFGTFRKAITDIQYDGYTIPKGWKLLWTTYSTHPKDLYFNEPEKFMPSRFDQEGKHVAPYTFLPFGGGQRSCVGWEFSKMEILLFVHHFVKTFSSYTPVDPDEKISGDPLPPLPSKGFSIKLFPRP", + "count": 1 + }, + { + "type": "ligand", + "smiles": "CC1=C(C2=CC3=C(C(=C([N-]3)C=C4C(=C(C(=N4)C=C5C(=C(C(=N5)C=C1[N-]2)C)C=C)C)C=C)C)CCC(=O)O)CCC(=O)O.[Fe+2]", + "count": 1 + }, + { + "type": "ligand", + "smiles": "CC1=C2CC[C@@]3(CCCC(=C)[C@H]3C[C@@H](C2(C)C)CC1)C", + "count": 1 + } + ] +} \ No newline at end of file diff --git a/apps/protein_folding/helixfold3/helixfold/LICENSE b/apps/protein_folding/helixfold3/helixfold/LICENSE new file mode 100644 index 00000000..fb84eeb3 --- /dev/null +++ b/apps/protein_folding/helixfold3/helixfold/LICENSE @@ -0,0 +1,36 @@ +# Terms of Use Agreement +The HelixFold 3 open-source code and any derivative works are only available for non-commercial use by individuals and non-commercial organizations (universities, non-profit organizations and research institutes, educational and government bodies), or for journalism. + + +# Usage Restrictions: +The use of the HelixFold 3 open-source code is subject to the following conditions and restrictions: + 1. Commercial Use Prohibited: The code and its outputs shall not be used by or for commercial entities, nor in any context related to commercial activities, including conducting research for commercial purposes or conferring any rights to commercial entities to use the outputs. + + 2. Integration with Automated Systems: The open-source code shall not be incorporated into any unauthorized automated systems, including, but not limited to, workflows for the screening or design of biomolecules. + + 3. 
Attribution Requirement: In any academic publication derived from the use of the open-source code, proper attribution to the original authors must be given in the manner specified, including the retention of the author's name, copyright notice, and a link to the license agreement, as well as a statement indicating whether modifications were made to the materials.
+
+ 4. Share-Alike Obligation: If you modify or adapt the materials (e.g., through translation or rewriting), you must distribute your derivative works under the same "Attribution-NonCommercial-ShareAlike" license. This ensures that others may use your adapted work under identical terms.
+
+ 5. Prohibition on Illegal or Malicious Use: The code shall not be used to facilitate or engage in activities that are illegal, dangerous, or malicious.
+
+For commercial use, you must contact the authors to obtain a commercial license.\
+**Disclaimers:**
+
+The HelixFold 3 open-source code and any derivative works are only for theoretical modelling. These are not intended, validated, or approved for clinical use.
+
+**Governing Law and Dispute Resolution:** \
+This agreement is governed by the laws of the People's Republic of China. In the event of a dispute arising from the execution of this agreement, the parties shall attempt to resolve it through amicable consultation. If consultation fails, either party may submit the dispute to the People's Court of Haidian District, Beijing for adjudication.
+
+**Termination:**
+ - **Automatic Termination:** The rights granted under this license will automatically terminate if you violate any of its terms. Rights may be reinstated upon rectification of the violation or with the licensor's express consent.
+
+ - **Surviving Provisions:** Certain provisions, including disclaimers and liability limitations, shall remain in effect even after the termination of this license.
+
+**Miscellaneous Provisions:**
+ - **No Additional Restrictions:** You may not impose any additional restrictions on the use of the materials, such as implementing technical protection measures (e.g., encryption), as doing so would contravene the spirit of this license.
+ - **Interpretation Authority:** Baidu reserves the right to make reasonable interpretations of this open-source service.
+
+Should any provision of this agreement be deemed invalid or unenforceable, the remaining provisions shall continue to be in full force and effect.
diff --git a/apps/protein_folding/helixfold3/helixfold/common/all_atom_pdb_save.py b/apps/protein_folding/helixfold3/helixfold/common/all_atom_pdb_save.py
index deb8e087..50b2500a 100644
--- a/apps/protein_folding/helixfold3/helixfold/common/all_atom_pdb_save.py
+++ b/apps/protein_folding/helixfold3/helixfold/common/all_atom_pdb_save.py
@@ -21,6 +21,7 @@
 import paddle
 import itertools
 import os
+import subprocess
 
 FeatureDict = Mapping[str, np.ndarray]
 ModelOutput = Mapping[str, Any]  # Is a nested dict.
@@ -164,14 +165,43 @@ def prediction_to_mmcif(pred_atom_pos: Union[np.ndarray, paddle.Tensor],
         - maxit_binary: path to maxit_binary, use to convert pdb to cif
         - mmcif_path: path to save *.cif
     """
-    assert maxit_binary is not None and os.path.exists(maxit_binary), (
+    if not os.path.isfile(maxit_binary):
+        raise FileNotFoundError(
             f'maxit_binary: {maxit_binary} does not exist. '
             f'link: https://sw-tools.rcsb.org/apps/MAXIT/source.html')
 
-    assert mmcif_path.endswith('.cif'), f'mmcif_path should endswith .cif; got {mmcif_path}'
+
+    if not mmcif_path.endswith('.cif'):
+        raise ValueError(f'mmcif_path should end with .cif; got {mmcif_path}')
 
     pdb_path = mmcif_path.replace('.cif', '.pdb')
     pdb_path = prediction_to_pdb(pred_atom_pos, FeatsDict, pdb_path)
-    msg = os.system(f'{maxit_binary} -i {pdb_path} -o 1 -output {mmcif_path}')
-    if msg != 0:
-        print(f'convert pdb to cif failed, error message: {msg}')
+
+    cmd = [maxit_binary,
+           '-i', pdb_path,
+           '-o', '1',
+           '-output', mmcif_path,
+           ]
+
+    print(f'Launching subprocess "{" ".join(cmd)}"')
+
+    # Remove any stale Maxit log so a failure reports only this run's output.
+    if os.path.exists('maxit.log'):
+        os.remove('maxit.log')
+
+    process = subprocess.Popen(
+        cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, env=os.environ.copy())
+
+    stdout, stderr = process.communicate()
+    retcode = process.returncode
+
+    if retcode:
+        # Surface Maxit's stdout, stderr and its log file in the raised error.
+        raise RuntimeError(f'Maxit failed\nstdout:\n{stdout.decode("utf-8")}\n\n'
+                           f'stderr:\n{stderr[:500_000].decode("utf-8")}\n'
+                           f'logfile:\n{open("maxit.log", "r").read().strip() if os.path.isfile("maxit.log") else ""}\n'
+                           f'Env:\n{os.environ.copy()}')
+
+    return mmcif_path
\ No newline at end of file
diff --git a/apps/protein_folding/helixfold3/helixfold/config/helixfold.yaml b/apps/protein_folding/helixfold3/helixfold/config/helixfold.yaml
new file mode 100644
index 00000000..6b40ea23
--- /dev/null
+++ b/apps/protein_folding/helixfold3/helixfold/config/helixfold.yaml
@@ -0,0 +1,77 @@
+defaults:
+  - _self_
+
+# General configuration
+
+bf16_infer: false # Corresponds to --bf16_infer
+seed: null # Corresponds to --seed
+logging_level: DEBUG # Corresponds to --logging_level
+weight_path: /mnt/db/weights/helixfold/HelixFold3-params-240814/HelixFold3-240814.pdparams # Corresponds to --init_model
+precision: fp32 # Corresponds to --precision
+amp_level: O1 # Corresponds to --amp_level
+infer_times: 1 # Corresponds to --infer_times
+diff_batch_size: -1 # Corresponds to --diff_batch_size
+use_small_bfd: false # Corresponds to --use_small_bfd
+msa_only: false # Only process the MSA
+
+nproc_msa:
+  hhblits: 16 # Number of processors used by hhblits
+  jackhmmer: 8 # Number of processors used by jackhmmer
+
+# File paths
+
+input: null # Corresponds to --input_json, required field
+output: null # Corresponds to --output_dir, required field
+override: false # Set true to override the existing MSA output directory
+
+
+# Binary tool paths; leave them as null to pick up proper ones from PATH or the conda bin path
+bin:
+  jackhmmer: null # Corresponds to --jackhmmer_binary_path
+  hhblits: null # Corresponds to --hhblits_binary_path
+  hhsearch: null # Corresponds to --hhsearch_binary_path
+  kalign: null # Corresponds to --kalign_binary_path
+  hmmsearch: null # Corresponds to --hmmsearch_binary_path
+  hmmbuild: null # Corresponds to --hmmbuild_binary_path
+  nhmmer: null # Corresponds to --nhmmer_binary_path
+  obabel: null # Injected into the environment as OBABEL_BIN
+
+# Database paths
+db:
+  uniprot: /mnt/db/uniprot/uniprot.fasta # Corresponds to --uniprot_database_path, required field
+  pdb_seqres: /mnt/db/pdb_seqres/pdb_seqres.txt # Corresponds to --pdb_seqres_database_path, required field
+  uniref90: /mnt/db/uniref90/uniref90.fasta # Corresponds to --uniref90_database_path, required field
+  mgnify: /mnt/db/mgnify/mgy_clusters.fa # Corresponds to --mgnify_database_path, required field
+  bfd: 
/mnt/db/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt # Corresponds to --bfd_database_path + small_bfd: /mnt/db/reduced_bfd/bfd-first_non_consensus_sequences.fasta # Corresponds to --small_bfd_database_path + uniclust30: /mnt/db/uniref30_uc30/UniRef30_2022_02/UniRef30_2022_02 # Corresponds to --uniclust30_database_path + rfam: /mnt/db/helixfold/rna/Rfam-14.9_rep_seq.fasta # Corresponds to --rfam_database_path, required field + ccd_preprocessed: /mnt/db/ccd/ccd_preprocessed_etkdg.pkl.gz # Corresponds to --ccd_preprocessed_path, required field + +# Template and PDB information +template: + mmcif_dir: /mnt/db/pdb_mmcif/mmcif_files # Corresponds to --template_mmcif_dir, required field + max_date: '2023-03-15' # Corresponds to --max_template_date, required field + obsolete_pdbs: /mnt/db/pdb_mmcif/obsolete.dat # Corresponds to --obsolete_pdbs_path, required field + +# Preset configuration +preset: + preset: full_dbs # Corresponds to --preset, choices=['reduced_dbs', 'full_dbs'] + +# Other configurations +other: + maxit_binary: /mnt/data/software/maxit/maxit-v11.100-prod-src/bin/maxit # Corresponds to --maxit_binary + + +# CONFIG_DIFFS for advanced configuration +CONFIG_DIFFS: + preset: null #choices=['null','allatom_demo', 'allatom_subbatch_64_recycle_1'] + + # Detailed configuration adjustments against `CONFIG_ALLATOM` can be used here. for example: + # model: + # global_config: + # subbatch_size: 96 # model.global_config.subbatch_size + # num_recycle: 3 # model.num_recycle + # heads: + # confidence_head: + # weight: 0.0 # model.heads.confidence_head.weight diff --git a/apps/protein_folding/helixfold3/helixfold/data/mmcif_parsing_paddle.py b/apps/protein_folding/helixfold3/helixfold/data/mmcif_parsing_paddle.py index d5e8f875..d6f907eb 100644 --- a/apps/protein_folding/helixfold3/helixfold/data/mmcif_parsing_paddle.py +++ b/apps/protein_folding/helixfold3/helixfold/data/mmcif_parsing_paddle.py @@ -22,7 +22,7 @@ from Bio import PDB import numpy as np import time -import logging +from absl import logging from helixfold.common.residue_constants import crystallization_aids, ligand_exclusion_list # Type aliases: @@ -34,7 +34,6 @@ SeqRes = str MmCIFDict = Mapping[str, Sequence[str]] -logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s') @dataclasses.dataclass(frozen=True) class Monomer: diff --git a/apps/protein_folding/helixfold3/helixfold/data/pipeline_conf_bonds.py b/apps/protein_folding/helixfold3/helixfold/data/pipeline_conf_bonds.py index dee10bb0..6b46bb18 100644 --- a/apps/protein_folding/helixfold3/helixfold/data/pipeline_conf_bonds.py +++ b/apps/protein_folding/helixfold3/helixfold/data/pipeline_conf_bonds.py @@ -1,9 +1,22 @@ """Functions for building the input features (reference ccd features) for the HelixFold model.""" import collections -from typing import Optional -from helixfold.common import residue_constants +from dataclasses import dataclass +import gzip +import os +import pickle +import re +from typing import Any, List, Literal, Optional, Tuple + +from absl import logging +from immutabledict import immutabledict import numpy as np +from openbabel import openbabel + +from helixfold.common import residue_constants +from helixfold.data.tools import utils + + ALLOWED_LIGAND_BONDS_TYPE = { "SING": 1, @@ -13,17 +26,131 @@ "AROM": 12, } +# Define the possible bond types as a Literal +# https://mmcif.wwpdb.org/dictionaries/mmcif_std.dic/Items/_struct_conn_type.id.html +BondType = Literal["covale", "covale_base", "covale_phosphate", 
"covale_sugar", "disulf", "hydrog",'metalc','mismat','modres','saltbr'] + + +@dataclass +class AtomPartner: + """ + Represents one partner atom in a covalent bond. + + Attributes: + label_asym_id (str): The asymmetry identifier for the partner atom (i.e., chain ID). + label_comp_id (str): The component identifier for the partner atom (i.e., residue name). + seq_id (str): The sequence identifier for the partner atom (merged label_seq_id and auth_seq_id). + label_atom_id (str): The atom identifier for the partner atom (i.e., atom name). + """ + + label_asym_id: str # Chain ID + label_comp_id: str # Residue name + seq_id: str # Merged sequence ID + label_atom_id: str # Atom name + + +@dataclass +class CovalentBond: + """ + Represents a covalent bond between two atoms in a molecular structure. + + Attributes: + atom_1 (AtomPartner): The first partner atom in the bond. + atom_2 (AtomPartner): The second partner atom in the bond. + bond_type (BondType): The type of the bond. + pdbx_dist_value (float): The distance value as defined in the PDBx/mmCIF format. + """ + + atom_1: AtomPartner + atom_2: AtomPartner + bond_type: BondType + pdbx_dist_value: float + +def parse_covalent_bond_input(input_string: str) -> List[CovalentBond]: + """ + Parses a human-readable string into a list of CovalentBond objects. + + Args: + input_string (str): A string representing covalent bonds, where each bond is + described by two atom partners separated by a comma, + and multiple bonds are separated by semicolons. + Example: "A,GLY,25,CA,A,GLY,25,N,covale,1.32; B,HIS,58,ND1,B,HIS,58,CE1,covale,1.39" + + Returns: + List[CovalentBond]: A list of CovalentBond objects. + """ + covalent_bonds = [] + + # Split the input string by semicolons to separate individual covalent bonds + bond_strings = input_string.split(';') + + for bond_str in bond_strings: + # Split the individual bond string by commas to separate attributes + bond_parts = bond_str.split(',') + + if len(bond_parts) != 10: + raise ValueError(f"Invalid bond format: {bond_str}. 
Expected 10 fields per bond.") + + # Create AtomPartner instances for the two atoms in the bond + atom_1 = AtomPartner( + label_asym_id=bond_parts[0].strip(), + label_comp_id=bond_parts[1].strip(), + seq_id=bond_parts[2].strip(), + label_atom_id=bond_parts[3].strip() + ) + + atom_2 = AtomPartner( + label_asym_id=bond_parts[4].strip(), + label_comp_id=bond_parts[5].strip(), + seq_id=bond_parts[6].strip(), + label_atom_id=bond_parts[7].strip() + ) + + # Create a CovalentBond instance + covalent_bond = CovalentBond( + atom_1=atom_1, + atom_2=atom_2, + bond_type=bond_parts[8].strip(), + pdbx_dist_value=float(bond_parts[9].strip()) + ) + + # Append the CovalentBond instance to the list + covalent_bonds.append(covalent_bond) + logging.info(f"Added {len(covalent_bonds)} bonds: {covalent_bonds}") + + return covalent_bonds + +def load_ccd_dict(ccd_preprocessed_path: str) -> immutabledict[str, Any]: + if not os.path.exists(ccd_preprocessed_path): + raise FileNotFoundError(f'[CCD] ccd_preprocessed_path: {ccd_preprocessed_path} not exist.') + + if not ccd_preprocessed_path.endswith('.pkl.gz') and not ccd_preprocessed_path.endswith('.pkl'): + raise ValueError(f'[CCD] ccd_preprocessed_path: {ccd_preprocessed_path} not endswith .pkl.gz and .pkl') + + with utils.timing(f'Loading CCD dataset from {ccd_preprocessed_path}'): + if ccd_preprocessed_path.endswith('.pkl.gz'): + with gzip.open(ccd_preprocessed_path, "rb") as fp: + ccd_preprocessed_dict = immutabledict(pickle.load(fp)) + else: + with open(ccd_preprocessed_path, "rb") as fp: + ccd_preprocessed_dict = immutabledict(pickle.load(fp)) + + logging.info(f'CCD dataset contains {len(ccd_preprocessed_dict)} entries.') + + return ccd_preprocessed_dict + def element_map_with_x(atom_symbol): # ## one-hot max shape == 128 return residue_constants.ATOM_ELEMENT.get(atom_symbol, 127) -def convert_atom_id_name(atom_id: str) -> int: +def convert_atom_id_name(atom_id: str) -> list[int]: """ Converts unique atom_id names to integer of atom_name. need to be padded to length 4. Each character is encoded as ord(c) − 32 """ + if (len_atom_id:=len(atom_id))>4: + raise ValueError(f'atom_id: `{atom_id}` is too long, max length is 4.') atom_id_pad = atom_id.ljust(4, ' ') - assert len(atom_id_pad) == 4 return [ord(c) - 32 for c in atom_id_pad] @@ -79,8 +206,16 @@ def make_ccd_conf_features(all_chain_info, ccd_preprocessed_dict, features['ref_element'].append(np.array([element_map_with_x(t[0].upper() + t[1:].lower()) for t in _ccd_feats['atom_symbol']], dtype=np.int32)) features['ref_charge'].append(np.array(_ccd_feats['charge'], dtype=np.int32)) + + # atom checks + for atom_id in _ccd_feats['atom_ids']: + converted_atom_id_name=convert_atom_id_name(atom_id.upper()) + if max(converted_atom_id_name)>= 64: + raise ValueError(f'>>> Problematic atom in ligand ({residue_id=}, {ccd_id=}, {chain_id=}) {atom_id=}, {converted_atom_id_name=}') + # logging.debug(f'({residue_id=}, {ccd_id=}, {chain_id=}) {atom_id=}, {converted_atom_id_name=}') + features['ref_atom_name_chars'].append( - np.array([convert_atom_id_name(atom_id) for atom_id in _ccd_feats['atom_ids']] + np.array([convert_atom_id_name(atom_id.upper()) for atom_id in _ccd_feats['atom_ids']] , dtype=np.int32)) # here we get ref_space_uid [ Each (chain id, residue index) tuple is assigned an integer on first appearance.] 
@@ -107,13 +242,14 @@
             features[k] = np.concatenate(v, axis=0)
 
     features['ref_atom_count'] = np.bincount(features['ref_token2atom_idx'])
-    assert np.max(features['ref_element']) < 128
-    assert np.max(features['ref_atom_name_chars']) < 64
-    assert len(set([len(v) for k, v in features.items() if k != 'ref_atom_count'])) == 1 ## To check same Atom-level features.
+    if (len_ref_element:=np.max(features['ref_element'])) >= 128:
+        raise ValueError(f'{len_ref_element=}, which is larger than 128.\n{features["ref_element"]}\n{"-"*79}')
+
+    assert len(set([len(v) for k, v in features.items() if k != 'ref_atom_count'])) == 1  ## Check that all Atom-level features have the same length.
 
     return features
 
-def make_bond_features(covalent_bond, all_chain_info, ccd_preprocessed_dict,
+def make_bond_features(covalent_bond: List[CovalentBond], all_chain_info, ccd_preprocessed_dict,
                        extra_feats: Optional[dict]=None):
     """
     all_chain_info: dict, (chain_type_chain_id): ccd_seq (list of ccd), such as: protein_A: ['ALA', 'MET', 'GLY']
@@ -134,24 +270,29 @@
     _set_chain_id_list = set(chain_id_list)
     parsed_covalent_bond = []
     for _bond in covalent_bond:
-        left_bond_atomid, right_bond_atomid = _bond['ptnr1_label_atom_id'], _bond['ptnr2_label_atom_id']
-        left_bond_name, right_bond_name = _bond['ptnr1_label_comp_id'], _bond['ptnr2_label_comp_id']
-        left_bond, right_bond = _bond['ptnr1_label_asym_id'], _bond['ptnr2_label_asym_id']
+        # Access the AtomPartner attributes for both atoms in the covalent bond.
+        left_bond_atomid, right_bond_atomid = _bond.atom_1.label_atom_id, _bond.atom_2.label_atom_id
+        left_bond_name, right_bond_name = _bond.atom_1.label_comp_id, _bond.atom_2.label_comp_id
+        left_bond, right_bond = _bond.atom_1.label_asym_id, _bond.atom_2.label_asym_id
 
-        left_bond_idx, right_bond_idx = _bond['ptnr1_label_seq_id'], _bond['ptnr2_label_seq_id']
-        auth_left_idx, auth_right_idx = _bond['ptnr1_auth_seq_id'], _bond['ptnr2_auth_seq_id']
+        left_bond_idx, right_bond_idx = _bond.atom_1.seq_id, _bond.atom_2.seq_id
+        auth_left_idx, auth_right_idx = _bond.atom_1.seq_id, _bond.atom_2.seq_id
+
         left_bond_idx = 1 if left_bond_idx == '.' else left_bond_idx
         right_bond_idx = 1 if right_bond_idx == '.' else right_bond_idx
 
-        if _bond['bond_type'] != "covale":
-            continue
+        if _bond.bond_type != "covale":
+            logging.warning(f'Detected non-covale bond type: {_bond.bond_type}')
+            # continue  # non-covale bonds (e.g. disulf, metalc) are now kept
 
-        if _bond['pdbx_dist_value'] > 2.4:
+        if _bond.pdbx_dist_value > 2.4:
             # the covalent_bond is cut off by distance=2.4
+            logging.warning(f'Ignore bond with distance > 2.4: {_bond.pdbx_dist_value}')
             continue
 
         ## When some chain IDs are filtered, their bonds need to be filtered too.
## When a chain ID has been filtered out, its bonds must be filtered out too. if (left_bond not in _set_chain_id_list) or (right_bond not in _set_chain_id_list): + logging.warning(f'Ignoring bond whose chains are outside the chain list: {left_bond}/{right_bond}') continue parsed_covalent_bond.append([left_bond, left_bond_name, left_bond_idx, left_bond_atomid, auth_left_idx, @@ -167,6 +308,8 @@ def make_bond_features(covalent_bond, all_chain_info, ccd_preprocessed_dict, chainId_to_type = {} ligand_bond_type = [] # (i, j, bond_type), represent the bond between token i and token j bond_index = [] # (i,j) represent the bond between token i and token j + + ccd_id2atom_ids: dict[str, list]={} ccd_standard_set = residue_constants.STANDARD_LIST for chain_type_id, ccd_seq in all_chain_info.items(): chain_type, chain_id = chain_type_id.rsplit('_', 1) @@ -187,6 +330,8 @@ def make_bond_features(covalent_bond, all_chain_info, ccd_preprocessed_dict, else: _ccd_feats = ccd_preprocessed_dict[ccd_id] atom_ids = _ccd_feats['atom_ids'] + + ccd_id2atom_ids[ccd_id] = atom_ids assert len(atom_ids) > 0, f'TODO filter - Got CCD <{ccd_id}>: 0 atom nums.' all_token_nums += len(atom_ids) @@ -233,16 +378,43 @@ def make_bond_features(covalent_bond, all_chain_info, ccd_preprocessed_dict, ptnr2_label_seq_id = ptnr2_auth_seq_id try: - assert ptnr1_label_asym_id in chainId_to_ccd_list and ptnr2_label_asym_id in chainId_to_ccd_list + if not (ptnr1_label_asym_id in chainId_to_ccd_list and ptnr2_label_asym_id in chainId_to_ccd_list): + raise ValueError(f"Invalid chain id:\n{ptnr1_label_asym_id}/{ptnr2_label_asym_id}\n{chainId_to_ccd_list}") ptnr1_ccd_id = chainId_to_ccd_list[ptnr1_label_asym_id][int(ptnr1_label_seq_id) - 1] ptnr2_ccd_id = chainId_to_ccd_list[ptnr2_label_asym_id][int(ptnr2_label_seq_id) - 1] - assert ptnr1_ccd_id == ptnr1_label_comp_id and ptnr2_ccd_id == ptnr2_label_comp_id - except: + + + # renamed ligand residues + + + if ptnr1_ccd_id != ptnr1_label_comp_id: + logging.warning(f"Found renamed ligand residue: {ptnr1_label_comp_id} -> {ptnr1_ccd_id}") + #ptnr1_label_comp_id = ptnr1_ccd_id + + if ptnr2_ccd_id != ptnr2_label_comp_id: + logging.warning(f"Found renamed ligand residue: {ptnr2_label_comp_id} -> {ptnr2_ccd_id}") + #ptnr2_label_comp_id = ptnr2_ccd_id + + except ValueError as e: ## Some covalent bonds from mmCIF are misleading; skip them.
+ logging.warning(f'Error occurred during covalent bond processing: {e}') continue + + + + if ptnr1_ccd_id in ccd_preprocessed_dict: + ptnr1_ccd_atoms_list = ccd_preprocessed_dict[ptnr1_ccd_id]['atom_ids'] + else: + ptnr1_ccd_atoms_list = ccd_id2atom_ids[ptnr1_ccd_id] + + logging.debug(f'{ptnr1_ccd_id=}: {ptnr1_ccd_atoms_list=}') - ptnr1_ccd_atoms_list = ccd_preprocessed_dict[ptnr1_ccd_id]['atom_ids'] - ptnr2_ccd_atoms_list = ccd_preprocessed_dict[ptnr2_ccd_id]['atom_ids'] + if ptnr2_ccd_id in ccd_preprocessed_dict: + ptnr2_ccd_atoms_list = ccd_preprocessed_dict[ptnr2_ccd_id]['atom_ids'] + else: + ptnr2_ccd_atoms_list = ccd_id2atom_ids[ptnr2_ccd_id] + + logging.debug(f'{ptnr2_ccd_id=}: {ptnr2_ccd_atoms_list=}') if ptnr1_ccd_id in ccd_standard_set: ## if ccd_id is in the standard residue in HF3 (table 13), we didn't have to map to atom-leval index diff --git a/apps/protein_folding/helixfold3/helixfold/data/pipeline_multimer_parallel.py b/apps/protein_folding/helixfold3/helixfold/data/pipeline_multimer_parallel.py index 9d595a07..251503aa 100644 --- a/apps/protein_folding/helixfold3/helixfold/data/pipeline_multimer_parallel.py +++ b/apps/protein_folding/helixfold3/helixfold/data/pipeline_multimer_parallel.py @@ -14,7 +14,8 @@ from helixfold.data import feature_processing from helixfold.data import msa_pairing from helixfold.data import parsers -from helixfold.data import pipeline +#from helixfold.data import pipeline +from helixfold.data import pipeline_parallel as pipeline from helixfold.data.tools import jackhmmer import numpy as np import multiprocessing @@ -197,24 +198,38 @@ def _process_single_chain( with temp_fasta_file(chain_fasta_str) as chain_fasta_path: logging.info('Running monomer pipeline on chain %s: %s', chain_id, description) + + # We only construct the pairing features if there are 2 or more unique + # sequences. + self.jackhmmer_uniprot_args=( + self._uniprot_msa_runner, + str(chain_fasta_path), + os.path.join(chain_msa_output_dir, 'uniprot_hits.sto'), + 'sto', + self.use_precomputed_msas, + 0 + ) + chain_features = self._monomer_data_pipeline.process( input_fasta_path=chain_fasta_path, - msa_output_dir=chain_msa_output_dir) + msa_output_dir=chain_msa_output_dir, + other_args=self.jackhmmer_uniprot_args if not is_homomer_or_monomer else None) # We only construct the pairing features if there are 2 or more unique # sequences. 
if not is_homomer_or_monomer: - all_seq_msa_features = self._all_seq_msa_features(chain_fasta_path, - chain_msa_output_dir) + all_seq_msa_features = self._all_seq_msa_features(chain_msa_output_dir) chain_features.update(all_seq_msa_features) return chain_features - def _all_seq_msa_features(self, input_fasta_path, msa_output_dir): + def _all_seq_msa_features(self, msa_output_dir): """Get MSA features for unclustered uniprot, for pairing.""" - out_path = os.path.join(msa_output_dir, 'uniprot_hits.sto') - result = pipeline.run_msa_tool( - self._uniprot_msa_runner, input_fasta_path, out_path, 'sto', - self.use_precomputed_msas) + # edited by yinying to adapt to the multiprocess version of run_msa_tool function + result = pipeline.read_msa_result( + msa_out_path=os.path.join(msa_output_dir, 'uniprot_hits.sto'), + msa_format='sto', + max_sto_sequences=0 + ) msa = parsers.parse_stockholm(result['sto']) msa = msa.truncate(max_seqs=self._max_uniprot_hits) all_seq_features = pipeline.make_msa_features([msa]) diff --git a/apps/protein_folding/helixfold3/helixfold/data/pipeline_parallel.py b/apps/protein_folding/helixfold3/helixfold/data/pipeline_parallel.py index 3ca52693..ade48cb5 100644 --- a/apps/protein_folding/helixfold3/helixfold/data/pipeline_parallel.py +++ b/apps/protein_folding/helixfold3/helixfold/data/pipeline_parallel.py @@ -15,7 +15,7 @@ """Functions for building the input features for the HelixFold model.""" import os -from typing import Any, Mapping, MutableMapping, Optional, Sequence, Union +from typing import Any, Mapping, MutableMapping, Optional, Protocol, Sequence, Tuple, Union from absl import logging from helixfold.common import residue_constants from helixfold.data import msa_identifiers @@ -26,12 +26,23 @@ from helixfold.data.tools import hmmsearch from helixfold.data.tools import jackhmmer import numpy as np -from concurrent.futures import ProcessPoolExecutor, as_completed +from joblib import Parallel, delayed # Internal import (7716). FeatureDict = MutableMapping[str, np.ndarray] TemplateSearcher = Union[hhsearch.HHSearch, hmmsearch.Hmmsearch] +class MsaRunner(Protocol): + n_cpu: int + + def query(self, input_fasta_path: str) -> Sequence[Mapping[str, Any]]: + """Runs the MSA tool on the input fasta file.""" + ... 
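Any object with an `n_cpu` attribute and a `query` method satisfies the `MsaRunner` protocol above, and every task fed to the parallel runner is a 6-tuple, as `run_msa_tool` below enforces. A toy conforming runner, purely illustrative:

```python
from typing import Any, Mapping, Sequence

class DummyRunner:
    """Toy stand-in for jackhmmer.Jackhmmer / hhblits.HHBlits."""
    n_cpu = 4

    def query(self, input_fasta_path: str) -> Sequence[Mapping[str, Any]]:
        return [{'sto': '# STOCKHOLM 1.0\n//\n'}]  # dummy payload

# (runner, input_fasta, out_path, msa_format, use_precomputed_msas, max_sto_sequences)
task = (DummyRunner(), 'chain_A.fasta', 'msas/uniref90_hits.sto', 'sto', False, 10000)
assert len(task) == 6  # any other arity is rejected by run_msa_tool
```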
+ +def check_used_ncpus(used: list[int]): + ncpus_sum=sum(used) + if ncpus_sum > os.cpu_count(): + logging.warning(f"The number of used CPUs ({ncpus_sum}) is larger than the number of available CPUs ({os.cpu_count()}).") def make_sequence_features( sequence: str, description: str, num_res: int) -> FeatureDict: @@ -83,42 +94,41 @@ def make_msa_features(msas: Sequence[parsers.Msa]) -> FeatureDict: features['msa_species_identifiers'] = np.array(species_ids, dtype=np.object_) return features - -def run_msa_tool(msa_runner, input_fasta_path: str, msa_out_path: str, - msa_format: str, use_precomputed_msas: bool, - max_sto_sequences: Optional[int] = None - ) -> Mapping[str, Any]: - """Runs an MSA tool, checking if output already exists first.""" - if not use_precomputed_msas or not os.path.exists(msa_out_path): - if msa_format == 'sto' and max_sto_sequences is not None: - print('pipeline:',input_fasta_path,max_sto_sequences) - result = msa_runner.query(input_fasta_path, max_sto_sequences)[0] # pytype: disable=wrong-arg-count +# Yinying edited here to pack the various positional args into a single tuple for the multiprocess path +def run_msa_tool(args: Tuple[MsaRunner, str, str, str, bool, int]) -> Mapping[str, Any]: + """Runs an MSA tool, checking if output already exists first.""" + if args is None: + return None + if (len_args:=len(args))!=6: + raise ValueError(f'run_msa_tool expects exactly 6 arguments but got {len_args}') + + (msa_runner, input_fasta_path, msa_out_path, + msa_format, use_precomputed_msas, + max_sto_sequences) = args + if not use_precomputed_msas or not os.path.exists(msa_out_path): + if msa_format == 'sto' and max_sto_sequences > 0: + result = msa_runner.query(input_fasta_path, max_sto_sequences)[0] # pytype: disable=wrong-arg-count + else: + result = msa_runner.query(input_fasta_path)[0] + with open(msa_out_path, 'w') as f: + f.write(result[msa_format]) else: - result = msa_runner.query(input_fasta_path)[0] - with open(msa_out_path, 'w') as f: - f.write(result[msa_format]) - else: + result=read_msa_result(msa_out_path,msa_format,max_sto_sequences) + return result + +def read_msa_result(msa_out_path,msa_format,max_sto_sequences): logging.warning('Reading MSA from file %s', msa_out_path) - if msa_format == 'sto' and max_sto_sequences is not None: - precomputed_msa = parsers.truncate_stockholm_msa( - msa_out_path, max_sto_sequences) - result = {'sto': precomputed_msa} + if msa_format == 'sto' and max_sto_sequences > 0: + precomputed_msa = parsers.truncate_stockholm_msa( + msa_out_path, max_sto_sequences) + result = {'sto': precomputed_msa} else: - with open(msa_out_path, 'r') as f: - result = {msa_format: f.read()} - return result + with open(msa_out_path, 'r') as f: + result = {msa_format: f.read()} + return result + -def run_msa_tool_wrapper(args): - """ - 用于包装run_msa_tool函数的帮助程序，以便在使用argparse时可以更轻松地传递参数。 - - Args: - args (tuple, list): 一个元组或列表，其中包含要传递给run_msa_tool函数的参数。 - - Returns: - int: 返回run_msa_tool函数的返回值。 - """ - return run_msa_tool(*args) class DataPipeline: @@ -138,30 +148,39 @@ def __init__(self, use_small_bfd: bool, mgnify_max_hits: int = 501, uniref_max_hits: int = 10000, - use_precomputed_msas: bool = False): + use_precomputed_msas: bool = False, + nprocs: Mapping[str, int] = { + 'hhblits': 16, + 'jackhmmer': 8, + }): """Initializes the data pipeline.
Constructs a feature dict for a given FASTA file.""" self._use_small_bfd = use_small_bfd + self.nprocs=nprocs self.jackhmmer_uniref90_runner = jackhmmer.Jackhmmer( binary_path=jackhmmer_binary_path, - database_path=uniref90_database_path) + database_path=uniref90_database_path, n_cpu=self.nprocs.get('jackhmmer', 8)) if use_small_bfd: - self.jackhmmer_small_bfd_runner = jackhmmer.Jackhmmer( + self.bfd_runner = jackhmmer.Jackhmmer( binary_path=jackhmmer_binary_path, - database_path=small_bfd_database_path) + database_path=small_bfd_database_path, n_cpu=self.nprocs.get('jackhmmer', 8)) else: - self.hhblits_bfd_uniclust_runner = hhblits.HHBlits( + self.bfd_runner = hhblits.HHBlits( binary_path=hhblits_binary_path, - databases=[bfd_database_path, uniclust30_database_path]) + databases=[bfd_database_path, uniclust30_database_path], n_cpu=self.nprocs.get('hhblits', 8)) self.jackhmmer_mgnify_runner = jackhmmer.Jackhmmer( binary_path=jackhmmer_binary_path, - database_path=mgnify_database_path) + database_path=mgnify_database_path, n_cpu=self.nprocs.get('jackhmmer', 8)) self.template_searcher = template_searcher self.template_featurizer = template_featurizer self.mgnify_max_hits = mgnify_max_hits self.uniref_max_hits = uniref_max_hits self.use_precomputed_msas = use_precomputed_msas - def process(self, input_fasta_path: str, msa_output_dir: str) -> FeatureDict: + def parallel_msa_joblib(self, func, input_args: list): + return Parallel(len(input_args),verbose=100)(delayed(func)(args) for args in input_args) + + + def process(self, input_fasta_path: str, msa_output_dir: str,other_args: Optional[tuple] = None) -> FeatureDict: """Runs alignment tools on the input sequence and creates features.""" with open(input_fasta_path) as f: input_fasta_str = f.read() @@ -174,7 +193,7 @@ def process(self, input_fasta_path: str, msa_output_dir: str) -> FeatureDict: num_res = len(input_sequence) - msa_tasks = [] + msa_tasks: list[Tuple[MsaRunner, str, str, str, bool, int]] = [] """uniref90_out_path = os.path.join(msa_output_dir, 'uniref90_hits.sto') jackhmmer_uniref90_result = run_msa_tool( msa_runner=self.jackhmmer_uniref90_runner, @@ -202,44 +221,32 @@ def process(self, input_fasta_path: str, msa_output_dir: str) -> FeatureDict: input_fasta_path, os.path.join(msa_output_dir, 'mgnify_hits.sto'), 'sto', - self.use_precomputed_msas)) + self.use_precomputed_msas, + self.mgnify_max_hits)) - if self._use_small_bfd: - msa_tasks.append(( - self.jackhmmer_small_bfd_runner, - input_fasta_path, - os.path.join(msa_output_dir, 'small_bfd_hits.sto'), - 'sto', - self.use_precomputed_msas)) - else: - msa_tasks.append(( - self.hhblits_bfd_uniclust_runner, - input_fasta_path, - os.path.join(msa_output_dir, 'bfd_uniclust_hits.a3m'), - 'a3m', - self.use_precomputed_msas)) - - msa_results = {} - with ProcessPoolExecutor() as executor: - futures = {executor.submit(run_msa_tool_wrapper, msa_task): msa_task for msa_task in msa_tasks} - - for future in as_completed(futures): - task = futures[future] - try: - result = future.result() - if 'uniref90_hits.sto' in task[2]: - msa_results['uniref90'] = result - elif 'mgnify_hits.sto' in task[2]: - msa_results['mgnify'] = result - elif 'small_bfd_hits.sto' in task[2]: - msa_results['small_bfd'] = result - elif 'bfd_uniclust_hits.a3m' in task[2]: - msa_results['bfd_uniclust'] = result - - except Exception as exc: - print(f'Task {task} generated an exception : {exc}') - - msa_for_templates = msa_results['uniref90']['sto'] + + msa_tasks.append(( + self.bfd_runner, + input_fasta_path, + 
os.path.join(msa_output_dir, 'small_bfd_hits.sto' if self._use_small_bfd else 'bfd_uniclust_hits.a3m'), + 'sto' if self._use_small_bfd else 'a3m', + self.use_precomputed_msas, + 0)) + + msa_tasks.append(other_args) + + check_used_ncpus(used=[task[0].n_cpu for task in msa_tasks if task is not None and hasattr(task[0], 'n_cpu')]) + + [ + jackhmmer_uniref90_result, + jackhmmer_mgnify_result, + bfd_result, + other_result + + ] = self.parallel_msa_joblib(func=run_msa_tool, + input_args=msa_tasks) + + msa_for_templates = jackhmmer_uniref90_result['sto'] msa_for_templates = parsers.deduplicate_stockholm_msa(msa_for_templates) msa_for_templates = parsers.remove_empty_columns_from_stockholm_msa(msa_for_templates) @@ -257,16 +264,16 @@ def process(self, input_fasta_path: str, msa_output_dir: str) -> FeatureDict: with open(pdb_hits_out_path, 'w') as f: f.write(pdb_templates_result) - uniref90_msa = parsers.parse_stockholm(msa_results['uniref90']['sto']) - mgnify_msa = parsers.parse_stockholm(msa_results['mgnify']['sto']) + uniref90_msa = parsers.parse_stockholm(jackhmmer_uniref90_result['sto']) + mgnify_msa = parsers.parse_stockholm(jackhmmer_mgnify_result['sto']) pdb_template_hits = self.template_searcher.get_template_hits( output_string=pdb_templates_result, input_sequence=input_sequence) if self._use_small_bfd: - bfd_msa = parsers.parse_stockholm(msa_results['small_bfd']['sto']) + bfd_msa = parsers.parse_stockholm(bfd_result['sto']) else: - raise ValueError("Doesn't support full BFD yet.") + bfd_msa = parsers.parse_a3m(bfd_result['a3m']) templates_result = self.template_featurizer.get_templates( query_sequence=input_sequence, diff --git a/apps/protein_folding/helixfold3/helixfold/data/pipeline_token_feature.py b/apps/protein_folding/helixfold3/helixfold/data/pipeline_token_feature.py index 7bbb1805..92ea0e1e 100644 --- a/apps/protein_folding/helixfold3/helixfold/data/pipeline_token_feature.py +++ b/apps/protein_folding/helixfold3/helixfold/data/pipeline_token_feature.py @@ -3,16 +3,16 @@ import collections import os import time -from typing import MutableMapping, Optional, List +from typing import Any, MutableMapping, Optional, List from absl import logging from helixfold.common import residue_constants from helixfold.data import parsers +from helixfold.data.pipeline_conf_bonds import load_ccd_dict import numpy as np import json -import gzip -import pickle + from rdkit import Chem - + FeatureDict = MutableMapping[str, np.ndarray] ELEMENT_MAPPING = Chem.GetPeriodicTable() @@ -56,6 +56,15 @@ def flatten_is_protein_features(is_protein_feats: np.ndarray) -> FeatureDict: return res + +def dump_all_ccd_keys(ccd_data: dict[str, Any]): + ccd_keys_file='all_ccd_keys.txt' + if not os.path.isfile(ccd_keys_file): + open(ccd_keys_file, 'w').write('\n'.join(ccd_data.keys())) + logging.warning(f'All ccd keys are dumped to {ccd_keys_file}') + return ccd_keys_file + + def make_sequence_features( all_chain_info, ccd_preprocessed_dict, extra_feats: Optional[dict]=None) -> FeatureDict: @@ -106,13 +115,18 @@ def make_sequence_features( sym_id = chainid_to_sym_id[_alphabet_chain_id] for residue_id, ccd_id in enumerate(ccd_seq): if ccd_id not in ccd_preprocessed_dict: - assert not extra_feats is None and ccd_id in extra_feats,\ - f'<{ccd_id}> not in ccd_preprocessed_dict, But got extra_feats is None' + if extra_feats is None: + ccd_kf=dump_all_ccd_keys(ccd_preprocessed_dict) + raise ValueError(f'<{ccd_id}> not in ccd_preprocessed_dict, but extra_feats is None.
See all keys in {ccd_kf}') + if ccd_id not in extra_feats: + ccd_kf=dump_all_ccd_keys(ccd_preprocessed_dict) + raise ValueError(f'<{ccd_id}> not in ccd_preprocessed_dict or extra_feats. See all keys in {ccd_kf}') _ccd_feats = extra_feats[ccd_id] else: _ccd_feats = ccd_preprocessed_dict[ccd_id] num_atoms = len(_ccd_feats['position']) - assert num_atoms > 0, f'TODO filter - Got CCD <{ccd_id}>: 0 atom nums.' + if num_atoms == 0: + raise NotImplementedError(f'TODO filter - Got CCD <{ccd_id}>: 0 atom nums.') if ccd_id not in residue_constants.STANDARD_LIST: features['asym_id'].append(np.array([chain_num_id] * num_atoms, dtype=np.int32)) @@ -197,13 +211,8 @@ def process(self, assembly_dict = unit_dict if ccd_preprocessed_dict is None: - ccd_preprocessed_dict = {} - st_1 = time.time() - if 'pkl.gz' in self.ccd_preprocessed_path: - with gzip.open(self.ccd_preprocessed_path, "rb") as fp: - ccd_preprocessed_dict = pickle.load(fp) - logging.info(f'load ccd dataset done. use {time.time()-st_1}s') - + ccd_preprocessed_dict=load_ccd_dict(self.ccd_preprocessed_path) + if select_mmcif_chainID is not None: select_mmcif_chainID = set(select_mmcif_chainID) diff --git a/apps/protein_folding/helixfold3/helixfold/data/templates.py b/apps/protein_folding/helixfold3/helixfold/data/templates.py index f2e3289a..f9efdae0 100644 --- a/apps/protein_folding/helixfold3/helixfold/data/templates.py +++ b/apps/protein_folding/helixfold3/helixfold/data/templates.py @@ -817,19 +817,15 @@ def _process_single_hit( TemplateAtomMaskAllZerosError) as e: # These 3 errors indicate missing mmCIF experimental data rather than a # problem with the template search, so turn them into warnings. - warning = ('%s_%s (sum_probs: %.2f, rank: %d): feature extracting errors: ' - '%s, mmCIF parsing errors: %s' - % (hit_pdb_code, hit_chain_id, hit.sum_probs, hit.index, - str(e), parsing_result.errors)) + warning = (f'{hit_pdb_code}_{hit_chain_id} (sum_probs: {hit.sum_probs}, rank: {hit.index}): feature extracting errors: ' + f'{str(e)}, mmCIF parsing errors: {parsing_result.errors}') if strict_error_check: return SingleHitResult(features=None, error=warning, warning=None) else: return SingleHitResult(features=None, error=None, warning=warning) except Error as e: - error = ('%s_%s (sum_probs: %.2f, rank: %d): feature extracting errors: ' - '%s, mmCIF parsing errors: %s' - % (hit_pdb_code, hit_chain_id, hit.sum_probs, hit.index, - str(e), parsing_result.errors)) + error = (f'{hit_pdb_code}_{hit_chain_id} (sum_probs: {hit.sum_probs}, rank: {hit.index}): feature extracting errors: ' + f'{str(e)}, mmCIF parsing errors: {parsing_result.errors}') return SingleHitResult(features=None, error=error, warning=None) diff --git a/apps/protein_folding/helixfold3/helixfold/data/tools/utils.py b/apps/protein_folding/helixfold3/helixfold/data/tools/utils.py index c415dac8..c588187d 100644 --- a/apps/protein_folding/helixfold3/helixfold/data/tools/utils.py +++ b/apps/protein_folding/helixfold3/helixfold/data/tools/utils.py @@ -16,7 +16,7 @@ import contextlib import shutil -import logging +from absl import logging import tempfile import time from typing import Optional diff --git a/apps/protein_folding/helixfold3/infer_scripts/feature_processing_aa.py b/apps/protein_folding/helixfold3/helixfold/infer_scripts/feature_processing_aa.py similarity index 87% rename from apps/protein_folding/helixfold3/infer_scripts/feature_processing_aa.py rename to apps/protein_folding/helixfold3/helixfold/infer_scripts/feature_processing_aa.py index b89f43f4..31272c32 100644 --- 
a/apps/protein_folding/helixfold3/infer_scripts/feature_processing_aa.py +++ b/apps/protein_folding/helixfold3/helixfold/infer_scripts/feature_processing_aa.py @@ -2,38 +2,31 @@ import collections import copy import os -import time, gzip, pickle +from pathlib import Path +import pickle +from typing import List, Mapping, Optional, Tuple + import numpy as np -import logging +from absl import logging + + from helixfold.common import residue_constants from helixfold.data import parsers -from helixfold.data import pipeline_multimer +from helixfold.data.tools import utils +from helixfold.data import pipeline_multimer, pipeline_multimer_parallel from helixfold.data import pipeline_rna_multimer from helixfold.data import pipeline_conf_bonds, pipeline_token_feature, pipeline_hybrid from helixfold.data import label_utils -from concurrent.futures import ProcessPoolExecutor, as_completed -from .preprocess import digit2alphabet -logger = logging.getLogger(__file__) +from helixfold.data.tools import utils + +from .preprocess import Entity, digit2alphabet + POLYMER_STANDARD_RESI_ATOMS = residue_constants.residue_atoms STRING_FEATURES = ['all_chain_ids', 'all_ccd_ids','all_atom_ids', 'release_date','label_ccd_ids','label_atom_ids'] -def load_ccd_dict(ccd_preprocessed_path): - assert os.path.exists(ccd_preprocessed_path),\ - (f'[CCD] ccd_preprocessed_path: {ccd_preprocessed_path} not exist.') - st_1 = time.time() - if 'pkl.gz' in ccd_preprocessed_path: - with gzip.open(ccd_preprocessed_path, "rb") as fp: - ccd_preprocessed_dict = pickle.load(fp) - elif '.pkl' in ccd_preprocessed_path: - with open(ccd_preprocessed_path, "rb") as fp: - ccd_preprocessed_dict = pickle.load(fp) - print(f'[CCD] load ccd dataset done. use {time.time()-st_1}s;'\ - f'Has length of {len(ccd_preprocessed_dict)}') - - return ccd_preprocessed_dict def crop_msa(feat, max_msa_depth=16384): @@ -233,7 +226,7 @@ def get_inference_restype_mask(all_chain_features, ccd_preprocessed_dict, extra_ } -def add_assembly_features(all_chain_features, ccd_preprocessed_dict, no_msa_templ_feats=True): +def add_assembly_features(all_chain_features: Mapping, ccd_preprocessed_dict: Mapping, no_msa_templ_feats:bool=True, covalent_bonds:Optional[List[pipeline_conf_bonds.CovalentBond]]=None): ''' ## NOTE: keep the type and chainID orders. 
all_chain_features: { @@ -307,7 +300,7 @@ def add_assembly_features(all_chain_features, ccd_preprocessed_dict, no_msa_temp ref_features = pipeline_conf_bonds.make_ccd_conf_features(all_chain_info=new_order_chain_infos, ccd_preprocessed_dict=ccd_preprocessed_dict, extra_feats=extra_feats_infos) - bond_features = pipeline_conf_bonds.make_bond_features(covalent_bond=[], + bond_features = pipeline_conf_bonds.make_bond_features(covalent_bond=covalent_bonds, all_chain_info=new_order_chain_infos, ccd_preprocessed_dict=ccd_preprocessed_dict, extra_feats=extra_feats_infos) @@ -330,7 +323,7 @@ def add_assembly_features(all_chain_features, ccd_preprocessed_dict, no_msa_temp "label": label,} -def process_chain_msa(args): +def process_chain_msa(args: tuple[pipeline_multimer_parallel.DataPipeline, str, Optional[str],Optional[str], os.PathLike,os.PathLike ]) -> Tuple[str,dict, str, str]: """ Process one chain: if a cached feature file exists, reuse it directly; otherwise generate a new feature file. @@ -356,16 +349,16 @@ data_pipeline, chain_id, seq, desc, \ msa_output_dir, features_pkl = args if features_pkl.exists(): - logger.info('Use cached features.pkl') + logging.info('Using cached features.pkl') with open(features_pkl, 'rb') as f: raw_features = pickle.load(f) else: - t0 = time.time() - raw_features = data_pipeline._process_single_chain( - chain_id, sequence=seq, description=desc, - msa_output_dir=msa_output_dir, - is_homomer_or_monomer=False) - print(f'[MSA/Template] {desc}; seq length: {len(seq)}; use: {time.time() - t0}') + with utils.timing(f'[MSA/Template]({desc}) with seq length: {len(seq)}'): + raw_features = data_pipeline._process_single_chain( + chain_id, sequence=seq, description=desc, + msa_output_dir=msa_output_dir, + is_homomer_or_monomer=False) + with open(features_pkl, 'wb') as f: pickle.dump(raw_features, f, protocol=4) @@ -376,40 +369,36 @@ return chain_id, raw_features, desc, seq -def process_input_json(all_entitys, ccd_preprocessed_path, +def process_input_json(all_entitys: List[Entity], ccd_preprocessed_path, msa_templ_data_pipeline_dict, msa_output_dir, no_msa_templ_feats=False): ## load ccd dict. - ccd_preprocessed_dict = load_ccd_dict(ccd_preprocessed_path) + ccd_preprocessed_dict = pipeline_conf_bonds.load_ccd_dict(ccd_preprocessed_path) all_chain_features = {} sequence_features = {} num_chains = 0 - for entity_items in all_entitys: + for entity in all_entitys: + if (dtype:=entity.dtype) not in residue_constants.CHAIN_type_order: + continue # dtype(protein, dna, rna, ligand): no_chains, msa_seqs, seqs - dtype = list(entity_items.keys())[0] - items = list(entity_items.values())[0] - entity_count = items['count'] - ccd_seqs = items['seqs'] - msa_seqs = items['msa_seqs'] - extra_mol_infos = items.get('extra_mol_infos', {}) ## dict, 「extra-add, ccd_id」: ccd_features.
- - for i in range(entity_count): + + for i in range(entity.count): chain_num_ids = num_chains + i chain_id = digit2alphabet(chain_num_ids) # increase ++ type_chain_id = dtype + '_' + chain_id - if ccd_seqs in sequence_features: - all_chain_features[type_chain_id] = copy.deepcopy(sequence_features[ccd_seqs]) + if entity.seqs in sequence_features: + all_chain_features[type_chain_id] = copy.deepcopy(sequence_features[entity.seqs]) continue - ccd_list = parsers.parse_ccd_fasta(ccd_seqs) + ccd_list = parsers.parse_ccd_fasta(entity.seqs) chain_features = {'msa_templ_feats': {}, 'ccd_seqs': ccd_list, - 'msa_seqs': msa_seqs, - 'extra_feats': extra_mol_infos} + 'msa_seqs': entity.msa_seqs, + 'extra_feats': entity.extra_mol_infos} all_chain_features[type_chain_id] = chain_features - sequence_features[ccd_seqs] = chain_features - num_chains += entity_count + sequence_features[entity.seqs] = chain_features + num_chains += entity.count if not no_msa_templ_feats: ## 1. get all msa_seqs for protein/rna MSA/Template search. Only for protein/rna. @@ -450,19 +439,10 @@ def process_input_json(all_entitys, ccd_preprocessed_path, ## 2. multiprocessing for protein/rna MSA/Template search. seqs_to_msa_features = {} - logger.info('[Multiprocess] starting MSA/Template search...') - t0 = time.time() - with ProcessPoolExecutor() as executor: - futures = [executor.submit(process_chain_msa, task) for task in tasks] - - for future in as_completed(futures): - try: - _, raw_features, type_chain_id, seqs = future.result() - seqs_to_msa_features[seqs] = raw_features - except Exception as exc: - import traceback; traceback.print_exc() - logger.error(f'Task generated an exception : {exc}') - logger.info(f'[Multiprocess] All msa/template use: {time.time() - t0}') + with utils.timing('MSA/Template search'): + for task in tasks: + _, raw_features, type_chain_id, seqs=process_chain_msa(task) + seqs_to_msa_features[seqs] = raw_features ## 3. add msa_templ_feats to all_chain_features. for type_chain_id in all_chain_features.keys(): @@ -472,8 +452,12 @@ def process_input_json(all_entitys, ccd_preprocessed_path, for _type_chain_id in fasta_seq_to_type_chain_id[fasta_seq]: chain_features['msa_templ_feats'] = copy.deepcopy(seqs_to_msa_features[fasta_seq]) + + # gather all defined covalent bonds + all_covalent_bonds=[bond for entity in all_entitys for bond in entity.msa_seqs if entity.dtype == 'bond'] + assert num_chains == len(all_chain_features.keys()) - all_feats = add_assembly_features(all_chain_features, ccd_preprocessed_dict, no_msa_templ_feats) + all_feats = add_assembly_features(all_chain_features, ccd_preprocessed_dict, no_msa_templ_feats, all_covalent_bonds) np_example, label = all_feats['feats'], all_feats['label'] assert num_chains == len(np.unique(np_example['all_chain_ids'])) diff --git a/apps/protein_folding/helixfold3/infer_scripts/preprocess.py b/apps/protein_folding/helixfold3/helixfold/infer_scripts/preprocess.py similarity index 60% rename from apps/protein_folding/helixfold3/infer_scripts/preprocess.py rename to apps/protein_folding/helixfold3/helixfold/infer_scripts/preprocess.py index 41cd44ac..2fffbb96 100644 --- a/apps/protein_folding/helixfold3/infer_scripts/preprocess.py +++ b/apps/protein_folding/helixfold3/helixfold/infer_scripts/preprocess.py @@ -5,21 +5,28 @@ 'seqs': ccd_seqs, 'msa_seqs': msa_seqs, 'count': count, - 'extra_mol_infos': {}, for which seqs has the modify residue type or smiles. + 'extra_mol_infos': {}, for which seqs has the modify residue type or smiles. 
""" import collections import copy +import gzip import os import json import sys import subprocess import tempfile import itertools -sys.path.append('../') +from absl import logging +from typing import List, Optional, Tuple, Union, Mapping, Literal, Callable, Any +from dataclasses import dataclass, field import rdkit from rdkit import Chem from rdkit.Chem import AllChem from helixfold.common import residue_constants +from helixfold.data.pipeline_conf_bonds import CovalentBond, parse_covalent_bond_input +from helixfold.data.tools import utils + +from openbabel import openbabel ## NOTE: this mapping is only useful for standard dna/rna/protein sequence input. @@ -52,9 +59,7 @@ 3: 'Unknown error.' } -OBABEL_BIN = os.getenv('OBABEL_BIN') -if not os.path.exists(OBABEL_BIN): - raise FileNotFoundError(f'Cannot find obabel binary at {OBABEL_BIN}.') + def read_json(path): @@ -140,22 +145,76 @@ def smiles_to_ETKDGMol(smiles): return optimal_mol_wo_H -def smiles_toMol_obabel(smiles): - """ - generate mol from smiles using obabel; - """ - with tempfile.NamedTemporaryFile(suffix=".mol2") as temp_file: - print(f"[OBABEL] Temporary file created: {temp_file.name}") - obabel_cmd = f"{OBABEL_BIN} -:'{smiles}' -omol2 -O{temp_file.name} --gen3d" +class Mol2MolObabel: + def __init__(self): + self.obabel_bin = os.getenv('OBABEL_BIN') + if not (self.obabel_bin and os.path.isfile(self.obabel_bin)): + raise FileNotFoundError(f'Cannot find obabel binary at {self.obabel_bin}.') + + # Get the supported formats + self.supported_formats: Tuple[str] = self._get_supported_formats() + + def _get_supported_formats(self) -> Tuple[str]: + """ + Retrieves the list of supported formats from obabel and filters out write-only formats. + + Returns: + tuple: A tuple of supported input formats. 
+ """ + obabel_cmd = f"{self.obabel_bin} -L formats" ret = subprocess.run(obabel_cmd, shell=True, capture_output=True, text=True) - mol = Chem.MolFromMol2File(temp_file.name, sanitize=False) - if '3D coordinate generation failed' in ret.stderr: + formats = [line.split()[0] for line in ret.stdout.splitlines() if '[Write-only]' not in line] + formats.append('smiles') + + return tuple(formats) + + def _load_mol(self, mol2_file:str, ret:Optional[subprocess.CompletedProcess]=None) -> Chem.Mol: + mol = Chem.MolFromMol2File(mol2_file, sanitize=False) + if isinstance(ret, subprocess.CompletedProcess) and '3D coordinate generation failed' in ret.stderr: mol = generate_ETKDGv3_conformer(mol) optimal_mol_wo_H = Chem.RemoveAllHs(mol, sanitize=False) - return optimal_mol_wo_H + return optimal_mol_wo_H -def polymer_convert(items): + def _perform_conversion(self, input_type: str, input_value: str, generate_3d: bool=True) -> Chem.Mol: + if input_type == 'mol2' and input_value.endswith('.mol2'): + return self._load_mol(mol2_file=input_value) + + save_path=os.path.join('ligands',f'{os.path.basename(input_value)[:-(len(input_type)+1)] if input_type != "smiles" else "UNK"}.mol2') + + os.makedirs(os.path.dirname(save_path), exist_ok=True) + + with utils.timing(f'converting {input_type} to mol2: {input_value}'): + if input_type == 'smiles': + obabel_cmd = f"{self.obabel_bin} -:'{input_value}' -omol2 -O{save_path} {'--gen3d' if generate_3d else ''}" + if len(input_value)>60: + logging.warning(f'This takes a while ...') + else: + obabel_cmd = f"{self.obabel_bin} -i {input_type} {input_value} -omol2 -O{save_path} {'--gen3d' if generate_3d else ''}" + logging.debug(f'Launching command: `{obabel_cmd}`') + ret = subprocess.run(obabel_cmd, shell=True, capture_output=True, text=True) + return self._load_mol(mol2_file=save_path, ret=ret) + + def _convert_to_mol(self, input_type: str, input_value: str, generate_3d: bool=True) -> Chem.Mol: + if input_type not in self.supported_formats: + raise ValueError(f'Unsupported small molecule input: {input_type}. \nSupported formats: \n{self.supported_formats}\n') + + if input_type != 'smiles' and not os.path.isfile(input_value): + raise FileNotFoundError(f'Cannot find the {input_type.upper()} file at {input_value}.') + + return self._perform_conversion(input_type, input_value, generate_3d) + + __call__: Callable[[str, str, bool], Chem.Mol] = _convert_to_mol + +@dataclass +class Entity: + dtype: Literal['protein', 'dna', 'rna', 'ligand', 'bond','non_polymer', 'ion'] + seqs: str + msa_seqs: Union[str, List[CovalentBond]] = '' + count: int = 1 + extra_mol_infos: dict[str, Any] = field(default_factory=dict) + +def polymer_convert(items)-> Entity: """ "type": "protein", "sequence": "GPDSMEEVVVPEEPPKLVSALATYVQQERLCTMFLSIANKLLPLKP", @@ -178,18 +237,21 @@ def polymer_convert(items): raise ValueError(f'not support for the {dtype} in polymer_convert') ccd_seqs = ''.join(ccd_seqs) ## (GLY)(ALA)..... 
- # repeat_ccds, repeat_fasta = [ccd_seqs], [msa_seqs] - return { - dtype: { - 'seqs': ccd_seqs, - 'msa_seqs': msa_seqs, - 'count': count, - 'extra_mol_infos': {} - } - } + return Entity(dtype=dtype, seqs=ccd_seqs, msa_seqs=msa_seqs,count=count) + + +def covalent_bond_convert(items: Mapping[str, Union[int, str]]) -> Entity: + """ + "type": "bond", + "bond": "A,ASN,74,ND2,B,UNK-," + """ + dtype = items['type'] + bond = parse_covalent_bond_input(items['bond']) + + return Entity(dtype=dtype, seqs='', msa_seqs=bond) -def ligand_convert(items): +def ligand_convert(items: Mapping[str, Union[int, str]]) -> Entity: """ "type": "ligand", "ccd": "ATP", or "smiles": "CCccc(O)ccc", @@ -197,33 +259,36 @@ def ligand_convert(items): """ dtype = items['type'] count = items['count'] + converter=Mol2MolObabel() msa_seqs = "" _ccd_seqs = [] ccd_to_extra_mol_infos = {} if 'ccd' in items: _ccd_seqs.append(f"({items['ccd']})") - elif 'smiles' in items: - _ccd_seqs.append(f"(UNK-)") + + + elif any(f in items for f in converter.supported_formats): + for k in converter.supported_formats: + if k in items: + break + + ligand_name="UNK-" + _ccd_seqs.append(f"({ligand_name})") # mol_wo_h = smiles_to_ETKDGMol(items['smiles']) - mol_wo_h = smiles_toMol_obabel(items['smiles']) + + mol_wo_h = converter(k, items[k], items.get('use_3d', True)) _extra_mol_infos = make_basic_info_fromMol(mol_wo_h) ccd_to_extra_mol_infos = { - "UNK-": _extra_mol_infos + ligand_name: _extra_mol_infos } else: - raise ValueError(f'not support for the {dtype} in ligand_convert') + raise ValueError(f'not support for the {dtype} in ligand_convert, please check the input. \nSupported input: {converter.supported_formats}') ccd_seqs = ''.join(_ccd_seqs) ## (GLY)(ALA)..... # repeat_ccds, repeat_fasta = [ccd_seqs], [msa_seqs] - return { - 'ligand': { - 'seqs': ccd_seqs, - 'msa_seqs': msa_seqs, - 'count': count, - 'extra_mol_infos': ccd_to_extra_mol_infos, - } - } + return Entity(dtype='ligand', seqs=ccd_seqs,msa_seqs=msa_seqs,count=count,extra_mol_infos=ccd_to_extra_mol_infos) + def entities_rename_and_filter(items): @@ -231,39 +296,34 @@ def entities_rename_and_filter(items): 'ion': 'ligand' } items['type'] = ligand_mapping.get(items['type'], items['type']) - if items['type'] not in ALLOWED_ENTITY_TYPE: + if items['type'] not in ALLOWED_ENTITY_TYPE and items['type'] != 'bond': raise ValueError(f'{items["type"]} is not allowed, will be ignored.') return items -def modify_name_convert(entities: list): +def modify_name_convert(entities: list[Entity]): cur_idx = 0 - for entity_items in entities: + for entity in entities: # dtype(protein, dna, rna, ligand): no_chains, msa_seqs, seqs - dtype = list(entity_items.keys())[0] - items = list(entity_items.values())[0] - entity_count = items['count'] - msa_seqs = items['msa_seqs'] - extra_mol_infos = items.get('extra_mol_infos', {}) ## dict, 「extra-add, ccd_id」: ccd_features. - extra_ccd_ids = list(extra_mol_infos.keys()) ## rename UNK- to UNK-1, 2, 3, 4... 
- for k in extra_ccd_ids: + extra_mol_infos=copy.deepcopy(entity.extra_mol_infos) + for k in extra_mol_infos: user_name_3 = USER_LIG_IDS_3[cur_idx] - items['seqs'] = items['seqs'].replace('UNK-', user_name_3) - extra_mol_infos[user_name_3] = extra_mol_infos.pop('UNK-') + entity.seqs = entity.seqs.replace('UNK-', user_name_3) + entity.extra_mol_infos[user_name_3] = entity.extra_mol_infos.pop('UNK-') cur_idx += 1 return entities -def online_json_to_entity(json_path, out_dir): +def online_json_to_entity(json_path: str, out_dir: str)-> list[Entity]: obj = read_json(json_path) entities = copy.deepcopy(obj['entities']) os.makedirs(out_dir, exist_ok=True) error_ids = [] - success_entity = [] + success_entity: list[Entity] = [] for idx, items in enumerate(entities): try: items = entities_rename_and_filter(items) @@ -275,6 +335,8 @@ def online_json_to_entity(json_path, out_dir): try: if items['type'] == 'ligand': json_obj = ligand_convert(items) + elif items['type'] == 'bond': + json_obj = covalent_bond_convert(items) else: json_obj = polymer_convert(items) success_entity.append(json_obj) @@ -289,5 +351,4 @@ def online_json_to_entity(json_path, out_dir): if len(error_ids) > 0: raise RuntimeError(f'[Error] Failed to convert {len(error_ids)}/{len(entities)} entities') - success_entity = modify_name_convert(success_entity) - return success_entity \ No newline at end of file + return modify_name_convert(success_entity) \ No newline at end of file diff --git a/apps/protein_folding/helixfold3/infer_scripts/tools/mmcif_writer.py b/apps/protein_folding/helixfold3/helixfold/infer_scripts/tools/mmcif_writer.py similarity index 100% rename from apps/protein_folding/helixfold3/infer_scripts/tools/mmcif_writer.py rename to apps/protein_folding/helixfold3/helixfold/infer_scripts/tools/mmcif_writer.py diff --git a/apps/protein_folding/helixfold3/inference.py b/apps/protein_folding/helixfold3/helixfold/inference.py similarity index 63% rename from apps/protein_folding/helixfold3/inference.py rename to apps/protein_folding/helixfold3/helixfold/inference.py index 51cf6ec6..57fd692c 100644 --- a/apps/protein_folding/helixfold3/inference.py +++ b/apps/protein_folding/helixfold3/helixfold/inference.py @@ -16,30 +16,38 @@ import re import os import copy -import argparse import random import paddle import json import pickle import pathlib import shutil -import logging import numpy as np +import shutil + +from absl import logging + +from omegaconf import DictConfig +import hydra + from helixfold.common import all_atom_pdb_save +from helixfold.data.pipeline_conf_bonds import load_ccd_dict from helixfold.model import config, utils from helixfold.data import pipeline_parallel as pipeline from helixfold.data import pipeline_multimer_parallel as pipeline_multimer from helixfold.data import pipeline_rna_parallel as pipeline_rna from helixfold.data import pipeline_rna_multimer from helixfold.data.utils import atom_level_keys, map_to_continuous_indices +from helixfold.utils.model import RunModel from helixfold.data.tools import hmmsearch from helixfold.data import templates -from utils.utils import get_custom_amp_list -from utils.model import RunModel -from utils.misc import set_logging_level +from helixfold.utils.utils import get_custom_amp_list from typing import Dict -from infer_scripts import feature_processing_aa, preprocess -from infer_scripts.tools import mmcif_writer +from helixfold.infer_scripts import feature_processing_aa, preprocess +from helixfold.infer_scripts.tools import mmcif_writer + + 
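The renaming pass in the preprocess.py hunk above re-keys each user-defined `UNK-` ligand to a unique three-letter id so multiple custom ligands cannot collide. Restated as a self-contained sketch (the id values are stand-ins for the real `USER_LIG_IDS_3` table):

```python
USER_LIG_IDS_3 = ['LG1', 'LG2', 'LG3']  # stand-in values for illustration

def rename_unk(seqs: str, extra_mol_infos: dict, cur_idx: int):
    for key in list(extra_mol_infos):
        if key != 'UNK-':
            continue
        new_name = USER_LIG_IDS_3[cur_idx]
        seqs = seqs.replace('UNK-', new_name)                 # rewrite the ccd sequence
        extra_mol_infos[new_name] = extra_mol_infos.pop(key)  # re-key the ligand features
        cur_idx += 1
    return seqs, extra_mol_infos, cur_idx

seqs, extra, nxt = rename_unk('(UNK-)', {'UNK-': {'atom_ids': ['C1']}}, 0)
assert seqs == '(LG1)' and 'LG1' in extra and nxt == 1
```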
+script_path=os.path.dirname(__file__) ALLOWED_LIGAND_BONDS_TYPE_MAP = preprocess.ALLOWED_LIGAND_BONDS_TYPE_MAP INVERSE_ALLOWED_LIGAND_BONDS_TYPE_MAP = { @@ -61,7 +69,6 @@ RETURN_KEYS = ['diffusion_module', 'confidence_head'] -logger = logging.getLogger(__file__) MAX_TEMPLATE_HITS = 4 @@ -88,62 +95,75 @@ def preprocess_json_entity(json_path, out_dir): if all_entitys is None: raise ValueError("The json file does not contain any valid entity.") else: - logger.info("The json file contains %d valid entity.", len(all_entitys)) + logging.info("The json file contains %d valid entity.", len(all_entitys)) return all_entitys def convert_to_json_compatible(obj): if isinstance(obj, np.ndarray): return obj.tolist() - elif isinstance(obj, np.integer): + if isinstance(obj, np.integer): return int(obj) - elif isinstance(obj, np.floating): + if isinstance(obj, np.floating): return float(obj) - elif isinstance(obj, dict): + if isinstance(obj, dict): return {k: convert_to_json_compatible(v) for k, v in obj.items()} - elif isinstance(obj, list): + if isinstance(obj, list): return [convert_to_json_compatible(i) for i in obj] - else: - return obj - -def get_msa_templates_pipeline(args) -> Dict: - use_precomputed_msas = True # FLAGS.use_precomputed_msas + return obj + +def resolve_bin_path(cfg_path: str, default_binary_name: str)-> str: + """Helper function to resolve the binary path.""" + if cfg_path and os.path.isfile(cfg_path): + return cfg_path + + if cfg_val:=shutil.which(default_binary_name): + logging.warning(f'Using resolved {default_binary_name}: {cfg_val}') + return cfg_val + + raise FileNotFoundError(f"Could not find a proper binary path for {default_binary_name}: {cfg_path}.") + +def get_msa_templates_pipeline(cfg: DictConfig) -> Dict: + use_precomputed_msas = True # Assuming this is a constant or should be set globally + template_searcher = hmmsearch.Hmmsearch( - binary_path=args.hmmsearch_binary_path, - hmmbuild_binary_path=args.hmmbuild_binary_path, - database_path=args.pdb_seqres_database_path) + binary_path=resolve_bin_path(cfg.bin.hmmsearch, 'hmmsearch'), + hmmbuild_binary_path=resolve_bin_path(cfg.bin.hmmbuild, 'hmmbuild'), + database_path=cfg.db.pdb_seqres) template_featurizer = templates.HmmsearchHitFeaturizer( - mmcif_dir=args.template_mmcif_dir, - max_template_date=args.max_template_date, + mmcif_dir=cfg.template.mmcif_dir, + max_template_date=cfg.template.max_date, max_hits=MAX_TEMPLATE_HITS, - kalign_binary_path=args.kalign_binary_path, + kalign_binary_path=resolve_bin_path(cfg.bin.kalign, 'kalign'), release_dates_path=None, - obsolete_pdbs_path=args.obsolete_pdbs_path) + obsolete_pdbs_path=cfg.template.obsolete_pdbs) monomer_data_pipeline = pipeline.DataPipeline( - jackhmmer_binary_path=args.jackhmmer_binary_path, - hhblits_binary_path=args.hhblits_binary_path, - hhsearch_binary_path=args.hhsearch_binary_path, - uniref90_database_path=args.uniref90_database_path, - mgnify_database_path=args.mgnify_database_path, - bfd_database_path=args.bfd_database_path, - uniclust30_database_path=args.uniclust30_database_path, - small_bfd_database_path=args.small_bfd_database_path , + jackhmmer_binary_path=resolve_bin_path(cfg.bin.jackhmmer, 'jackhmmer'), + hhblits_binary_path=resolve_bin_path(cfg.bin.hhblits, 'hhblits'), + hhsearch_binary_path=resolve_bin_path(cfg.bin.hhsearch, 'hhsearch'), + uniref90_database_path=cfg.db.uniref90, + mgnify_database_path=cfg.db.mgnify, + bfd_database_path=cfg.db.bfd, + uniclust30_database_path=cfg.db.uniclust30, + small_bfd_database_path=cfg.db.small_bfd, 
template_searcher=template_searcher, template_featurizer=template_featurizer, - use_small_bfd=args.use_small_bfd, - use_precomputed_msas=use_precomputed_msas) + use_small_bfd=cfg.use_small_bfd, + use_precomputed_msas=use_precomputed_msas, + nprocs=cfg.nproc_msa, + ) prot_data_pipeline = pipeline_multimer.DataPipeline( monomer_data_pipeline=monomer_data_pipeline, - jackhmmer_binary_path=args.jackhmmer_binary_path, - uniprot_database_path=args.uniprot_database_path, + jackhmmer_binary_path=resolve_bin_path(cfg.bin.jackhmmer, 'jackhmmer'), + uniprot_database_path=cfg.db.uniprot, use_precomputed_msas=use_precomputed_msas) rna_monomer_data_pipeline = pipeline_rna.RNADataPipeline( - hmmer_binary_path=args.nhmmer_binary_path, - rfam_database_path=args.rfam_database_path, + hmmer_binary_path=resolve_bin_path(cfg.bin.nhmmer, 'nhmmer'), + rfam_database_path=cfg.db.rfam, rnacentral_database_path=None, nt_database_path=None, species_identifer_map_path=None, @@ -156,7 +176,6 @@ def get_msa_templates_pipeline(args) -> Dict: 'protein': prot_data_pipeline, 'rna': rna_data_pipeline } - def ranking_all_predictions(output_dirs): ranking_score_path_map = {} for outpath in output_dirs: @@ -167,7 +186,7 @@ def ranking_all_predictions(output_dirs): ranked_map = dict(sorted(ranking_score_path_map.items(), key=lambda x: x[1], reverse=True)) rank_id = 1 for outpath, rank_score in ranked_map.items(): - logger.debug("[ranking_all_predictions] Ranking score of %s: %.5f", outpath, rank_score) + logging.debug("[ranking_all_predictions] Ranking score of %s: %.5f", outpath, rank_score) basename_prefix = os.path.basename(outpath).split('-pred-')[0] target_path = os.path.join(os.path.dirname(outpath), f'{basename_prefix}-rank{rank_id}') if os.path.exists(target_path) and os.path.isdir(target_path): @@ -196,7 +215,7 @@ def _forward_with_precision(batch): raise ValueError("Please choose precision from bf16 and fp32! ") res = _forward_with_precision(batch) - logger.info(f"Inference Succeeds...\n") + logging.info(f"Inference Succeeds...\n") return res @@ -430,95 +449,116 @@ def split_prediction(pred, rank): return prediction -def main(args): - set_logging_level(args.logging_level) +@hydra.main(version_base=None, config_path=os.path.join(script_path,'config',),config_name='helixfold') +def main(cfg: DictConfig): + logging.set_verbosity(cfg.logging_level) + + if cfg.msa_only == True: + logging.warning(f'Model inference will be skipped because MSA-only mode is required.') + logging.warning(f'Use CPU only') + paddle.device.set_device("cpu") + """main function""" new_einsum = os.getenv("FLAGS_new_einsum", True) print(f'>>> PaddlePaddle commit: {paddle.version.commit}') print(f'>>> FLAGS_new_einsum: {new_einsum}') - print(f'>>> args:\n{args}') + print(f'>>> config:\n{cfg}') - all_entitys = preprocess_json_entity(args.input_json, args.output_dir) ## check maxit binary path - if args.maxit_binary is not None: - assert os.path.exists(args.maxit_binary), \ - f"The maxit binary path {args.maxit_binary} does not exists." 
+ maxit_binary=resolve_bin_path(cfg.other.maxit_binary,'maxit') + + RCSBROOT=os.path.join(os.path.dirname(maxit_binary), '..') + os.environ['RCSBROOT']=RCSBROOT + ## check obabel + obabel_bin=resolve_bin_path(cfg.bin.obabel,'obabel') + os.environ['OBABEL_BIN']=obabel_bin - ### set seed for reproduce experiment results - seed = args.seed + all_entitys = preprocess_json_entity(cfg.input, cfg.output) + + ### Set seed for reproducibility + seed = cfg.seed if seed is None: seed = np.random.randint(10000000) else: - logger.warning('seed is only used for reproduction') + logging.warning('Seed is set only for reproducibility') init_seed(seed) - - use_small_bfd = args.preset == 'reduced_dbs' - setattr(args, 'use_small_bfd', use_small_bfd) + use_small_bfd = cfg.preset.preset == 'reduced_dbs' + setattr(cfg, 'use_small_bfd', use_small_bfd) if use_small_bfd: - assert args.small_bfd_database_path is not None + assert cfg.db.small_bfd is not None else: - assert args.bfd_database_path is not None - assert args.uniclust30_database_path is not None + assert cfg.db.bfd is not None + assert cfg.db.uniclust30 is not None - logger.info('Getting MSA/Template Pipelines...') - msa_templ_data_pipeline_dict = get_msa_templates_pipeline(args) + logging.info('Getting MSA/Template Pipelines...') + msa_templ_data_pipeline_dict = get_msa_templates_pipeline(cfg=cfg) - - ### create model - model_config = config.model_config(args.model_name) - print(f'>>> model_config:\n{model_config}') + ### Create model + model_config = config.model_config(cfg.CONFIG_DIFFS) + logging.warning(f'>>> Model config: \n{model_config}\n\n') model = RunModel(model_config) - if (not args.init_model is None) and (not args.init_model == ""): - print(f"Load pretrain model from {args.init_model}") - pd_params = paddle.load(args.init_model) + if (cfg.weight_path is not None) and (cfg.weight_path != ""): + print(f"Loading pretrained model from {cfg.weight_path}") + pd_params = paddle.load(cfg.weight_path) has_opt = 'optimizer' in pd_params if has_opt: model.helixfold.set_state_dict(pd_params['model']) else: model.helixfold.set_state_dict(pd_params) + - if cfg.precision == "bf16" and cfg.amp_level == "O2": + + if cfg.precision == "bf16" and cfg.amp_level == "O2": raise NotImplementedError("bf16 O2 is not supported yet.") print(f"============ Data Loading ============") - job_base = pathlib.Path(args.input_json).stem - output_dir_base = pathlib.Path(args.output_dir).joinpath(job_base) + job_base = pathlib.Path(cfg.input).stem + output_dir_base = pathlib.Path(cfg.output).joinpath(job_base) msa_output_dir = output_dir_base.joinpath('msas') msa_output_dir.mkdir(parents=True, exist_ok=True) features_pkl = output_dir_base.joinpath('final_features.pkl') - feature_dict = feature_processing_aa.process_input_json( - all_entitys, - ccd_preprocessed_path=args.ccd_preprocessed_path, - msa_templ_data_pipeline_dict=msa_templ_data_pipeline_dict, - msa_output_dir=msa_output_dir) + if features_pkl.exists() and not cfg.override: + with open(features_pkl, 'rb') as f: + logging.info(f'Loading features from precomputed {features_pkl}') + feature_dict = pickle.load(f) + else: + feature_dict = feature_processing_aa.process_input_json( + all_entitys, + ccd_preprocessed_path=cfg.db.ccd_preprocessed, + msa_templ_data_pipeline_dict=msa_templ_data_pipeline_dict, + msa_output_dir=msa_output_dir) - # save features - with open(features_pkl, 'wb') as f: - pickle.dump(feature_dict, f, protocol=4) + # save features + with open(features_pkl, 'wb') as f: + pickle.dump(feature_dict, f, protocol=4) +
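The `final_features.pkl` handling above is a plain load-or-compute pattern gated by `cfg.override`; in isolation (hypothetical helper name):

```python
import pathlib
import pickle

def load_or_compute_features(features_pkl: pathlib.Path, compute, override: bool = False):
    if features_pkl.exists() and not override:
        with open(features_pkl, 'rb') as f:
            return pickle.load(f)           # reuse precomputed features
    feats = compute()
    with open(features_pkl, 'wb') as f:
        pickle.dump(feats, f, protocol=4)   # same pickle protocol as the pipeline
    return feats
```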
+ if cfg.msa_only == True: + logging.warning(f'Model inference is skipped because MSA-only mode is required.') + exit() feature_dict['feat'] = batch_convert(feature_dict['feat'], add_batch=True) feature_dict['label'] = batch_convert(feature_dict['label'], add_batch=True) print(f"============ Start Inference ============") - infer_times = args.infer_times - if args.diff_batch_size > 0: - model_config.model.heads.diffusion_module.test_diff_batch_size = args.diff_batch_size + infer_times = cfg.infer_times + if cfg.diff_batch_size > 0: + model_config.model.heads.diffusion_module.test_diff_batch_size = cfg.diff_batch_size diff_batch_size = model_config.model.heads.diffusion_module.test_diff_batch_size - logger.info(f'Inference {infer_times} Times...') - logger.info(f" diffusion batch size {diff_batch_size}...\n") + logging.info(f'Inference {infer_times} Times...') + logging.info(f"Diffusion batch size {diff_batch_size}...\n") all_pred_path = [] for infer_id in range(infer_times): - logger.info(f'Start {infer_id}-th inference...\n') - prediction = eval(args, model, feature_dict) + logging.info(f'Start {infer_id}-th inference...\n') + prediction = eval(cfg, model, feature_dict) # save result prediction = split_prediction(prediction, diff_batch_size) @@ -530,7 +570,7 @@ def main(args): feature_dict=feature_dict, prediction=prediction[rank_id], output_dir=output_dir, - maxit_bin=args.maxit_binary) + maxit_bin=cfg.other.maxit_binary) all_pred_path.append(output_dir) # final ranking @@ -539,99 +579,23 @@ def main(args): print(f'============ Inference finished ! ============') -if __name__ == '__main__': - parser = argparse.ArgumentParser() - parser.add_argument("--bf16_infer", action='store_true', default=False) - parser.add_argument("--seed", type=int, default=None, help="set seed for reproduce experiment results, None is do not set seed") - parser.add_argument("--logging_level", type=str, default="DEBUG", help="NOTSET, DEBUG, INFO, WARNING, ERROR, CRITICAL") - parser.add_argument("--model_name", type=str, help='used to choose model config') - parser.add_argument("--init_model", type=str, default='') - parser.add_argument("--precision", type=str, choices=['fp32', 'bf16'], default='fp32') - parser.add_argument("--amp_level", type=str, default='O1') - parser.add_argument("--infer_times", type=int, default=1) - parser.add_argument("--diff_batch_size", type=int, default=-1) - parser.add_argument('--input_json', type=str, - default=None, required=True, - help='Paths to json file, each containing ' - 'entity information including sequence, smiles or CCD, copies etc.') - parser.add_argument('--output_dir', type=str, - default=None, required=True, - help='Path to a directory that will store results.') - parser.add_argument('--ccd_preprocessed_path', type=str, - default=None, required=True, - help='Path to CCD preprocessed files.') - parser.add_argument('--jackhmmer_binary_path', type=str, - default='/usr/bin/jackhmmer', - help='Path to the JackHMMER executable.') - parser.add_argument('--hhblits_binary_path', type=str, - default='/usr/bin/hhblits', - help='Path to the HHblits executable.') - parser.add_argument('--hhsearch_binary_path', type=str, - default='/usr/bin/hhsearch', - help='Path to the HHsearch executable.') - parser.add_argument('--kalign_binary_path', type=str, - default='/usr/bin/kalign', - help='Path to the Kalign executable.') - parser.add_argument('--hmmsearch_binary_path', type=str, - default='/usr/bin/hmmsearch', - help='Path to the hmmsearch executable.') - 
parser.add_argument('--hmmbuild_binary_path', type=str, - default='/usr/bin/hmmbuild', - help='Path to the hmmbuild executable.') - - # binary path of the tool for RNA MSA searching - parser.add_argument('--nhmmer_binary_path', type=str, - default='/usr/bin/nhmmer', - help='Path to the nhmmer executable.') +@hydra.main(version_base=None, config_path=os.path.join(script_path,'config',),config_name='helixfold') +def show_atom_id_ccd(cfg: DictConfig): + ccd_preprocessed_path = cfg.db.ccd_preprocessed + + + ccd_id=cfg.ccd_id + if len(ccd_id) <= 3 and ccd_id in (ccd_dict:=load_ccd_dict(ccd_preprocessed_path)): + logging.info(f'Atoms in {ccd_id}: {ccd_dict[ccd_id]["atom_ids"]}') + return + + + - parser.add_argument('--uniprot_database_path', type=str, - default=None, required=True, - help='Path to the Uniprot database for use ' - 'by JackHMMER.') - parser.add_argument('--pdb_seqres_database_path', type=str, - default=None, required=True, - help='Path to the PDB ' - 'seqres database for use by hmmsearch.') - parser.add_argument('--uniref90_database_path', type=str, - default=None, required=True, - help='Path to the Uniref90 database for use ' - 'by JackHMMER.') - parser.add_argument('--mgnify_database_path', type=str, - default=None, required=True, - help='Path to the MGnify database for use by ' - 'JackHMMER.') - parser.add_argument('--bfd_database_path', type=str, default=None, - help='Path to the BFD database for use by HHblits.') - parser.add_argument('--small_bfd_database_path', type=str, default=None, - help='Path to the small version of BFD used ' - 'with the "reduced_dbs" preset.') - parser.add_argument('--uniclust30_database_path', type=str, default=None, - help='Path to the Uniclust30 database for use ' - 'by HHblits.') - # RNA MSA searching databases - parser.add_argument('--rfam_database_path', type=str, - default=None, required=True, - help='Path to the Rfam database for RNA MSA searching.') - parser.add_argument('--template_mmcif_dir', type=str, - default=None, required=True, - help='Path to a directory with template mmCIF ' - 'structures, each named .cif') - parser.add_argument('--max_template_date', type=str, - default=None, required=True, - help='Maximum template release date to consider. 
' - 'Important if folding historical test sets.') - parser.add_argument('--obsolete_pdbs_path', type=str, - default=None, required=True, - help='Path to file containing a mapping from ' - 'obsolete PDB IDs to the PDB IDs of their ' - 'replacements.') - parser.add_argument('--preset', - default='full_dbs', required=False, - choices=['reduced_dbs', 'full_dbs'], - help='Choose preset model configuration - ' - 'no ensembling and smaller genetic database ' - 'config (reduced_dbs), no ensembling and full ' - 'genetic database config (full_dbs)') - parser.add_argument('--maxit_binary', type=str, default=None) - args = parser.parse_args() - main(args) + + + + + +if __name__ == '__main__': + main() diff --git a/apps/protein_folding/helixfold3/helixfold/model/config.py b/apps/protein_folding/helixfold3/helixfold/model/config.py index 6da8566a..f9dbbf1d 100644 --- a/apps/protein_folding/helixfold3/helixfold/model/config.py +++ b/apps/protein_folding/helixfold3/helixfold/model/config.py @@ -15,7 +15,8 @@ """Model config.""" import copy -import ml_collections +from typing import Any, Union +from omegaconf import DictConfig NUM_RES = 'num residues placeholder' @@ -24,27 +25,47 @@ NUM_TEMPLATES = 'num templates placeholder' -def model_config(name: str) -> ml_collections.ConfigDict: +def model_config(config_diffs: Union[str, DictConfig, dict[str, dict[str, Any]]]) -> DictConfig: """Get the ConfigDict of a model.""" cfg = copy.deepcopy(CONFIG_ALLATOM) - if name in CONFIG_DIFFS: - cfg.update_from_flattened_dict(CONFIG_DIFFS[name]) + if config_diffs is None or config_diffs=='': + # early return if nothing is changed + return cfg - return cfg + if isinstance(config_diffs, DictConfig): + if 'preset' in config_diffs and (preset_name:=config_diffs['preset']) in CONFIG_DIFFS: + updated_config=CONFIG_DIFFS[preset_name] + cfg.merge_with_dotlist(updated_config) + print(f'Updated config from `CONFIG_DIFFS.{preset_name}`: {updated_config}') + + # update from detailed configuration + if any(root_kw in config_diffs for root_kw in CONFIG_ALLATOM): -CONFIG_DIFFS = { - 'allatom_demo': { - 'model.heads.confidence_head.weight': 0.01 - }, - 'allatom_subbatch_64_recycle_1': { - 'model.global_config.subbatch_size': 64, - 'model.num_recycle': 1, - }, + for root_kw in CONFIG_ALLATOM: + if root_kw not in config_diffs: + continue + cfg.merge_with(DictConfig({root_kw:config_diffs[root_kw]})) # merge to override + print(f'Updated config from `CONFIG_DIFFS`:{root_kw}: {config_diffs[root_kw]}') + + return cfg + + raise ValueError(f'Invalid config_diffs ({type(config_diffs)}): {config_diffs}') + + +# preset for runs +CONFIG_DIFFS: dict[str, list[str]] = { + 'allatom_demo': [ + 'model.heads.confidence_head.weight=0.01' + ], + 'allatom_subbatch_64_recycle_1': [ + 'model.global_config.subbatch_size=64', + 'model.num_recycle=1', + ] } -CONFIG_ALLATOM = ml_collections.ConfigDict({ +CONFIG_ALLATOM = DictConfig({ 'data': { 'num_blocks': 5, # for msa block deletion 'randomize_num_blocks': True, diff --git a/apps/protein_folding/helixfold3/utils/__init__.py b/apps/protein_folding/helixfold3/helixfold/utils/__init__.py similarity index 100% rename from apps/protein_folding/helixfold3/utils/__init__.py rename to apps/protein_folding/helixfold3/helixfold/utils/__init__.py diff --git a/apps/protein_folding/helixfold3/utils/model.py b/apps/protein_folding/helixfold3/helixfold/utils/model.py similarity index 96% rename from apps/protein_folding/helixfold3/utils/model.py rename to apps/protein_folding/helixfold3/helixfold/utils/model.py index 
index 67b36128..1fcc53d4 100644
--- a/apps/protein_folding/helixfold3/utils/model.py
+++ b/apps/protein_folding/helixfold3/helixfold/utils/model.py
@@ -17,12 +17,11 @@
 import numpy as np
 import paddle
 import paddle.nn as nn
-import logging
 import io
 
 from helixfold.model import modules_all_atom
 from helixfold.model import utils
 
-logger = logging.getLogger(__name__)
+from absl import logging
 
 class RunModel(nn.Layer):
     """
@@ -69,7 +68,7 @@ def init_params(self, params_path: str):
         utils.pd_params_merge_qkvw(pd_params)
 
     elif params_path.endswith('.pd') or params_path.endswith('.pdparams'):
-        logger.info('Load as Paddle model')
+        logging.info('Load as Paddle model')
         pd_params = paddle.load(params_path)
 
     else:
diff --git a/apps/protein_folding/helixfold3/utils/utils.py b/apps/protein_folding/helixfold3/helixfold/utils/utils.py
similarity index 100%
rename from apps/protein_folding/helixfold3/utils/utils.py
rename to apps/protein_folding/helixfold3/helixfold/utils/utils.py
diff --git a/apps/protein_folding/helixfold3/pyproject.toml b/apps/protein_folding/helixfold3/pyproject.toml
new file mode 100644
index 00000000..33956227
--- /dev/null
+++ b/apps/protein_folding/helixfold3/pyproject.toml
@@ -0,0 +1,50 @@
+[build-system]
+requires = ["poetry-core>=1.0.0,<2.0.0"]
+build-backend = "poetry.core.masonry.api"
+
+[tool.poetry]
+name = "helixfold"
+version = "3.0.0"
+description = "Code for helixfold v3"
+authors = ["Name "]
+
+readme = "README.md"
+license = "MIT"
+repository = "https://github.com/PaddlePaddle/PaddleHelix/blob/dev/apps/protein_folding/helixfold3"
+classifiers = [
+    "Topic :: Scientific/Engineering :: Biochemistry",
+    "Topic :: Scientific/Engineering :: Protein Engineering"
+]
+
+packages = [
+    { include = "helixfold" },
+    { include = "helixfold/*.py" },
+]
+
+[tool.poetry.dependencies]
+python = "^3.8"
+
+absl-py = "0.13.0"
+biopython = "1.79"
+chex = "0.0.7"
+dm-haiku = "0.0.4"
+dm-tree = "0.1.6"
+docker = "5.0.0"
+immutabledict = "2.0.0"
+jax = "0.2.14"
+ml-collections = "0.1.0"
+pandas = "1.3.4"
+scipy = "1.9.0"
+rdkit-pypi = "2022.9.5"
+posebusters = "*"
+hydra-core = "^1.3.2"
+omegaconf = "^2.3.0"
+joblib = "1.4.2"
+
+[tool.poetry.scripts]
+helixfold = 'helixfold.inference:main'
+helixfold_show_ccd = 'helixfold.inference:show_atom_id_ccd'
diff --git a/apps/protein_folding/helixfold3/requirements.txt b/apps/protein_folding/helixfold3/requirements.txt
deleted file mode 100644
index 660e43c1..00000000
--- a/apps/protein_folding/helixfold3/requirements.txt
+++ /dev/null
@@ -1,13 +0,0 @@
-absl-py==0.13.0
-biopython==1.79
-chex==0.0.7
-dm-haiku==0.0.4
-dm-tree==0.1.6
-docker==5.0.0
-immutabledict==2.0.0
-jax==0.2.14
-ml-collections==0.1.0
-pandas==1.3.4
-scipy==1.9.0
-rdkit-pypi==2022.9.5
-posebusters
\ No newline at end of file
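The `[tool.poetry.scripts]` table is what turns `helixfold` and `helixfold_show_ccd` into console commands once the package is pip-installed. A quick, hypothetical sanity check (standard library only) that the entry points registered:

```python
# Verify the console scripts declared in pyproject.toml after `pip install .`.
from importlib.metadata import entry_points

eps = entry_points()
# Python 3.10+ exposes .select(); 3.8/3.9 return a dict keyed by group.
scripts = eps.select(group='console_scripts') if hasattr(eps, 'select') else eps['console_scripts']
for ep in scripts:
    if ep.name in ('helixfold', 'helixfold_show_ccd'):
        print(ep.name, '->', ep.value)  # e.g. helixfold -> helixfold.inference:main
```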
"$ENV_BIN/hhsearch" \ - --kalign_binary_path "$ENV_BIN/kalign" \ - --hmmsearch_binary_path "$ENV_BIN/hmmsearch" \ - --hmmbuild_binary_path "$ENV_BIN/hmmbuild" \ - --nhmmer_binary_path "$ENV_BIN/nhmmer" \ - --preset='reduced_dbs' \ - --bfd_database_path "$DATA_DIR/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt" \ - --small_bfd_database_path "$DATA_DIR/small_bfd/bfd-first_non_consensus_sequences.fasta" \ - --bfd_database_path "$DATA_DIR/small_bfd/bfd-first_non_consensus_sequences.fasta" \ - --uniclust30_database_path "$DATA_DIR/uniclust30/uniclust30_2018_08/uniclust30_2018_08" \ - --uniprot_database_path "$DATA_DIR/uniprot/uniprot.fasta" \ - --pdb_seqres_database_path "$DATA_DIR/pdb_seqres/pdb_seqres.txt" \ - --uniref90_database_path "$DATA_DIR/uniref90/uniref90.fasta" \ - --mgnify_database_path "$DATA_DIR/mgnify/mgy_clusters_2018_12.fa" \ - --template_mmcif_dir "$DATA_DIR/pdb_mmcif/mmcif_files" \ - --obsolete_pdbs_path "$DATA_DIR/pdb_mmcif/obsolete.dat" \ - --ccd_preprocessed_path "$DATA_DIR/ccd_preprocessed_etkdg.pkl.gz" \ - --rfam_database_path "$DATA_DIR/Rfam-14.9_rep_seq.fasta" \ - --max_template_date=2020-05-14 \ - --input_json data/demo_6zcy.json \ - --output_dir ./output \ - --model_name allatom_demo \ - --init_model init_models/HelixFold3-240814.pdparams \ - --infer_times 1 \ - --diff_batch_size 1 \ - --precision "fp32" \ No newline at end of file diff --git a/apps/protein_folding/helixfold3/utils/misc.py b/apps/protein_folding/helixfold3/utils/misc.py deleted file mode 100644 index b8faa065..00000000 --- a/apps/protein_folding/helixfold3/utils/misc.py +++ /dev/null @@ -1,39 +0,0 @@ -# copyright (c) 2024 PaddleHelix Authors. All Rights Reserve. -# -# Licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 -# International License (the "License"); you may not use this file except -# in compliance with the License. You may obtain a copy of the License at -# -# http://creativecommons.org/licenses/by-nc-sa/4.0/ -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -""" -misc utils -""" - -import logging - -def set_logging_level(level): - level_dict = { - "NOTSET": logging.NOTSET, - "DEBUG": logging.DEBUG, - "INFO": logging.INFO, - "WARNING": logging.WARNING, - "ERROR": logging.ERROR, - "CRITICAL": logging.CRITICAL - } - logging.basicConfig( - format='%(asctime)s %(levelname)s %(message)s', - level=level_dict[level], - datefmt='%Y-%m-%d %H:%M:%S') - for h in logging.root.handlers: - h.setFormatter( - logging.Formatter( - '%(asctime)s %(levelname)s %(message)s', - datefmt='%Y-%m-%d %H:%M:%S' - )) - logging.root.setLevel(level_dict[level])