update readme (#121)
wukevin authored Oct 21, 2024
1 parent 606d7ff commit 4fbd82a
Showing 2 changed files with 24 additions and 4 deletions.
10 changes: 9 additions & 1 deletion README.md
@@ -29,7 +29,7 @@ The following script demonstrates how to provide inputs to the model, and obtain
python examples/predict_structure.py
```

For more advanced use cases, we also expose `chai_lab.chai1.run_folding_on_context`, which allows users to construct an `AllAtomFeatureContext` manually. This allows users to specify their own templates, MSAs, embeddings, and constraints. We currently provide an example of how to construct an embeddings context, and will be releasing helper methods to build MSA and templates contexts soon.
For more advanced use cases, we also expose `chai_lab.chai1.run_folding_on_context`, which allows users to construct an `AllAtomFeatureContext` manually. This allows users to specify their own templates, MSAs, embeddings, and constraints. We currently provide an example of how to construct an embeddings context as well as an MSA context, and will be releasing helper methods to build template contexts soon.

<details>
<summary>Where are downloaded weights stored?</summary>
@@ -43,6 +43,14 @@ CHAI_DOWNLOADS_DIR=/tmp/downloads python ./examples/predict_structure.py
</p>
</details>

<details>
<summary>How can MSAs be provided to Chai-1?</summary>
<p markdown="1">

Chai-1 supports MSAs provided as an `aligned.pqt` file. This file format is similar to an `a3m` file, but has additional columns that provide metadata like the source database and sequence pairing keys. We provide code to convert `a3m` files to `aligned.pqt` files. For more information on how to provide MSAs to Chai-1, see [this documentation](examples/msas/README.md).

</p>
</details>

## ⚡ Try it online

18 changes: 15 additions & 3 deletions examples/msas/README.md
@@ -1,6 +1,6 @@
# Adding MSA evolutionary information

While Chai-1 performs very well in "single-sequence mode," it can also be given additional evolutionary information to further improve performance. As in other folding methods, this evolutionary information is provided in the form of a multiple sequence alignment (MSA).
While Chai-1 performs very well in "single-sequence mode," it can also be given additional evolutionary information to further improve performance. As in other folding methods, this evolutionary information is provided in the form of a multiple sequence alignment (MSA). Chai-1 consumes the MSA as an `MSAContext` object (see `chai_lab/data/dataset/msas/msa_context.py`); we provide code for building `MSAContext` objects from `aligned.pqt` files, though you can also build an `MSAContext` yourself.

## The `.aligned.pqt` file format

@@ -24,24 +24,36 @@ See the following for a toy example of what this table might look like:
| RKSES... | uniprot | Mus musculus | A mouse sequence from uniprot |
| ... |

We additionally provide code to parse `a3m` files into this format; see `merge_multi_a3m_to_aligned_dataframe` in `chai_lab/data/parsing/msas/aligned_pqt.py`.
We additionally provide code to parse `a3m` files into this format; see `merge_multi_a3m_to_aligned_dataframe` in `chai_lab/data/parsing/msas/aligned_pqt.py`. This file can also be run as a command-line script; run `python chai_lab/data/parsing/msas/aligned_pqt.py --help` for details.
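
To give a rough sense of what such a conversion involves, here is a minimal standard-library sketch that reads `a3m` text and produces one record per aligned sequence. It is not the `merge_multi_a3m_to_aligned_dataframe` implementation: the helper names `iter_a3m` and `a3m_to_rows` are hypothetical, and the column names (`sequence`, `source_database`, `pairing_key`, `comment`) are assumptions guessed from the toy table above.

```python
from typing import Dict, Iterator, List, Tuple


def iter_a3m(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a3m/FASTA-style text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def a3m_to_rows(text: str, source_database: str) -> List[Dict[str, str]]:
    """Build one record per alignment row (hypothetical helper).

    ASSUMPTION: the column names below mirror the toy table; the real
    `.aligned.pqt` schema may differ.
    """
    rows = []
    for header, sequence in iter_a3m(text):
        rows.append(
            {
                "sequence": sequence,
                "source_database": source_database,
                "pairing_key": "",  # left empty here; no pairing information is derived
                "comment": header,
            }
        )
    return rows


example = """>query
RKSESAA
>hit_1 Mus musculus
RKSES-A
"""
rows = a3m_to_rows(example, source_database="uniprot")
for row in rows:
    print(row)
```

For real inputs, prefer the provided script, since it knows the actual schema and how to merge several `a3m` files.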

### TLDR

Chai-1 uses `.aligned.pqt` files to specify MSAs. These are similar to `a3m` with added columns for source database and pairing key to pair MSAs across different chains. Each `.aligned.pqt` file contains all MSAs for a single query sequence.
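
As an illustration of what a pairing key is for, the sketch below matches rows from two chains that share the same non-empty key. This is a deliberately simplified toy, not the pairing logic Chai-1 uses: `pair_by_key` is a hypothetical helper, the `pairing_key` and `sequence` field names are assumptions, and keeping only the first match per key is a simplification chosen for brevity.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, str]


def pair_by_key(chain_a: List[Row], chain_b: List[Row]) -> List[Tuple[str, str]]:
    """Pair sequences from two chains that share a non-empty pairing key."""
    # Index chain B's sequences by their pairing key.
    by_key: Dict[str, List[str]] = defaultdict(list)
    for row in chain_b:
        if row["pairing_key"]:
            by_key[row["pairing_key"]].append(row["sequence"])

    pairs: List[Tuple[str, str]] = []
    for row in chain_a:
        key = row["pairing_key"]
        matches = by_key.get(key, [])
        if key and matches:
            # Simplification: keep only the first chain-B match for this key.
            pairs.append((row["sequence"], matches[0]))
    return pairs


chain_a = [
    {"sequence": "RKSES", "pairing_key": "Mus musculus"},
    {"sequence": "RKTES", "pairing_key": "Homo sapiens"},
    {"sequence": "RRSES", "pairing_key": ""},
]
chain_b = [
    {"sequence": "MLVAG", "pairing_key": "Homo sapiens"},
    {"sequence": "MLIAG", "pairing_key": "Danio rerio"},
]
pairs = pair_by_key(chain_a, chain_b)
print(pairs)
```

Rows without a key, or whose key has no partner in the other chain, are simply left unpaired in this toy version.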

## From `.aligned.pqt` to `MSAContext`

By default, the `run_inference` example inference code we provide assumes that all MSAs required for a prediction are stored in a specified folder. Each file corresponds to all alignments for a given sequence, and filenames are specified by the hash of their sequence (this filename is inferred using code in `chai_lab/data/parsing/msas/aligned_pqt.py`). During inference, the script tries to find `<HASH>.aligned.pqt` files in that folder (one file per unique chain sequence) and loads in a `MSAContext` for each MSA it can find; see `chai_lab/data/dataset/msas/load.py` for details.
By default, the `run_inference` example inference code we provide assumes that all MSAs required for a prediction are stored in a specified folder. Each `.aligned.pqt` file in that folder contains all MSA alignments for a given sequence (spanning several databases), and each filename is the hash of that sequence (the filename is computed by code in `chai_lab/data/parsing/msas/aligned_pqt.py`). During inference, the script looks for `<HASH>.aligned.pqt` files in that folder (one file per unique chain sequence) and loads an `MSAContext` for each MSA it can find. The code then performs some basic preprocessing, such as pairing MSAs by their given `pairing_key` and merging MSAs across chains; see `chai_lab/data/dataset/msas/load.py` for details.
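
The lookup described above can be sketched roughly as follows. The exact hashing scheme lives in `chai_lab/data/parsing/msas/aligned_pqt.py` and is not reproduced here: this sketch assumes a SHA-256 hex digest of the uppercased sequence, which is at least consistent with the 64-character filename of the example file, but treat both the hash choice and the uppercasing as assumptions. The helper names `expected_msa_filename` and `find_msa_file` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


def expected_msa_filename(sequence: str) -> str:
    """Return the filename an MSA for `sequence` would have (hypothetical helper).

    ASSUMPTION: SHA-256 of the uppercased sequence; the real scheme may differ.
    """
    digest = hashlib.sha256(sequence.upper().encode("utf-8")).hexdigest()
    return f"{digest}.aligned.pqt"


def find_msa_file(msa_directory: Path, sequence: str) -> Optional[Path]:
    """Return the MSA file for `sequence` if it exists in the folder, else None."""
    candidate = msa_directory / expected_msa_filename(sequence)
    return candidate if candidate.is_file() else None


# Usage: create a throwaway folder holding an (empty) file for one sequence.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / expected_msa_filename("RKSES")).touch()
    found = find_msa_file(folder, "RKSES")
    missing = find_msa_file(folder, "MLVAG")
    print(found is not None, missing is None)
```

Sequences with no matching file simply get no MSA, which fits the "for each MSA it can find" behavior described above.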

## Putting it all together

To demonstrate how these pieces tie together, we provide `aligned.pqt` files containing MSAs for the example in `examples/predict_structure.py` under the `examples/msas` folder. Inference can be run using these example MSAs by providing the path to this folder as an additional argument to `run_inference` as follows:

```python
from pathlib import Path
...

candidates = run_inference(
    ...
    msa_directory=Path("examples/msas"),
    ...
)
```

You can also manually inspect the example `aligned.pqt` files by loading them as pandas dataframes as follows:

```python
import pandas as pd

aligned_pqt = pd.read_parquet("examples/msas/703adc2c74b8d7e613549b6efcf37126da7963522dc33852ad3c691eef1da06f.aligned.pqt")
aligned_pqt.head()
```
