
Commit

Update examples/msas/README.md
Co-authored-by: Jack Dent <[email protected]>
2 people authored and arogozhnikov committed Oct 16, 2024
1 parent 0ef0d83 commit 7d708b1
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion examples/msas/README.md
@@ -8,7 +8,7 @@ The easiest way to provide MSA information to Chai-1 is through the `.aligned.pq

- `sequence` - contains alignment hits in the `a3m` sequence format.
- `source_database` - contains information regarding the database that the sequence came from. This can be set to one of `uniprot`, `uniref90`, `bfd_uniclust`, `mgnify`, or `query`. This source is featurized as an input to Chai-1 (and is therefore not just for book keeping!). The `query` key should only occur once in the table as the first row and indicates the query sequence used to construct the hits. If your alignments come from a database not included in these four options, it's probably a good idea to experiment with setting a source to get best results; `uniref90` should be a good "catchall" choice in general.
-- `pairing_key` - a string that indicates a "key" for how alignments for different sequences in a complex should be "paired" up with each other (similarly to AlphaFold multimer and AlphaFold3) to capture co-evoluationary information across chains. Pairing is done across all sequences with the same pairing key. Therefore, while this pairing key is typically set to a species identifier, you might want to consider setting it to some other key that provides a more or less general grouping by which to match up MSA alignments across different sequences.
+- `pairing_key` - a string that indicates a "key" for how alignments for different sequences in a complex should be "paired" up with each other (similarly to AlphaFold-Multimer and AlphaFold 3) to capture evolutionary information across chains. Pairing is done across all sequences with the same pairing key. Therefore, while this pairing key is typically set to a species identifier, you might want to consider setting it to some other key that provides a more or less general grouping by which to match up MSA alignments across different sequences.
- `comment` - the wild west. You may put whatever string you choose here and it will be ignored; this field is provided primarily for human readability and book keeping.
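
The column rules above can be checked before a table is written out. The following is a minimal sketch, not part of Chai-1: it assumes each row is a plain Python dict with the four columns listed, and the helper names `validate_msa_rows` and `group_by_pairing_key`, the toy sequences, and the choice to treat an empty `pairing_key` as "unpaired" are all hypothetical. Writing the rows to an `.aligned.pqt` file is left out, since that needs a Parquet library. The grouping step only illustrates the idea of matching hits across chains by key; it is not Chai-1's actual pairing logic.

```python
from collections import defaultdict

# Allowed `source_database` values, as listed in the text above.
ALLOWED_SOURCES = {"uniprot", "uniref90", "bfd_uniclust", "mgnify", "query"}
REQUIRED_COLUMNS = ("sequence", "source_database", "pairing_key", "comment")


def validate_msa_rows(rows):
    """Return a list of problems found in `rows`; an empty list means OK.

    Hypothetical helper: `rows` is a list of dicts, one per table row.
    """
    problems = []
    for i, row in enumerate(rows):
        missing = [c for c in REQUIRED_COLUMNS if c not in row]
        if missing:
            problems.append(f"row {i}: missing columns {missing}")
            continue
        if row["source_database"] not in ALLOWED_SOURCES:
            problems.append(
                f"row {i}: unknown source_database {row['source_database']!r}"
            )
    # The `query` source should occur exactly once, as the first row.
    query_rows = [
        i for i, r in enumerate(rows) if r.get("source_database") == "query"
    ]
    if query_rows != [0]:
        problems.append("`query` must appear exactly once, as the first row")
    return problems


def group_by_pairing_key(rows):
    """Group hit sequences of one chain by pairing key.

    Assumption: the query row and rows with an empty key are left unpaired.
    """
    groups = defaultdict(list)
    for row in rows:
        if row["source_database"] != "query" and row["pairing_key"]:
            groups[row["pairing_key"]].append(row["sequence"])
    return dict(groups)


# Toy rows for two chains of a complex (made-up sequences and keys).
chain_a = [
    {"sequence": "MKTAYIAK", "source_database": "query",
     "pairing_key": "", "comment": "chain A query"},
    {"sequence": "MKTAYLAK", "source_database": "uniref90",
     "pairing_key": "9606", "comment": ""},
    {"sequence": "MRTAY-AK", "source_database": "mgnify",
     "pairing_key": "", "comment": "no species known"},
]
chain_b = [
    {"sequence": "GSHMLEDP", "source_database": "query",
     "pairing_key": "", "comment": "chain B query"},
    {"sequence": "GSHMLDDP", "source_database": "uniprot",
     "pairing_key": "9606", "comment": ""},
    {"sequence": "GTHMLEDP", "source_database": "uniprot",
     "pairing_key": "10090", "comment": ""},
]

for name, rows in (("A", chain_a), ("B", chain_b)):
    print(name, validate_msa_rows(rows))

# Keys present in both chains are the candidates for pairing.
shared_keys = sorted(
    set(group_by_pairing_key(chain_a)) & set(group_by_pairing_key(chain_b))
)
print(shared_keys)
```

Using a coarser key (for example a genus instead of a species identifier) would make more rows share a key across chains, which is the trade-off the `pairing_key` description points at.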

This file format offers several advantages over standard `.a3m` files:
