Skip to content

Commit

Permalink
Update README.md
Browse files Browse the repository at this point in the history
  • Loading branch information
Adibvafa authored Oct 29, 2024
1 parent 719870f commit 904ce6d
Showing 1 changed file with 18 additions and 0 deletions.
18 changes: 18 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -108,6 +108,24 @@ A sample dataset is provided under [\demo](https://github.com/Adibvafa/CodonTran

<br>

### Arguments of predict_dna_sequence
| Argument | Type | Description | Default |
|----------|------|-------------|---------|
| `protein` | `str` | Input protein sequence | Required |
| `organism` | `Union[int, str]` | Organism ID (integer) or name (string) (e.g., "Escherichia coli general") | Required |
| `device` | `torch.device` | PyTorch device object specifying whether to run on CPU or GPU | Required |
| `tokenizer` | `Union[str, PreTrainedTokenizerFast, None]` | Either a file path to load tokenizer from, a pre-loaded tokenizer object, or None to load from HuggingFace's "adibvafa/CodonTransformer" | `None` |
| `model` | `Union[str, torch.nn.Module, None]` | Either a file path to load model from, a pre-loaded model object, or None to load from HuggingFace's "adibvafa/CodonTransformer" | `None` |
| `attention_type` | `str` | Type of attention mechanism to use in model - 'block_sparse' for memory efficient or 'original_full' for standard attention | `"original_full"` |
| `deterministic` | `bool` | If True, uses deterministic decoding (picks most likely tokens). If False, samples tokens based on probabilities adjusted by temperature | `True` |
| `temperature` | `float` | Controls randomness in non-deterministic mode. Lower values (0.2) are conservative and pick high probability tokens, while higher values (0.8) allow more diversity. Must be positive | `0.2` |
| `top_p` | `float` | Nucleus sampling threshold - only tokens with cumulative probability up to this value are considered. Balances diversity and quality of predictions. Must be between 0 and 1 | `0.95` |
| `num_sequences` | `int` | Number of different DNA sequences to generate. Only works when deterministic=False. Each sequence will be sampled based on the temperature and top_p parameters. Must be positive | `1` |
| `match_protein` | `bool` | Constrains predictions to only use codons that translate back to the exact input protein sequence. Only recommended when using high temperatures or error prone input proteins (e.g. not starting with methionine or having numerous repitions) | `False` |

Returns: `Union[DNASequencePrediction, List[DNASequencePrediction]]` containing predicted DNA sequence(s) and metadata
<br>

## Installation
Install CodonTransformer via pip:

Expand Down

0 comments on commit 904ce6d

Please sign in to comment.