Update README.md

Described procedure for handling different MHC nomenclatures.
KarchinLab · Feb 2, 2024 · 1f9ab07
1 parent 25373d6
Showing 1 changed file with 6 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -76,6 +76,12 @@ python predict.py -i=../data/example2.csv -m=el -a=HLA-A*02:02 -p=0 -c=0 -d="cpu
 
 Predictions will be written to `example1.csv.prd` and `example2.csv.prd` in the data folder. Execution takes a few seconds. Compare your output with `example1.csv.cmp` and `example2.csv.cmp` respectively.
 
+#### Supported Alleles
+
+BigMHC supports only MHC-I. To handle different MHC naming schemes, BigMHC performs fuzzy string matching to find the nearest MHC by name. For example, `HLA-A*02:01`, `A*02:01`, `HLAA0201`, and `A0201` are all considered valid and equivalent allele names. Additional fields for synonymous substitutions and noncoding differences are also handled, so `HLA-A*02:01:01` should map to `HLA-A*02:01`.
+
+BigMHC does not validate allele names. It will make predictions even if given nonsense or MHC-II input, because it maps any provided name to the nearest valid MHC name. The list of alleles used in our multiple sequence alignment, to which input is mapped, can be found in the [pseudosequences data file](data/pseudoseqs.csv).
+
 #### Required Arguments
 * `-i` or `--input` input CSV file
   * Columns are zero-indexed
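
Note on the "Supported Alleles" text added in this commit: the nearest-name behavior it describes can be illustrated with a small Python sketch. This is not BigMHC's implementation. It assumes a simple normalization step (uppercase, drop punctuation, strip a leading `HLA`) followed by `difflib.get_close_matches` from the standard library. The names `KNOWN_ALLELES`, `normalize`, and `nearest_allele` are hypothetical, and the three-allele list stands in for the full list in `data/pseudoseqs.csv`.

```python
import difflib
import re

# Hypothetical subset of allele names; the real list is in data/pseudoseqs.csv.
KNOWN_ALLELES = ["HLA-A*02:01", "HLA-A*02:02", "HLA-B*07:02"]


def normalize(name: str) -> str:
    """Uppercase, drop punctuation, and strip a leading 'HLA' prefix."""
    compact = re.sub(r"[^A-Z0-9]", "", name.upper())
    return compact[3:] if compact.startswith("HLA") else compact


def nearest_allele(name: str, known=KNOWN_ALLELES) -> str:
    """Return the known allele whose normalized name is most similar to `name`."""
    by_key = {normalize(k): k for k in known}
    # cutoff=0.0 accepts any similarity score, so a match is always returned,
    # even for nonsense input (mirroring the "no validation" caveat).
    best = difflib.get_close_matches(normalize(name), list(by_key), n=1, cutoff=0.0)
    return by_key[best[0]]


if __name__ == "__main__":
    for query in ["HLA-A*02:01", "A*02:01", "HLAA0201", "A0201", "HLA-A*02:01:01"]:
        print(query, "->", nearest_allele(query))
```

In this sketch all five spellings resolve to `HLA-A*02:01`. Because the cutoff is zero, an unrelated string such as `banana` also resolves to one of the known alleles, which is why checking input names against the pseudosequences file before prediction may be worthwhile.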