The genome assembly and annotation description introduces several new terms. This short table defines them in case any are unfamiliar.
Term | Definition |
---|---|
Open Reading Frame (ORF) | A stretch of DNA, read as consecutive codons in one reading frame, that contains no stop codon |
Coding Sequence (CDS) | An ORF that could encode a protein |
Protein encoding gene (PEG) | An ORF that could encode a protein |
Hypothetical protein | A predicted protein whose existence or function has not been shown experimentally |
Putative protein | A predicted protein whose existence or function has not been shown experimentally |
Polypeptide | A chain of amino acids joined by peptide bonds. Short chains (typically about 20 amino acids or fewer) are usually called peptides or oligopeptides. |
Contig | A contiguous piece of DNA sequence that has been assembled from more than one read. Reads can be joined into a contig because, as noted above, the 5' end of one sequence overlaps the 3' end of another. |
Read | The unit of DNA sequence that comes from a sequencing instrument: a single piece of DNA sequence. |
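
Two of the definitions above, ORF and contig, can be made concrete with a small sketch. The Python below is only an illustration under simplifying assumptions: the function names `find_orfs` and `merge_reads` are hypothetical, the ORF scan looks at the forward strand only and uses the three stop codons of the standard genetic code, and the read merge requires an exact overlap. Real annotation and assembly tools handle both strands, sequencing errors, and many reads at once.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def find_orfs(dna, min_codons=2):
    """Return (frame, start, end) for each stop-free run of codons.

    Forward strand only; `end` is exclusive, so dna[start:end] is the ORF.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        run_start = frame
        for pos in range(frame, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                # A stop codon closes the current run of codons.
                if (pos - run_start) // 3 >= min_codons:
                    orfs.append((frame, run_start, pos))
                run_start = pos + 3
        # The last run may reach the end of the sequence without a stop.
        end = frame + ((len(dna) - frame) // 3) * 3
        if (end - run_start) // 3 >= min_codons:
            orfs.append((frame, run_start, end))
    return orfs


def merge_reads(left, right, min_overlap=3):
    """Join two reads into a contig if the end of `left` matches the start of `right`.

    Returns the merged sequence, or None if no exact overlap of at least
    `min_overlap` bases exists. The longest overlap is preferred.
    """
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    return None


if __name__ == "__main__":
    print(find_orfs("ATGAAATAGCCCGGG"))
    print(merge_reads("ACGTTGCA", "TGCAGGT"))
```

In this sketch an ORF is reported as coordinates on the DNA rather than as amino acids, which matches the definition in the table: the ORF is a property of the nucleotide sequence, and only a CDS or PEG is expected to be translated into a protein.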