SAMTOOLS Output Formats

Since SAM and BAM were not originally designed for local alignments, especially of protein sequences, this document describes Lambda's implementation of the standard.

Please see the official specification if some of the terms used here are not clear to you.

column	use in Lambda
QNAME	name of the query sequence, truncated at first whitespace
FLAG	flag values 16 (reverse strand) and 256 (secondary alignment) are implemented in a standard-conformant way
RNAME	name of the subject sequence, truncated at first whitespace
POS	begin position of alignment on subject sequence; begin position on original untranslated DNA sequence for TBlastN, TBlastX, end position if negative strand; begin position on protein sequence for BlastP, BlastX
MAPQ	`255`
CIGAR	query DNA cigar (untranslated DNA sequence for BlastX, TBlastX); `*` for BlastP, TBlastN; reversed if negative strand/frame
RNEXT	`*`
PNEXT	`0`
TLEN	`0`
SEQ	query DNA sequence (untranslated DNA sequence for BlastX, TBlastX); `*` for BlastP, TBlastN; reverse-complemented if negative strand/frame; see below for clipping
QUAL	`*`
OPT	see below
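
As a rough illustration of how the columns above can be read back, here is a minimal Python sketch that splits one SAM line into its eleven mandatory fields and checks the two FLAG values Lambda uses. The names `SamRecord` and `parse_sam_line` and the example line are hypothetical and not part of Lambda.

```python
from typing import List, NamedTuple


class SamRecord(NamedTuple):
    """The eleven mandatory SAM columns plus the raw optional fields."""
    qname: str
    flag: int
    rname: str
    pos: int
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    opt: List[str]


def parse_sam_line(line: str) -> SamRecord:
    """Split a tab-separated SAM alignment line into typed fields."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 11:
        raise ValueError("a SAM alignment line needs at least 11 columns")
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]), f[5],
                     f[6], int(f[7]), int(f[8]), f[9], f[10], f[11:])


def is_reverse(rec: SamRecord) -> bool:
    """FLAG value 16: the alignment is on the negative strand/frame."""
    return bool(rec.flag & 16)


def is_secondary(rec: SamRecord) -> bool:
    """FLAG value 256: the line is a secondary alignment."""
    return bool(rec.flag & 256)


# Hypothetical example line; the values are made up for illustration.
example = "read1\t272\tsubj1\t100\t255\t5M\t*\t0\t0\tACGTA\t*\tAS:i:42"
rec = parse_sam_line(example)
print(rec.qname, is_reverse(rec), is_secondary(rec))
```

Here the FLAG 272 is 256 + 16, so both checks return true for the example line.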

Sequence strings

Following the recommendations of the specification, the SEQ field is only written if it differs from the previous line's SEQ field. This can be changed via Lambda's command line parameter --sam-bam-seq, which can be set to always or never (the latter saves more space). This behaviour also applies to the qs tag defined below.
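
If a downstream tool needs SEQ on every line, the omitted values can be filled back in. The following Python sketch assumes that an omitted SEQ is written as `*` (the SAM placeholder for a sequence that is not stored) and, as an extra safety check, only carries a sequence forward within the same QNAME. The function name `restore_seq` is hypothetical.

```python
from typing import Iterable, Iterator, List, Optional


def restore_seq(records: Iterable[List[str]]) -> Iterator[List[str]]:
    """Yield copies of SAM field lists with '*' SEQ replaced by the last
    SEQ seen for the same query name (assumption: '*' means 'unchanged')."""
    last_qname: Optional[str] = None
    last_seq: Optional[str] = None
    for fields in records:
        fields = list(fields)           # do not modify the caller's list
        qname, seq = fields[0], fields[9]
        if seq != "*":
            # A real sequence: remember it for the following lines.
            last_qname, last_seq = qname, seq
        elif qname == last_qname and last_seq is not None:
            # Omitted sequence: reuse the one from the previous line.
            fields[9] = last_seq
        yield fields


# Hypothetical records (already split on tabs).
recs = [
    ["q1", "0",   "s1", "1", "255", "3M", "*", "0", "0", "ACG", "*"],
    ["q1", "256", "s2", "7", "255", "3M", "*", "0", "0", "*",   "*"],
    ["q2", "0",   "s1", "1", "255", "3M", "*", "0", "0", "*",   "*"],
]
restored = list(restore_seq(recs))
```

Records whose SEQ is `*` for other reasons (for example BlastP output) are left unchanged, because no earlier sequence for that query exists.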

Clipping

Via the --sam-bam-clip parameter you can choose between hard-clipping and soft-clipping. Soft-clipping results in full sequences in the SEQ and qs fields, while hard-clipping only shows the locally matching part. Depending on the choice, the CIGAR strings will also contain H or S characters. Hard-clipping is the default because it takes up less space.

Please be aware that if the query sequence is translated, those DNA positions that are lost because of frame shifts or incomplete frames (at the end of a sequence) are always hard-clipped. These positions are also not represented in the protein cigar (see the qs tag below).
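
To make the difference between the two clipping modes concrete, this Python sketch counts the H and S operations in a CIGAR string and computes how long the SEQ field should be according to the SAM rules (M, I, S, = and X consume query bases; H does not). The function names and the example CIGAR strings are hypothetical.

```python
import re
from typing import Dict, List, Tuple

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string into (length, operation) pairs."""
    if cigar == "*":
        return []
    ops = _CIGAR_OP.findall(cigar)
    # Reject strings that contain anything besides valid operations.
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in ops]


def clipped_lengths(cigar: str) -> Dict[str, int]:
    """Total number of hard-clipped (H) and soft-clipped (S) bases."""
    totals = {"H": 0, "S": 0}
    for n, op in parse_cigar(cigar):
        if op in totals:
            totals[op] += n
    return totals


def expected_seq_length(cigar: str) -> int:
    """Length SEQ should have: sum of the query-consuming operations."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MIS=X")


print(clipped_lengths("10H30M5H"), expected_seq_length("10H30M5H"))
print(clipped_lengths("10S30M5S"), expected_seq_length("10S30M5S"))
```

For the same 45-base query, the hard-clipped form stores only the 30 aligned bases in SEQ, while the soft-clipped form stores all 45.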

Optional tags

tag	description
	official
`AS`	bit score
`OC`	query protein cigar (`*` for BLASTN)
`NM`	edit distance (in protein space unless BLASTN)
`IH`	number of matches this query has
	regarding the alignment
`ae`	expect value
`ar`	raw score
`ai`	% identity (in protein space unless BLASTN)
`ap`	% positive (in protein space unless BLASTN)
	regarding the query sequence
`qf`	query frame
`qs`	query protein sequence (`*` for BLASTN)
	regarding the subject sequence
`sf`	subject frame
`st`	subject taxonomy ID(s) separated by `;` (see Taxonomic Workflows)
	regarding all matches of this query
`ls`	lowest common ancestor scientific name (see Taxonomic Workflows)
`lt`	lowest common ancestor taxonomy id (see Taxonomic Workflows)

These tags can be specified with the command line argument --sam-bam-tags. If you would like to see any other tags supported, please don't hesitate to contact us.
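
Optional fields follow the SAM `TAG:TYPE:VALUE` layout, so they can be collected into a dictionary with a few lines of Python. The sketch below is only an illustration: the function name `parse_tags` is hypothetical, and the type letters and values in the example are assumptions, not confirmed Lambda output.

```python
from typing import Dict, List, Union

TagValue = Union[int, float, str]


def parse_tags(opt_fields: List[str]) -> Dict[str, TagValue]:
    """Convert SAM optional fields (TAG:TYPE:VALUE) into a dict.

    Only the integer (i) and float (f) types are converted; every
    other type is kept as a string.
    """
    tags: Dict[str, TagValue] = {}
    for field in opt_fields:
        # Split at most twice so that values may themselves contain ':'.
        tag, typ, value = field.split(":", 2)
        if typ == "i":
            tags[tag] = int(value)
        elif typ == "f":
            tags[tag] = float(value)
        else:
            tags[tag] = value
    return tags


# Hypothetical optional fields; types and values are made up.
tags = parse_tags(["AS:i:57", "ae:f:1e-10", "qf:i:-2", "st:Z:9606;10090"])
taxids = tags["st"].split(";")   # taxonomy IDs are separated by ';'
print(tags["AS"], tags["ae"], tags["qf"], taxids)
```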

Header

BAM files require all subject names to be written to the header. For SAM this is not required, so Lambda omits them by default to save space (for protein databases in particular, the header can become very large). If you still want them in SAM output, e.g. for better BAM compatibility, use the --sam-with-refheader option.
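
To check whether a given SAM file carries the subject names in its header, one can look for `@SQ` lines, which hold the reference name in their `SN:` field according to the SAM specification. This is a small Python sketch; the function name and the example header lines are hypothetical.

```python
from typing import Iterable, List


def reference_names_from_header(lines: Iterable[str]) -> List[str]:
    """Collect the SN: values of all @SQ header lines.

    Reading stops at the first line that does not start with '@',
    because header lines always precede the alignment lines.
    """
    names: List[str] = []
    for line in lines:
        if not line.startswith("@"):
            break
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "@SQ":
            for field in fields[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


# Hypothetical SAM content for illustration.
sam_lines = [
    "@HD\tVN:1.4",
    "@SQ\tSN:subj1\tLN:350",
    "@SQ\tSN:subj2\tLN:1200",
    "read1\t0\tsubj1\t100\t255\t5M\t*\t0\t0\tACGTA\t*",
]
names = reference_names_from_header(sam_lines)
print(names)
```

An empty result means the file was written without the reference header.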
