Skip to content

Latest commit

 

History

History
32 lines (23 loc) · 1.72 KB

NeighbourDist.md

File metadata and controls

32 lines (23 loc) · 1.72 KB

NeighbourDist

NeighbourDist plots the distribution of nearest-neighbour distances across a set of CDR3 sequences. Where these are taken from a clonally expanded sequence set, the distribution will typically show two peaks: the minimum between these peaks indicates a suitable identity threshold for clustering. The field IMGT-CDR3 is used, and is required to be present in the file. It may contain either amino acid or nucleotide sequences. The command also plots the distribution of CDR3 lengths. For large data sets, the analysis can be restricted to a random sample.

NeighbourDist requires NumPy, matplotlib and python-Levenshtein.

Usage

 NeighbourDist.py [-h] [-l LIMIT] [-v] [-i] [-g LENGTH_LIMS]
                  [-d DIST_LIMS] [-c]
                  infile outprefix
Argument Meaning
infile Input file names (IMGT/IgBLASTPlus format).
outprefix Prefix to use on output file names
[-l LIMIT] Randomly select at most LIMIT sequences from the file
[-v] Display verbose status output
[-i] Display the plots interactively as well as writing them to files
[-g LENGTH_LIMS] 4 comma-separated numbers specifying ranges for the length distribution plot (xmin,xmax,ymin,ymax)
[-d DIST_LIMS] 4 comma-separated numbers specifying ranges for the nearest neighbour plot (xmin,xmax,ymin,ymax)
[-h] Provide a help message and exit
[-c] Provide comma separated files rather than plots

Output Files

(prefix)_CDR3_length.pdf(csv) - length distribution (prefix)_CDR3_min_dist.pdf(csv) - nearest neighbour distribution

Testing

testfiles/Test_Commands.bat exercises the script with sample data.