Merge pull request #84 from ARUP-NGS/dev
Dev merge for 1.0.3
Daniel Baker authored Jun 20, 2016
2 parents 6c5a889 + 3be1a45 commit 8ac88b6
Showing 68 changed files with 4,642 additions and 5,918 deletions.
277 changes: 59 additions & 218 deletions MANUAL.md

Large diffs are not rendered by default.

10 changes: 5 additions & 5 deletions Makefile
@@ -8,8 +8,8 @@ CXXSTD=c++11
CSTD=gnu99
CC=g++
GIT_VERSION := $(shell git describe --abbrev=4 --dirty --always)
CFLAGS= -Wunreachable-code -Wall -fopenmp -DBMF_VERSION=\"$(GIT_VERSION)\" -std=$(CSTD) -fno-builtin-gamma -pedantic
FLAGS= -Wunreachable-code -Wall -fopenmp -DBMF_VERSION=\"$(GIT_VERSION)\" -std=$(CXXSTD) -fno-builtin-gamma -pedantic
CFLAGS= -Wuninitialized -Wunreachable-code -Wall -fopenmp -DBMF_VERSION=\"$(GIT_VERSION)\" -std=$(CSTD) -fno-builtin-gamma -pedantic
FLAGS= -Wuninitialized -Wunreachable-code -Wall -fopenmp -DBMF_VERSION=\"$(GIT_VERSION)\" -std=$(CXXSTD) -fno-builtin-gamma -pedantic # -Weffc++
LD= -lm -lz -lpthread
INCLUDE= -Ihtslib -Iinclude -I.
LIB=
@@ -30,9 +30,9 @@ DLIB_SRC = dlib/cstr_util.c dlib/math_util.c dlib/vcf_util.c dlib/io_util.c dlib

SOURCES = include/sam_opts.c src/bmf_dmp.c include/igamc_cephes.c src/bmf_hashdmp.c \
src/bmf_sdmp.c src/bmf_rsq.c src/bmf_famstats.c include/bedidx.c \
src/bmf_err.c src/bmf_infer.c\
src/bmf_err.c \
lib/kingfisher.c src/bmf_mark.c src/bmf_cap.c lib/mseq.c lib/splitter.c \
src/bmf_main.c src/bmf_target.c src/bmf_depth.c src/bmf_vetter.c src/bmf_sort.c src/bmf_stack.c \
src/bmf_main.c src/bmf_target.c src/bmf_depth.c src/bmf_vet.c src/bmf_sort.c src/bmf_stack.c \
lib/stack.c src/bmf_filter.c $(DLIB_SRC)

TEST_SOURCES = test/target_test.c test/ucs/ucs_test.c test/tag/array_tag_test.c
@@ -95,7 +95,7 @@ tag_test: $(OBJS) $(TEST_OBJS) libhts.a
target_test: $(D_OBJS) $(TEST_OBJS) libhts.a
$(CC) $(FLAGS) $(DB_FLAGS) $(INCLUDE) $(LIB) $(LD) dlib/bed_util.dbo src/bmf_target.dbo test/target_test.dbo libhts.a -o ./target_test && ./target_test
hashdmp_test: $(BINS)
cd test/dmp && python hashdmp_test.py && cd ../..
cd test/collapse && python hashdmp_test.py && cd ../..
marksplit_test: $(BINS)
cd test/marksplit && python marksplit_test.py && cd ../..
err_test: $(BINS)
45 changes: 9 additions & 36 deletions README.md
@@ -37,14 +37,10 @@ Name | Use |
:---:|:----|
bmftools cap| Postprocess a tagged BAM for BMF-agnostic tools.|
bmftools depth| Calculates depth of coverage over a set of bed intervals.|
bmftools dmp| Demultiplex inline barcoded experiments.|
bmftools collapse| Collapse initial fastq records by barcode|
bmftools err| Calculate error rates based on cycle, base call, and quality score.|
bmftools famstats| Calculate family size statistics for a bam alignment file.|
bmftools filter| Filter or split a bam file by a set of filters.|
bmftools mark| Add unclipped start position as annotation for both read and mate.|
bmftools rsq| Rescue bmf-sorted or ucs-sorted bam alignments.|
bmftools sdmp| Demultiplex secondary-index barcoded experiments.|
bmftools sort| Sort for bam rescue.|
bmftools stack| A maximally-permissive variant caller using molecular barcode metadata analogous to samtools mpileup.|
bmftools target| Calculates on-target rate.|
bmftools vet| Curate variant calls from another variant caller (.bcf) and a bam alignment.|
@@ -53,27 +49,16 @@ These tools are divided into four categories:
1. Core functionality
2. Manipulation
3. Analysis
4. Utilities

### Core Functionality

####bmftools dmp/sdmp
bmftools dmp and sdmp perform **molecular** demultiplexing of barcoded experiments, for inline and secondary index chemistries, respectively.
####bmftools collapse
bmftools collapse combines reads sharing barcodes into single observations.

First, these tools add the barcodes to the comment fields of the fastqs and split the records into subsets based on the first characters in the barcode.
Then, reads with exactly-matching barcode are collapsed into a unique observation, with a meta-analysis performed on each base call.
First, the barcodes are added to the comment fields of the fastqs, and the records are split into subsets based on the first characters in the barcode.
Then, reads with exactly matching barcodes are collapsed, with a meta-analysis performed on each base call.
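
The collapse step described above can be illustrated with a minimal Python sketch. This is not the bmftools implementation: the function name `collapse_by_barcode` and the input layout are hypothetical, and the per-base "meta-analysis" is replaced here by a simple quality-weighted vote (the base with the largest summed Phred score wins). It assumes all reads in a family have the same length.

```python
from collections import defaultdict


def collapse_by_barcode(reads):
    """Collapse reads that share an identical barcode (illustrative only).

    reads: iterable of (barcode, sequence, qualities) tuples, where
    qualities is a list of integer Phred scores, one per base.
    Returns {barcode: (consensus, family_size, agreed_counts)}.
    """
    # Group reads into families keyed by their exact barcode.
    families = defaultdict(list)
    for barcode, seq, quals in reads:
        families[barcode].append((seq, quals))

    collapsed = {}
    for barcode, members in families.items():
        length = len(members[0][0])
        consensus = []
        agreed = []
        for i in range(length):
            support = defaultdict(int)  # summed quality per candidate base
            counts = defaultdict(int)   # number of reads per candidate base
            for seq, quals in members:
                support[seq[i]] += quals[i]
                counts[seq[i]] += 1
            # Quality-weighted vote; ties broken alphabetically for determinism.
            best = max(sorted(support), key=lambda base: support[base])
            consensus.append(best)
            agreed.append(counts[best])
        collapsed[barcode] = ("".join(consensus), len(members), agreed)
    return collapsed


example_reads = [
    ("AACG", "ACGT", [30, 30, 30, 30]),
    ("AACG", "ACGA", [30, 30, 30, 10]),
    ("AACG", "ACGT", [30, 30, 30, 30]),
    ("TTGC", "GGCC", [20, 20, 20, 20]),
]
for barcode, result in sorted(collapse_by_barcode(example_reads).items()):
    print(barcode, result)
```

In this sketch the family size and the per-base agreement counts play roles loosely analogous to the FM and FA tags listed in the BMF Tags table.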

Since there can be errors in reading the barcode,
`bmftools rsq` is made available for using positional information to rescue reads with mismatches into their proper families.

bmftools dmp collapses templates where both strands were sequenced, whereas sdmp lacks strand information.

Note: It is **STRONGLY** recommended that for the secondary-index chemistry that you mask adapter sequence in the molecular barcode reads.
When the secondary-index barcode read consists primarily or entirely of adapter, this informs us that the chemistry did not perform as expected.
This preprocessing will "N" those bases, marking the reads as QC fail with the FP integer tag (0 for fail, 1 for pass).

####bmftools rsq
Uses positional information to collapse reads with the same alignment signatures (start/stop/
bmftools collapse inline collapses templates where both strands were sequenced, whereas collapse secondary lacks strand information.

### Manipulation

@@ -109,35 +94,23 @@ Calculates on-target fraction for bed file using barcode metadata.

####bmftools err
Calculates error rates by a variety of parameters.
Additionally, pre-computes the quality score recalibration for the optional dmp/sdmp recalibration step.
Additionally, pre-computes the quality score recalibration for the optional collapse recalibration step.

####bmftools famstats
Calculates summary statistics related to family size and demultiplexing.

####bmftools stack
A maximally-permissive variant caller using molecular barcode metadata analogous to samtools mpileup.

### Utilities

####bmftools mark
Adds auxiliary tags to reads for their mates' information. Required for bmftools rsq.

####bmftools sort
Sorts reads based on positional information to preprocess for bmftools rsq.


## BMF Tags

Tag | Content | Format |
:----:|:-----|:-----:|
DR | Whether the read was sequenced from both strands. Only valid for Loeb-like inline barcodes. | Integer [0, 1] |
DR | Whether the read was sequenced from both strands. Only valid for inline chemistry. | Integer [0, 1] |
FA | Number of reads in Family which Agreed with final sequence at each base | uint32_t array |
FM | Size of family (number of reads sharing barcode.), e.g., "Family Members" | Integer |
FP | Read Passes Filter related to barcoding. Determines QC fail flag in bmftools mark (without -q).| Integer [0, 1]|
LM | Length of Mate | Integer |
MF | Mate fraction aligned (fraction of bases mapped to reference bases, not counting IDSHNP operations). | Float |
NC | Number of changed bases in rescued families of reads. | Integer |
NF | Mean number of differences between reads and consensus per read in family | Float |
NP | Number of Pre-rescue reads. Number of reads before rescue in a final post-rescue observation. | Integer |
PV | Phred Values for a base call after meta-analysis | uint32_t array |
RV | Number of reversed reads in consensus. Only for Loeb-style inline chemistry. | Integer |
RV | Number of reversed reads in consensus. Only for inline chemistry. | Integer |
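
As a rough illustration of how the tags in this table might be read downstream, the sketch below parses the optional fields of a single SAM text record into a dictionary. It relies only on the standard SAM `TAG:TYPE:VALUE` layout; the function name `parse_bmf_tags` and the example record are hypothetical, and real pipelines would more likely read BAM files through htslib or a binding to it.

```python
def parse_bmf_tags(sam_line):
    """Return {tag: value} for the optional fields of one SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    # The first 11 columns of a SAM record are mandatory; tags follow.
    for field in fields[11:]:
        tag, typ, value = field.split(":", 2)
        if typ == "i":
            tags[tag] = int(value)        # e.g. FM, FP, RV
        elif typ == "f":
            tags[tag] = float(value)      # e.g. NF, MF
        elif typ == "B":
            # Array tag: first item is the element subtype, then the values.
            subtype, *items = value.split(",")
            convert = float if subtype == "f" else int
            tags[tag] = [convert(item) for item in items]  # e.g. FA, PV
        else:
            tags[tag] = value             # leave other types as strings
    return tags


# Made-up record for illustration only.
record = "\t".join([
    "read1", "0", "chr1", "100", "60", "4M", "*", "0", "0", "ACGT", "IIII",
    "FM:i:3", "FP:i:1", "FA:B:I,3,3,3,2", "NF:f:0.25",
])
print(parse_bmf_tags(record))
```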
14 changes: 0 additions & 14 deletions Snakemake/README

This file was deleted.

233 changes: 0 additions & 233 deletions Snakemake/inline_barcodes.sm

This file was deleted.
