==========================================================
Scripts for munging output of the NGS pipeline
==========================================================

This project provides a program with a command-line interface for
parsing Next Generation Sequencing (NGS) data.

.. contents:: Table of Contents

dependencies
============

* Python 2.7.x
* A UNIX-like operating system (Linux, OS X). Not tested on Windows.
* BEDTools (for the summarize_assay command only)
* pysam (for count_umi only)
* NumPy
* natsort
* pandas
* XlsxWriter
* suds
* xlwt
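
A quick way to confirm that the dependencies above are available is to
try importing each Python package and to look for the BEDTools
executable on ``PATH``. The following is a minimal sketch that is not
part of munge; the import names, the ``bedtools`` executable name, and
the ``missing_dependencies`` helper are all assumptions made for
illustration:

.. code-block:: python

    """Report which munge dependencies appear to be missing (illustrative sketch)."""
    import importlib

    try:
        from shutil import which  # Python 3.3+
    except ImportError:  # Python 2.7 fallback
        from distutils.spawn import find_executable as which

    # Assumed import names for the Python packages listed above.
    PYTHON_MODULES = ["numpy", "natsort", "pandas", "xlsxwriter",
                      "suds", "xlwt", "pysam"]
    # Assumed name of the BEDTools command-line executable.
    EXECUTABLES = ["bedtools"]


    def missing_dependencies(modules=PYTHON_MODULES, executables=EXECUTABLES):
        """Return modules that fail to import and executables not found on PATH."""
        missing = []
        for name in modules:
            try:
                importlib.import_module(name)
            except ImportError:
                missing.append(name)
        for exe in executables:
            if which(exe) is None:
                missing.append(exe)
        return missing


    if __name__ == "__main__":
        gaps = missing_dependencies()
        print("missing: " + (", ".join(gaps) if gaps else "none"))

Because pysam is only needed for count_umi and BEDTools only for
summarize_assay, a missing entry for either matters only if you use
those commands.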

installation
============

Clone the project from the git repository::

    cd ~/src
    git clone git@github.com:sheenams/munge.git
    cd munge

Then install using the provided install script. It installs to
/home/genetics by default; pass a path to install elsewhere::

    sudo ./install_munge /install/path

The script performs a clean install.

execution
=========

The ``munge`` script provides the actions used to process data
output from the pipeline. During development, it is convenient to run
``munge`` from within the project directory by specifying the relative
path to the script::

    % ./munge

Commands are constructed as follows. Every command starts with the
name of the script, followed by an "action", followed by a series of
required or optional "arguments". The script name, the action, and any
options and their arguments are separated by spaces on the command
line. Help text is available for both the ``munge`` script and
individual actions using the ``-h`` or ``--help`` options::

    % munge -h
    usage: munge [-h] [-V] [-v] [-q]
                 {help,xlsmaker,rename_hiseq,sample_crawler,}...

    Utilities for the munge scripts

    positional arguments:
      {help,xlsmaker,rename_hiseq,control_parser,variant_crawler,
       freq_creator,rename_miseq,db_annotation,quality_metrics,
       getpfx,combined_cnv,combined_output,annovar_bed_parser,
       qc_variants,combined_pindel,summary}
        help                Detailed help for actions using `help <action>`
        xlsmaker            Create xls workbook from all output files
        rename_hiseq        Rename and compress HiSeq files.
        control_parser      Compare quality control variants to OPX-240 output to
                            check quality of run
        variant_crawler     Create annovar file from Clinical variants csv
        freq_creator        Calculate tallies of variants and write anovar output
        rename_miseq        Rename MiSeq files for pipeline processing
        db_annotation       Create annotation of all variants in db (or only from
                            GATK)
        quality_metrics     Parse picard and CNV output to create quality metrics
                            file
        getpfx              Get prefixes files (PFX.[12].fastq.gz) for running
                            pipeline.
        combined_cnv        Crawl analysis files to create one analysis file with
                            all info
        combined_output     Crawl analysis files to create one analysis file with
                            all info
        annovar_bed_parser  Filter a file of genomic positions given ranges of
                            start positions
        qc_variants         Parse variant files from pipeline, 1000G, and Complete
                            Genomics to create QC Variant file
        combined_pindel     Crawl analysis files to create one analysis file with
                            all info
        summary             Summarize output from Annovar and EVS

    optional arguments:
      -h, --help            show this help message and exit
      -V, --version         Print the version number and exit
      -v, --verbose         Increase verbosity of screen output (eg, -v is
                            verbose, -vv more so)
      -q, --quiet           Suppress output
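
The script/action/arguments layout shown above maps naturally onto
``argparse`` subparsers. The sketch below is a hypothetical
reconstruction, not munge's actual source: it wires up the global
``-V``, ``-v`` and ``-q`` options and a single ``getpfx`` action. The
action body is a placeholder, the default separator is an assumption,
and ``build_parser`` and ``main`` are illustrative names:

.. code-block:: python

    """Sketch of a script/action/arguments CLI like ``munge`` (not its real source)."""
    import argparse
    import logging


    def getpfx(args):
        # Placeholder action body; the real getpfx logic is not shown here.
        return "getpfx called on %s" % args.datadir


    def build_parser():
        parser = argparse.ArgumentParser(
            prog="munge", description="Utilities for the munge scripts")
        # Global options, given before the action name.
        parser.add_argument("-V", "--version", action="version",
                            version="unknown")  # hypothetical version string
        parser.add_argument("-v", "--verbose", action="count", default=0,
                            help="Increase verbosity of screen output")
        parser.add_argument("-q", "--quiet", action="store_true",
                            help="Suppress output")
        # One subparser per action; each records the function implementing it.
        subparsers = parser.add_subparsers(dest="action")
        subparsers.required = True  # an action must always be given
        sub = subparsers.add_parser(
            "getpfx", help="Get prefixes files (PFX.[12].fastq.gz) for running pipeline.")
        sub.add_argument("datadir", help="Path to directory containing fastq files.")
        sub.add_argument("-s", "--separator", default=" ",  # assumed default
                         help="separator for list of prefixes")
        sub.set_defaults(func=getpfx)
        return parser


    def main(argv=None):
        args = build_parser().parse_args(argv)
        # Map -q / (none) / -v / -vv onto logging levels.
        if args.quiet:
            level = logging.ERROR
        else:
            level = [logging.WARNING, logging.INFO, logging.DEBUG][min(args.verbose, 2)]
        logging.basicConfig(level=level)
        return args.func(args)

Each action registers its implementing function with
``set_defaults(func=...)``, so ``main`` can dispatch without a chain of
``if`` statements. The ``help <action>`` action listed above is not
modeled in this sketch.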

Help text for an individual action is available by including the name
of the action::

    % munge getpfx -h
    usage: munge getpfx [-h] [-s SEPARATOR] datadir

    Get prefixes files (PFX.[12].fastq.gz) for running pipeline.

    Usage:
      munge getpfx /path/to/data

    positional arguments:
      datadir               Path to directory containing fastq files.

    optional arguments:
      -h, --help            show this help message and exit
      -s SEPARATOR, --separator SEPARATOR
                            separator for list of prefixes
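
Judging from its help text, ``getpfx`` reports the ``PFX`` prefixes of
paired files named ``PFX.1.fastq.gz`` and ``PFX.2.fastq.gz``. A minimal
sketch of that idea follows, assuming exactly that naming scheme and
assuming only complete pairs should be reported; the real command may
behave differently, and ``get_prefixes`` and its default separator are
hypothetical:

.. code-block:: python

    """Sketch: collect PFX prefixes from PFX.1.fastq.gz / PFX.2.fastq.gz pairs."""
    import os
    import re

    # Assumed naming: <prefix>.1.fastq.gz and <prefix>.2.fastq.gz
    FASTQ_RE = re.compile(r"^(?P<prefix>.+)\.(?P<read>[12])\.fastq\.gz$")


    def get_prefixes(datadir, separator=" "):
        """Return prefixes that have both read files, joined by ``separator``."""
        reads = {}
        for name in os.listdir(datadir):
            match = FASTQ_RE.match(name)
            if match:
                reads.setdefault(match.group("prefix"), set()).add(match.group("read"))
        # Keep only complete pairs, sorted for reproducible output.
        paired = sorted(p for p, found in reads.items() if found == {"1", "2"})
        return separator.join(paired)

The greedy ``.+`` in the pattern lets prefixes contain dots, so
``a.b.1.fastq.gz`` yields the prefix ``a.b``.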

versions
========

We use abbreviated git SHA hashes to identify the software version::

    % ./munge -V
    0309.004ecac
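
One way to obtain such an abbreviated hash is
``git rev-parse --short HEAD``. The sketch below wraps that call in a
hypothetical helper (not munge's own code) that returns ``None`` when
git is unavailable or the directory is not a git checkout. It does not
reproduce the leading ``0309.`` component of the version shown above,
whose meaning is not documented here:

.. code-block:: python

    """Sketch: read the abbreviated SHA of HEAD to use as a version string."""
    import os
    import subprocess


    def git_short_sha(repo_dir="."):
        """Return the abbreviated SHA of HEAD in repo_dir, or None if git fails."""
        try:
            with open(os.devnull, "w") as devnull:
                out = subprocess.check_output(
                    ["git", "rev-parse", "--short", "HEAD"],
                    cwd=repo_dir, stderr=devnull)
        except (OSError, subprocess.CalledProcessError):
            # git is not installed, repo_dir is missing, or it is not a checkout
            return None
        return out.decode("ascii").strip()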

unit tests
==========

Unit tests are implemented using the ``unittest`` module in the Python
standard library. The ``tests`` subdirectory is itself a Python
package that imports the local version of the ``munge`` package
(i.e., the version in the project directory, not the version installed
on the system). All unit tests can be run like this::

    munge % ./testall
    ........................
    ----------------------------------------------------------------------
    Ran 24 tests in 0.155s

    OK

A single unit test can be run by referring to a specific module,
class, or method within the ``tests`` package using dot notation::

    munge % ./testone tests.test_subcommands.TestQCVariants
    .
    ----------------------------------------------------------------------
    Ran 1 test in 0.004s

    OK
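
The contents of the ``testall`` and ``testone`` scripts are not shown
here. The standard library provides rough equivalents from the command
line (``python -m unittest discover -s tests -t .`` and
``python -m unittest tests.test_subcommands.TestQCVariants``). The
sketch below is a hypothetical runner built on ``unittest.TestLoader``
that covers both cases; ``run_tests`` and its defaults are assumptions,
not part of munge:

.. code-block:: python

    """Sketch of testall/testone-style runners using only unittest (hypothetical)."""
    import unittest


    def run_tests(names=None, start_dir="tests", top_level_dir=".", verbosity=1):
        """Run dotted test names if given, otherwise discover tests under start_dir."""
        loader = unittest.TestLoader()
        if names:
            # e.g. ["tests.test_subcommands.TestQCVariants"], like ./testone
            suite = loader.loadTestsFromNames(names)
        else:
            # Discover every test*.py module, like ./testall
            suite = loader.discover(start_dir, top_level_dir=top_level_dir)
        return unittest.TextTestRunner(verbosity=verbosity).run(suite)

Setting ``top_level_dir`` to the project root keeps discovered modules
importable as ``tests.<module>``, which matters because ``tests`` is a
package.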