-
Notifications
You must be signed in to change notification settings - Fork 11
/
Copy pathhelp.txt
131 lines (129 loc) · 10.2 KB
/
help.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
usage: isafe [-h] [-v] [--format FORMAT] --input INPUT --output OUTPUT [--vcf-cont VCF_CONT] [--sample-case SAMPLE_CASE]
[--sample-cont SAMPLE_CONT] [--AA AA] [--region REGION] [--MaxRegionSize MAXREGIONSIZE]
[--MinRegionSize-bp MINREGIONSIZE_BP] [--MinRegionSize-ps MINREGIONSIZE_PS] [--MaxGapSize MAXGAPSIZE]
[--window WINDOW] [--step STEP] [--topk TOPK] [--MaxRank MAXRANK] [--MaxFreq MAXFREQ]
[--RandomSampleRate RANDOMSAMPLERATE] [--ForceRandomSample] [--IgnoreGaps] [--StatusOff] [--WarningOff]
[--OutputPsi] [--SAFE]
====================================================================
iSAFE: (i)ntegrated (S)election of (A)llele (F)avored by (E)volution
====================================================================
Source code & further instructions can be found at: https://github.com/alek0991/iSAFE
iSAFE v1.1.1
--------------------------------------------------------------------
options:
-h, --help show this help message and exit
-v, --version show program's version number and exit
--format FORMAT, -f FORMAT
<string>: Input format. <FORMAT> must be either hap or vcf (see the manual for more details).
iSAFE can handle two types of inputs (phased haplotypes are required):
* vcf format: --format vcf or -f vcf
- vcf format only accepts indexed bgzipped VCF (vcf.gz with index file) or indexed bcf files (.bcf with index file).
- When input format is vcf, Ancestral Allele file (--AA) must be given.
* hap format: --format hap or -f hap
- Input with hap format is not allowed with any of these: --vcf-cont, --sample-case, --sample-cont, --AA.
- With hap format, iSAFE assumes that derived allele is 1 and ancestral allele is 0 in the input file,
and the selection is ongoing (the favored mutation is not fixed).
Default: vcf
--input INPUT, -i INPUT
<string>: Path to the input (case population).
* Input positions must be sorted numerically, in increasing order.
--output OUTPUT, -o OUTPUT
<string>: Path to the output(s).
* iSAFE generates <OUTPUT>.iSAFE.out
* When --OutputPsi is set, iSAFE generates <OUTPUT>.Psi.out in addition to <OUTPUT>.iSAFE.out
--vcf-cont VCF_CONT <string>: Path to the phased control population.
* only accepts indexed bgzipped VCF (vcf.gz with index file) or indexed bcf files (.bcf with index file).
* This is optional but recommended for capturing fixed sweeps.
* This option is only available with --format vcf.
* You can choose a subset of samples in this file by using --sample-cont option,
otherwise all the samples in this file are cosidered as control population.
* You must use --sample-case and --sample-cont when --input and --vcf-cont are the same (all samples are provided in a single vcf file).
* You can (you don't have to) use 1000 Genome Project populations as control.
- Download link of phased VCF files of 1000GP (GRCh37/hg19): http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/
- Download link of phased VCF files of 1000GP (GRCh38/hg38): http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/supporting/GRCh38_positions/
--sample-case SAMPLE_CASE
<string>: Path to the file containing sample ID's of the case population.
* This option is only available in --format vcf.
* When this option is not used all the samples in the --input are considered as the case samples.
* This file must have two columns, the first one is population and the second column
is sample ID's (must be a subset of ID's used in the --input vcf file).
* This file must be TAB separated, no header, and comments by #.
* You must use --sample-case and --sample-cont when --input and --vcf-cont are the same (all samples are provided in a single vcf file).
* Population column (first column) can have more than one population name. They are all considered the case populations.
* Sample ID's must be subset of the --input vcf file
--sample-cont SAMPLE_CONT
<string>: Path to the file containing sample ID's of the control population(s).
* This option is only available in --format vcf.
* When this option is not used all the samples in the --vcf-cont are considered as the control samples.
* This file must have two columns, the first one is population and the second column
is sample ID's (must be a subset of ID's used in the --vcf-cont file).
* This file must be TAB separated, no header, and comments by #.
* You must use --sample-case and --sample-cont when --input and --vcf-cont are the same (all samples are provided in a single vcf file).
* Population column (first column) can have more than one population name. They are all considered the control populations.
* Sample ID's must be subset of the --vcf-cont file
--AA AA <string>: Path to the Ancestral Allele (AA) file in FASTA (.fa) format.
* This is strongly recommended in --format vcf. However, if the ancestral allele file is not available the program raises a warning and assumes reference allele (REF) is ancestral allele.
* Download link (GRCh37/hg19): http://ftp.ensembl.org/pub/release-75/fasta/ancestral_alleles/
* Download link (GRCh38/hg38): http://ftp.ensemblorg.ebi.ac.uk/pub/release-88/fasta/ancestral_alleles/
--region REGION <chr:string>:<start position:int>-<end position:int>, the coordinates of the target region in the genome.
* This is required in --format vcf but optional in the --format hap.
* In vcf format, <chr> style (e.g. chr2 or 2) must be consistent with vcf files.
* The <chr> is dumped in --format hap.
* Valid Examples:
2:10000000-15000000
chr2:10000000-15000000
2:10,000,000-15,000,000
chr2:10,000,000-15,000,000
--MaxRegionSize MAXREGIONSIZE
<int>: Maximum region size in bp.
* Consider the memory (RAM) size when change this parameter.
Default: 6000000
--MinRegionSize-bp MINREGIONSIZE_BP
<int>: Minimum region size in bp.
Default: 200000
--MinRegionSize-ps MINREGIONSIZE_PS
<int>: Minimum region size in polymorphic sites.
* Note that --window cannot be smaller than --MinRegionSize-ps.
Default: 1000
--MaxGapSize MAXGAPSIZE
<int>: Maximum gap size in bp.
* When there is a gap larger than --MaxGapSize the program raise an error.
* You can ignore this by setting the --IgnoreGaps flag.
Default: 10000
--window WINDOW <int>: Sliding window size in polymorphic sites.
Default: 300
--step STEP <int>: Step size of sliding window in polymorphic sites.
Default: 150
--topk TOPK <int>: Rank of SNPs used for learning window weights (alpha).
Default: 1
--MaxRank MAXRANK <int>: Ignore SNPs with rank higher than MAXRANK.
* For considering all SNPs set --MaxRank > --window.
* The higher the --MaxRank, the higher the computation time.
Default: 15
--MaxFreq MAXFREQ <float>: Ignore SNPs with frequency higher than MaxFreq.
Default: 0.95
--RandomSampleRate RANDOMSAMPLERATE
<float>: Portion of added random samples.
* RandomSampleRate = RandomSamples/(RandomSamples+CaseSamples).
* Must be non-negative and less than 1.
Default: 0.1
* Ignored when --vcf-case is not used.
* Ignored when MDDAF criterion doesn't recommend adding random samples. The option --ForceRandomSample
can be used to override MDDAF criterion.
--ForceRandomSample, -FRS
<bool>: Set this flag to force the iSAFE to use random samples even when MDDAF doesn't recommend.
* --vcf-cont must be provided.
Default: false
--IgnoreGaps, -IG <bool>: Set this flag to ignore gaps.
Default: false
--StatusOff, -SO <bool>: Set this flag to turn off printing status.
Default: false
--WarningOff, -WO <bool>: Set this flag to turn off warnings.
Default: false
--OutputPsi, -Psi <bool>: Set this flag to output Psi_1 in a text file with suffix .Psi.out.
Default: false
--SAFE <bool>: Set this flag to report the SAFE score of the entire region.
* When the region size is less than --MinRegionSize-ps (Default: 1000 SNPs) or --MinRegionSize-bp (Default: 200kbp),
the region is too small for iSAFE analysis. Therefore, It's better to use --SAFE flag to report the SAFE scores of
the entire region instead of iSAFE scores.
Default: false