## Peptide identification and formatting
### These steps use scripts from two other GitHub repositories:
- github.com/marcottelab/MS_grouped_lookup
- github.com/marcottelab/protein_complex_maps
## Key steps
### Get fractionation peptides into one file: consolidate the peptides identified in multiple experiments into a single table by combining all pepcount files
### Create combined table of all experimentally identified peptides
#### In this case, the input files are the output of github.com/marcottelab/run_msblender and github.com/marcottelab/msblender
```
python2 /path/to/protein_complex_maps/protein_complex_maps/preprocessing_util/msblender2elution.py --prot_count_files example_files/*pep*count*1 --output_filename consolidated_pepcounts/example.pepcount --fraction_name_from_filename --msblender_format --spectral_count_type TotalCount --pepcount
```
**Input TSVs (outputs of the marcottelab run_msblender/msblender pipeline):**
filename1: example_experiment_fraction10.pep_count_mFDRpsm001
```
#Peptide FDR: 0.1
#PepSeq TotalCount example_experiment_fraction10
MEATAK 5 5
ELVISR 3 3
```
filename2: example_experiment_fraction11.pep_count_mFDRpsm001
```
#Peptide FDR: 0.1
#PepSeq TotalCount example_experiment_fraction11
EAPEPTIDE 6 6
ELVISR 1 1
```
**Output tsv combined from all fractions:**
filename: consolidated_pepcounts/example.pepcount
```
example_experiment_fraction10 example_experiment_fraction11
MEATAK 5 0
EAPEPTIDE 0 6
ELVISR 3 1
```
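The consolidation can be sketched in plain Python 3 as below. This is a hypothetical illustration, not the code of msblender2elution.py: it assumes each fraction name is the filename up to its first `.` (the spirit of `--fraction_name_from_filename`), reads the column named by `--spectral_count_type`, and fills in 0 for peptides absent from a fraction. The helper names `read_pepcount`, `consolidate` and `to_tsv` are made up for this example.

```python
# Hypothetical sketch of the consolidation step; not the code of msblender2elution.py.
def read_pepcount(filename, text, count_column="TotalCount"):
    """Parse one per-fraction pepcount file into (fraction_name, {peptide: count})."""
    # Assumption: the fraction name is the filename up to its first '.'.
    fraction = filename.split(".")[0]
    col = None
    counts = {}
    for line in text.splitlines():
        if line.startswith("#PepSeq"):
            # Header row: locate the requested spectral count column.
            col = line.lstrip("#").split().index(count_column)
        elif line.startswith("#") or not line.strip():
            continue  # other comment lines (e.g. the FDR line) and blank lines
        else:
            fields = line.split()
            counts[fields[0]] = int(fields[col])
    return fraction, counts


def consolidate(files):
    """Combine (filename, text) pairs into a wide peptide x fraction table."""
    parsed = [read_pepcount(name, text) for name, text in files]
    fractions = [fraction for fraction, _ in parsed]
    # Union of peptides across all fractions, in first-seen order.
    peptides = dict.fromkeys(p for _, counts in parsed for p in counts)
    # A peptide absent from a fraction gets a count of 0 there.
    table = {p: [counts.get(p, 0) for _, counts in parsed] for p in peptides}
    return fractions, table


def to_tsv(fractions, table):
    """Render the wide table as TSV with an empty first header cell."""
    lines = ["\t" + "\t".join(fractions)]
    lines += [p + "\t" + "\t".join(str(c) for c in row) for p, row in table.items()]
    return "\n".join(lines) + "\n"


files = [
    ("example_experiment_fraction10.pep_count_mFDRpsm001",
     "#Peptide FDR: 0.1\n#PepSeq\tTotalCount\texample_experiment_fraction10\n"
     "MEATAK\t5\t5\nELVISR\t3\t3\n"),
    ("example_experiment_fraction11.pep_count_mFDRpsm001",
     "#Peptide FDR: 0.1\n#PepSeq\tTotalCount\texample_experiment_fraction11\n"
     "EAPEPTIDE\t6\t6\nELVISR\t1\t1\n"),
]
fractions, table = consolidate(files)
print(to_tsv(fractions, table), end="")
```

Rows come out in first-seen order (MEATAK, ELVISR, EAPEPTIDE), so they may be ordered differently from the script's output; the counts per fraction are what matter.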
### Convert the .pepcount table into a tidy-formatted CSV:
```
python2 /path/to/protein_complex_maps/protein_complex_maps/preprocessing_util/elutionTidy.py --input_elution consolidated_pepcounts/example.pepcount --outfile consolidated_pepcounts/example.pepcount.tidy --firstcol Peptide --valuecol PeptideCount --experiment_id example_experiment
```
**Input: output of the previous step**
**Output csv:**
filename: consolidated_pepcounts/example.pepcount.tidy
```
ExperimentID,FractionID,Peptide,PeptideCount
example_experiment,example_experiment_fraction10,MEATAK,5
example_experiment,example_experiment_fraction11,EAPEPTIDE,6
example_experiment,example_experiment_fraction10,ELVISR,3
example_experiment,example_experiment_fraction11,ELVISR,1
```
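A minimal sketch of the wide-to-tidy reshaping, written with Python's standard `csv` module rather than elutionTidy.py itself. It assumes the `.pepcount` file is tab-separated with an empty first header cell, and it drops zero counts because the example output above contains none; the real script may keep them. `wide_to_tidy` is a hypothetical name, while its `firstcol` and `valuecol` parameters mirror the `--firstcol` and `--valuecol` arguments.

```python
import csv
import io


def wide_to_tidy(wide_tsv, experiment_id, firstcol="Peptide", valuecol="PeptideCount"):
    """Melt a wide peptide x fraction TSV into ExperimentID,FractionID,<firstcol>,<valuecol> CSV."""
    reader = csv.reader(io.StringIO(wide_tsv), delimiter="\t")
    header = next(reader)
    fractions = header[1:]  # first header cell is the (empty) peptide column label
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["ExperimentID", "FractionID", firstcol, valuecol])
    for row in reader:
        peptide, counts = row[0], row[1:]
        for fraction, count in zip(fractions, counts):
            # Assumption: zero cells are dropped, as in the example output above.
            if float(count) > 0:
                writer.writerow([experiment_id, fraction, peptide, count])
    return out.getvalue()


wide = (
    "\texample_experiment_fraction10\texample_experiment_fraction11\n"
    "MEATAK\t5\t0\n"
    "EAPEPTIDE\t0\t6\n"
    "ELVISR\t3\t1\n"
)
print(wide_to_tidy(wide, "example_experiment"), end="")
```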
### From a proteome FASTA file, generate all possible tryptic peptides, allowing up to two missed cleavages
```
python /path/to/MS_grouped_lookup/scripts/proteome_utils/trypsin.py -i example_files/example.fasta -o protein_identification/example_peptides.csv -m 2 -p TRUE
```
**Input FASTA file:**
```
>prot1
MEATAKEAPEPTIDE
>prot2
ELVISRLIVES
>prot3
MAKELVISR
```
**Output csv of possible peptides:**
```
ProteinID,Peptide,Start,End
prot1,MEATAK,1,6
prot1,MEATAKEAPEPTIDE,1,15
prot1,EAPEPTIDE,7,15
prot2,ELVISR,1,6
prot2,ELVISRLIVES,1,11
prot2,LIVES,7,11
prot3,ELVISR,4,9
prot3,MAKELVISR,1,9
```
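In silico digestion with missed cleavages can be sketched as follows. This is not trypsin.py: it applies the common trypsin rule (cleave after K or R unless the next residue is P), treats `missed_cleavages` like `-m`, and adds a hypothetical `min_length=4` filter chosen only so the output matches the example above, where the three-residue MAK is absent. The meaning of `-p TRUE` is not modeled. Coordinates are 1-based and inclusive, as in the table above.

```python
import re


def read_fasta(text):
    """Return {protein_id: sequence} from FASTA text."""
    records, name = {}, None
    for line in text.splitlines():
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line.strip()
    return records


def tryptic_peptides(protein_id, sequence, missed_cleavages=2, min_length=4):
    """Return (ProteinID, Peptide, Start, End) tuples with 1-based, inclusive coordinates."""
    # Cleavage sites: just after K or R, unless the next residue is P.
    sites = [m.end() for m in re.finditer(r"[KR](?!P)", sequence)]
    # Fragment boundaries; a site at the very end would only produce an empty peptide.
    bounds = [0] + [s for s in sites if s < len(sequence)] + [len(sequence)]
    peptides = []
    for i in range(len(bounds) - 1):
        # j = i + 1 means no missed cleavage; each step further skips one more site.
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds))):
            start, end = bounds[i], bounds[j]
            peptide = sequence[start:end]
            if len(peptide) >= min_length:  # hypothetical length filter
                peptides.append((protein_id, peptide, start + 1, end))
    return peptides


fasta = ">prot1\nMEATAKEAPEPTIDE\n>prot2\nELVISRLIVES\n>prot3\nMAKELVISR\n"
print("ProteinID,Peptide,Start,End")
for pid, seq in read_fasta(fasta).items():
    for row in tryptic_peptides(pid, seq):
        print(",".join(map(str, row)))
```

Row order within a protein may differ from the script's (here prot3 yields MAKELVISR before ELVISR), but the set of peptides and their coordinates is the same.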
### Reduce these possible peptides to only those that are unique to a single protein
```
python /path/to/MS_grouped_lookup/scripts/lookup_utils/define_grouping.py --peptides example_files/example_peptides.csv --output_basename example_unique_peptides.csv
```
**Input: output of the previous step**
**Output csv of only unique peptides:**
filename: example_unique_peptides.csv
```
ProteinID,Peptide,Start,End
prot1,MEATAK,1,6
prot1,MEATAKEAPEPTIDE,1,15
prot1,EAPEPTIDE,7,15
prot2,ELVISRLIVES,1,11
prot2,LIVES,7,11
prot3,MAKELVISR,1,9
```
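The uniqueness filter described for this step amounts to counting how many proteins each peptide maps to and keeping the peptides that map to exactly one. The sketch below does only that; it is not define_grouping.py, which may do more, and `unique_peptides` is a hypothetical helper.

```python
from collections import defaultdict


def unique_peptides(rows):
    """Keep only (ProteinID, Peptide, Start, End) rows whose peptide maps to exactly one protein."""
    proteins = defaultdict(set)
    for protein_id, peptide, *_ in rows:
        proteins[peptide].add(protein_id)
    return [row for row in rows if len(proteins[row[1]]) == 1]


rows = [
    ("prot1", "MEATAK", 1, 6),
    ("prot1", "MEATAKEAPEPTIDE", 1, 15),
    ("prot1", "EAPEPTIDE", 7, 15),
    ("prot2", "ELVISR", 1, 6),
    ("prot2", "ELVISRLIVES", 1, 11),
    ("prot2", "LIVES", 7, 11),
    ("prot3", "ELVISR", 4, 9),
    ("prot3", "MAKELVISR", 1, 9),
]
print("ProteinID,Peptide,Start,End")
for row in unique_peptides(rows):
    print(",".join(map(str, row)))
```

ELVISR is dropped because it occurs in both prot2 and prot3, so it cannot distinguish between them.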
### With the experimentally observed peptides and the protein-unique peptides in hand, identify the proteins detected in each experiment
```
Rscript scripts/peptide_identification.R --elut_wide example_file/example.pepcount --peps example_files/example_unique_peptides.csv
```
**Input: Output of previous steps**
**Outputs:**
1. pepcount.annot.long.tidy
   ```
   Peptide,ExperimentID,experiment_order,experiment_name,ProteinID,Start,End,FractionID,pepcount,FractionOrder,totfracts
   MFFESR,soybn_OP_Soy_sprout_SEC_20172110,19,SOYBN_SEC1,tr|I1KSR1|I1KSR1_SOYBN,1,6,Soy_sprout_SEC_26_1a_10212017,1,24,67
   ```
2. pepcount.annot.short.tidy: same as the file above, but with peptide entries that have zero observations removed
3. fraction_order.csv: records the order of each fraction within the experiment
   ```
   FractionOrder,FractionID
   1,fractionid_Soy_sprout_SEC_03_1a_10212017
   2,fractionid_Soy_sprout_SEC_04_1a_10212017
   3,fractionid_Soy_sprout_SEC_05_1a_10222017
   ```
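The core of the identification step can be approximated as a join of the observed peptide counts against the protein-unique peptide table. The Python sketch below is inferred only from the output column names listed above and is not peptide_identification.R. It assumes FractionOrder is the 1-based column position in the wide `.pepcount` table, keeps zero counts in the long table and removes them for the short table, and omits `experiment_order` and `experiment_name`, whose derivation is not described here. `annotate` is a hypothetical name.

```python
def annotate(fractions, table, unique_rows, experiment_id):
    """Build long (zeros kept) and short (zeros removed) annotated rows plus a fraction order map."""
    # Peptide -> (ProteinID, Start, End), restricted to protein-unique peptides.
    lookup = {pep: (pid, start, end) for pid, pep, start, end in unique_rows}
    # Assumption: fraction order is the column position in the wide table.
    fraction_order = {f: i for i, f in enumerate(fractions, start=1)}
    long_rows = []
    for peptide, counts in table.items():
        if peptide not in lookup:
            continue  # shared or unmatched peptides cannot identify a single protein
        pid, start, end = lookup[peptide]
        for fraction, count in zip(fractions, counts):
            long_rows.append({
                "Peptide": peptide, "ExperimentID": experiment_id, "ProteinID": pid,
                "Start": start, "End": end, "FractionID": fraction, "pepcount": count,
                "FractionOrder": fraction_order[fraction], "totfracts": len(fractions),
            })
    short_rows = [r for r in long_rows if r["pepcount"] > 0]
    return long_rows, short_rows, fraction_order


fractions = ["example_experiment_fraction10", "example_experiment_fraction11"]
table = {"MEATAK": [5, 0], "EAPEPTIDE": [0, 6], "ELVISR": [3, 1]}
unique_rows = [
    ("prot1", "MEATAK", 1, 6), ("prot1", "MEATAKEAPEPTIDE", 1, 15),
    ("prot1", "EAPEPTIDE", 7, 15), ("prot2", "ELVISRLIVES", 1, 11),
    ("prot2", "LIVES", 7, 11), ("prot3", "MAKELVISR", 1, 9),
]
long_rows, short_rows, order = annotate(fractions, table, unique_rows, "example_experiment")
for r in short_rows:
    print(r["Peptide"], r["ProteinID"], r["FractionID"], r["pepcount"])
print("FractionOrder,FractionID")
for fraction, i in order.items():
    print(f"{i},{fraction}")
```

With the toy data, ELVISR is excluded because it is shared by prot2 and prot3, leaving four long rows and two short rows.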