ifishlin/sprintdeeplearning

sprintdeeplearning

Code collected from:

https://github.com/IanLewis/tensorflow-examples

https://github.com/leriomaggio/deep-learning-keras-tensorflow

https://github.com/rouseguy/scipyUS2016_dl-image

http://speech.ee.ntu.edu.tw/~tlkagk/courses_MLSD15_2.html

https://github.com/mikekestemont/ghentDL

http://brohrer.github.io/how_convolutional_neural_networks_work.html

https://ronxin.github.io/wevi/

HW1, Protein-DNA binding prediction (CTCF)

train.data: Train sample size 77531

            name   sequence     label
train.data  name   101 length   0 negative, 1 positive

test.data: Test sample size 19383 (positive 9709, negative 9674)

test_ans.data: (updated) test data with answers.

encodingSeq.py - sequence encoding

# Change the first line (#!/home/fish/anaconda3/bin/python) to the path of your own Python interpreter.
# Usage: encodingSeq.py train.data flanking_length
# For example:
encodingSeq.py train.data 10
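
The README does not show what encodingSeq.py actually outputs. As a rough illustration only, the sketch below assumes a one-hot encoding over A/C/G/T, and assumes that flanking_length means "keep the centre base plus that many bases on each side". Both are assumptions, and `center_window` and `one_hot_dna` are hypothetical names, not functions from the script.

```python
# Hypothetical sketch; this is NOT the actual encodingSeq.py.
DNA_ALPHABET = "ACGT"


def center_window(seq, flank):
    """Keep the centre base plus `flank` bases on each side (assumed meaning)."""
    mid = len(seq) // 2
    return seq[max(0, mid - flank): mid + flank + 1]


def one_hot_dna(seq):
    """Return a len(seq) x 4 matrix; unknown bases (e.g. N) become all zeros."""
    rows = []
    for base in seq.upper():
        rows.append([1 if base == letter else 0 for letter in DNA_ALPHABET])
    return rows


if __name__ == "__main__":
    # A made-up 101-base sequence, only to exercise the functions.
    window = center_window("A" * 50 + "C" + "G" * 50, 10)
    encoded = one_hot_dna(window)
    print(len(window), len(encoded[0]))
```

Under these assumptions, a flanking length of 10 turns a 101-base sequence into a 21 x 4 matrix.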

What to submit

  • Programs (.txt, .py)
  • Training and validation accuracy (trend chart).
  • Predictions for the 19383 samples in test.data (19383 rows; 0 negative, 1 positive).
  • A 1~2 page description of the parameters you tried in this homework.
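
The accuracy reported for the predictions can be computed as the fraction of rows whose predicted label matches the answer. A minimal sketch follows, assuming both inputs are already loaded as lists of 0/1 integers; the README does not specify the file layout, so no parsing is shown, and `accuracy` is a hypothetical helper name.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


if __name__ == "__main__":
    # Toy labels, not real homework data.
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```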

Due date

  • Before noon on 2/22.

How to submit

Groups

  • Groups are formed in class on 2/15.
Group  Members              Accuracy
1      winiel559            0.8493
2      chou.yuta            0.4959
3      wtwang, jason        0.5167
4      rouanshen, ilunteng  0.6773
5      bomson, andrewkuo    0.8845
6      yichun1492           0.5299
7      alicetuan            0.7831
8      jill                 0.6538
9      fish                 0.8260

/notebooks/fish/udacity/4_convolutions-HW.ipynb <== baseline code

Fish's parameters  Value
convolution        2 * (12 filters, size 11 * 4)
NN                 64 hidden
GD                 SGD, batch_size 1000, 30000 runs
Learning rate      0.05
relu               unused
pooling            unused
k-fold             unused
strides            2 ([1, 2, 2, 1])
padding            VALID (no padding)
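
To see how the filter size, stride, and VALID padding in the table shape a layer's output, the sketch below applies the usual VALID rule, output = ceil((input - filter + 1) / stride), along each axis. The 101 x 4 input shape is an assumption (one one-hot row per base); the README does not state the encoded input shape, and `valid_conv_output` is a hypothetical helper.

```python
import math


def valid_conv_output(in_size, filter_size, stride):
    """Output length along one axis for a VALID (no padding) convolution."""
    if filter_size > in_size:
        raise ValueError("filter does not fit inside the input with VALID padding")
    return math.ceil((in_size - filter_size + 1) / stride)


if __name__ == "__main__":
    # Assumed input: 101 positions x 4 channels-as-width; filter 11 x 4; stride 2.
    height = valid_conv_output(101, 11, 2)
    width = valid_conv_output(4, 4, 2)
    print(height, width)
```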

Tunable parameters

  • CNN width, depth, filter size, number of filters, max_pool/avg_pool, fully connected network (number of hidden units), learning rate, stochastic GD, dropout, etc.

Reference template

HW2, Protein-DNA binding prediction using protein sequence (CTCF)

Descriptions

  1. Build a CNN that uses features from the DNA sequence data (HW1) and from the protein sequence (HW2) separately.
  2. Before the classifier, concatenate the two feature vectors and feed the result into the classifier (NN).
  3. The filter size for amino acids should be length * 20 (20 is the number of amino acids).
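
The steps above imply a protein input with 20 columns, one per amino acid, so that a length * 20 filter spans every column and slides only along the sequence. Here is a minimal sketch of that encoding and of the concatenation step, assuming one-hot rows and the standard 20-letter amino acid alphabet in an arbitrary column order; `one_hot_protein` and `concatenate_features` are hypothetical names.

```python
# Standard 20 amino acids; the column order here is an arbitrary assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_protein(seq):
    """Return a len(seq) x 20 matrix; unknown residues become all-zero rows."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    matrix = []
    for residue in seq.upper():
        row = [0] * len(AMINO_ACIDS)
        if residue in index:
            row[index[residue]] = 1
        matrix.append(row)
    return matrix


def concatenate_features(dna_features, protein_features):
    """Join the two per-branch feature vectors before the classifier."""
    return list(dna_features) + list(protein_features)


if __name__ == "__main__":
    encoded = one_hot_protein("MEGD")  # first four residues of CTCF
    print(len(encoded), len(encoded[0]))
    # Toy feature values standing in for the outputs of the two CNN branches.
    print(concatenate_features([0.1, 0.2], [0.3]))
```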

Training data

  • Take each positive DNA sequence paired with CTCF as a positive sample (label [0, 1]).
  • Take each negative DNA sequence paired with CTCF as a negative sample (label [1, 0]).
  • Take every DNA sequence paired with each of the 4 non-DNA-binding proteins as a negative sample.
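
The pairing rules above can be sketched as follows. This is only one reading of the list, assuming the HW1 samples are available as (sequence, label) tuples with 1 for positive and 0 for negative; `build_training_pairs` is a hypothetical helper.

```python
POSITIVE = [0, 1]
NEGATIVE = [1, 0]


def build_training_pairs(dna_samples, ctcf_seq, negative_proteins):
    """Build (dna, protein, label) triples.

    dna_samples: list of (dna_sequence, label), label 1 = positive, 0 = negative.
    """
    pairs = []
    for dna, label in dna_samples:
        # DNA + CTCF keeps the HW1 label.
        pairs.append((dna, ctcf_seq, POSITIVE if label == 1 else NEGATIVE))
        # DNA + any non-DNA-binding protein is always negative.
        for protein in negative_proteins:
            pairs.append((dna, protein, NEGATIVE))
    return pairs


if __name__ == "__main__":
    # Toy sequences, not real data.
    toy = build_training_pairs([("ACGT", 1), ("TTTT", 0)], "MEGD", ["AAA", "CCC"])
    for triple in toy:
        print(triple)
```

With 4 negative proteins, every DNA sequence yields 5 pairs, at most one of which is positive, so the resulting training set is heavily skewed toward negatives.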

Testing data

  • Test DNA sequences paired with CTCF.

CTCF_HUMAN.fasta

>sp|P49711|CTCF_HUMAN Transcriptional repressor CTCF OS=Homo sapiens GN=CTCF PE=1 SV=1
MEGDAVEAIVEESETFIKGKERKTYQRRREGGQEEDACHLPQNQTDGGEVVQDVNSSVQM
VMMEQLDPTLLQMKTEVMEGTVAPEAEAAVDDTQIITLQVVNMEEQPINIGELQLVQVPV
PVTVPVATTSVEELQGAYENEVSKEGLAESEPMICHTLPLPEGFQVVKVGANGEVETLEQ
GELPPQEDPSWQKDPDYQPPAKKTKKTKKSKLRYTEEGKDVDVSVYDFEEEQQEGLLSEV
NAEKVVGNMKPPKPTKIKKKGVKKTFQCELCSYTCPRRSNLDRHMKSHTDERPHKCHLCG
RAFRTVTLLRNHLNTHTGTRPHKCPDCDMAFVTSGELVRHRRYKHTHEKPFKCSMCDYAS
VEVSKLKRHIRSHTGERPFQCSLCSYASRDTYKLKRHMRTHSGEKPYECYICHARFTQSG
TMKMHILQKHTENVAKFHCPHCDTVIARKSDLGVHLRKQHSYIEQGKKCRYCDAVFHERY
ALIQHQKSHKNEKRFKCDQCDYACRQERHMIMHKRTHTGEKPYACSHCDKTFRQKQLLDM
HFKRYHDPNFVPAAFVCSKCGKTFTRRNTMARHADNCAGPDGVEGENGGETKKSKRGRKR
KMRSKKEDSSDSENAEPDLDDNEDEEEPAVEIEPEPEPQPVTPAPPPAKKRRGRPPGRTN
QPKQNQPTAIIQVEDQNTGAIENIIVEVKKEPDAEPAEGEEEEAQPAATDAPNGDLTPEM
ILSMMDR

4 non-DNA-binding proteins (negative proteins)

>1GND:_ GUANINE NUCLEOTIDE DISSOCIATION INHIBITOR
MDEEYDVIVLGTGLTECILSGIMSVNGKKVLHMDRNPYYGGESSSITPLEELYKRFQLLE
GPPETMGRGRDWNVDLIPKFLMANGQLVKMLLYTEVTRYLDFKVVEGSFVYKGGKIYKVP
STETEALASNLMGMFEKRRFRKFLVFVANFDENDPKTFEGVDPQNTSMRDVYRKFDLGQD
VIDFTGHALALYRTDDYLDQPCLETINRIKLYSESLARYGKSPYLYPLYGLGELPQGFAR
LSAIYGGTYMLNKPVDDIIMENGKVVGVKSEGEVARCKQLICDPSYVPDRVRKAGQVIRI
ICILSHPIKNTNDANSCQIIIPQNQVNRKSDIYVCMISYAHNVAAQGKYIAIASTTVETT
DPEKEVEPALELLEPIDQKFVAISDLYEPIDDGSESQVFCSCSYDATTHFETTCNDIKDI
YKRMAGSAFDFENMKRKQNDVFGEADQ
>1PHP:_ 3-PHOSPHOGLYCERATE KINASE (PGK) (E.C.2.7.2.3) - CHAIN _
MNKKTIRDVDVRGKRVFCRVDFNVPMEQGAITDDTRIRAALPTIRYLIEHGAKVILASHL
GRPKGKVVEELRLDAVAKRLGELLERPVAKTNEAVGDEVKAAVDRLNEGDVLLLENVRFY
PGEEKNDPELAKAFAELADLYVNDAFGAAHRAHASTEGIAHYLPAVAGFLMEKELEVLGK
ALSNPDRPFTAIIGGAKVKDKIGVIDNLLEKVDNLIIGGGLAYTFVKALGHDVGKSLLEE
DKIELAKSFMEKAKEKGVRFYMPVDVVVADRFANDANTKVVPIDAIPADWSALDIGPKTR
ELYRDVIRESKLVVWNGPMGVFEMDAFAHGTKAIAEALAEALDTYSVIGGGDSAAAVEKF
GLADKMDHISTGGGASLEFMEGKQLPGVVALEDK
>1LKI:_ LEUKEMIA INHIBITORY FACTOR (LIF) - CHAIN _
SPLPITPVNATCAIRHPCHGNLMNQIKNQLAQLNGSANALFISYYTAQGEPFPNNVEKLC
APNMTDFPSFHGNGTEKTKLVELYRMVAYLSASLTNITRDQKVLNPTAVSLQVKLNATID
VMRGLLSNVLCRLCNKYRVGHVDVPPVPDHSDKEAFQRKKLGCQLLGTYKQVISVVVQAF
>1MRP:_ FERRIC IRON BINDING PROTEIN
DITVYNGQHKEAATAVAKAFEQETGIKVTLNSGKSEQLAGQLKEEGDKTPADVFYTEQTA
TFADLSEAGLLAPISEQTIQQTAQKGVPLAPKKDWIALSGRSRVVVYDHTKLSEKDMEKS
VLDYATPKWKGKIGYVSTSGAFLEQVVALSKMKGDKVALNWLKGLKENGKLYAKNSVALQ
AVENGEVPAALINNYYWYNLAKEKGVENLKSRLYFVRHQDPGALVSYSGAAVLKASKNQA
EAQKFVDFLASKKGQEALVAARAEYPLRADVVSPFNLEPYEKLEAPVVSATTAQDKEHAI
KLIEEAGLK
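
The protein listings above are in FASTA format: a header line starting with ">" followed by sequence lines wrapped at a fixed width. A minimal parser sketch is shown below, assuming the file contents are already read into a string; `parse_fasta` is a hypothetical helper, not part of the repository.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    # Shortened toy records, only to exercise the parser.
    sample = ">1LKI:_ LIF\nSPLP\nITPV\n>1MRP:_ FBP\nDITV\n"
    for name, seq in parse_fasta(sample):
        print(name, len(seq))
```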

What to submit

  • Same as HW1.
  • Compare the results of HW1 and HW2: which one is better? Is the protein sequence helpful?

Due date

  • Before noon on 3/8.

How to submit

Groups

Group  Members              Accuracy
1      winiel559            ----------
2      chou.yuta            ----------
3      wtwang, jason        ----------
4      rouanshen, ilunteng  ----------
5      bomson, andrewkuo    ----------
6      yichun1492           ----------
7      alicetuan            ----------
8      jill                 ----------
9      fish                 ----------
