ahmedtarek26/Top-10-Studied-Genes

Top-10-Studied-Genes

Comparative Sequence Analysis of Top 10 Studied Genes

  • Compare their DNA sequences and protein (amino acid) sequences

    • GC content
    • Protein sequence analysis
    • Frequency of each amino acid
  • Find similarity between them

    • Alignment
    • Hamming distance
  • 3D structure of each

The idea

An article from Nature

Datasource

FASTA files

Comparative Analysis of Top 10 Genes

GC Contents In DNA

  • GC-content (or guanine-cytosine content) is the percentage of nitrogenous bases in a DNA or RNA molecule that are either guanine (G) or cytosine (C)
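The definition above can be sketched in a few lines of plain Python. The function name `gc_content` is hypothetical and not taken from the repository; the sketch assumes the sequence is a plain string of bases and counts only unambiguous `G` and `C` characters.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        # Assumption: an empty sequence is reported as 0% rather than raising.
        return 0.0
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


print(gc_content("ATGC"))
```

Ambiguity codes such as `S` (G or C) are ignored here; a library routine may treat them differently.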

Usefulness

  • In polymerase chain reaction (PCR) experiments, the GC-content of short oligonucleotides known as primers is often used to predict their annealing temperature to the template DNA.
  • A higher GC-content level indicates a relatively higher melting temperature.
  • DNA with low GC-content is less stable than DNA with high GC-content.

AKT1 has the highest GC-content at 73.07%, followed by APOE at 65.06%.
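The link between GC-content and melting temperature for primers can be illustrated with the Wallace rule, a common rule of thumb for short oligonucleotides: Tm ≈ 2 °C × (A + T) + 4 °C × (G + C). This rule is not part of the repository; it is shown only as a rough sketch, and the name `wallace_tm` is hypothetical.

```python
def wallace_tm(primer: str) -> int:
    """Estimate primer melting temperature in degrees Celsius (Wallace rule).

    Only a rough approximation, intended for short oligonucleotides.
    """
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


# Two primers of equal length: the GC-rich one gets the higher estimate.
print(wallace_tm("AATTAATT"), wallace_tm("GGCCGGCC"))
```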

Protein Sequence analysis

ESR1 has the longest protein sequence, with a length of 158.49k.

The EGFR protein length is 64.204k.
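The protein lengths above depend on how the DNA was translated, which this README does not spell out. As a minimal sketch, assuming a direct translation of a single forward reading frame with the standard genetic code, the hypothetical `translate` helper below maps each codon to its one-letter amino acid and writes stop codons as `*`.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 0; stop codons become '*', unknown codons 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # drop a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


protein = translate("ATGGCCTAA")
print(protein, len(protein))
```

Translating through stop codons instead of stopping at the first one yields sequences that contain `*`, so the reported length then counts those symbols as well.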

Check for the Count of Amino Acids

Most Common amino acids

  • TP53 -> (L, 674), (S, 616), (G, 477), (P, 464), (R, 421)

  • TNF -> (L, 96), (S, 91), (G, 79), (R, 73), (P, 72)

  • EGFR -> (L, 6982), (S, 5916), (P, 3914), (G, 3695), (V, 3457)

  • VEGFA -> (L, 589), (G, 577), (S, 567), (P, 541), (R, 404)

  • APOE -> (G, 131), (P, 130), (S, 127), (R, 119), (A, 117)

  • IL6 -> (L, 226), (S, 191), (P, 141), (G, 130), (K, 125)

  • TGFB1 -> (L, 806), (S, 749), (G, 659), (P, 637), (R, 597)

  • MTHFR -> (L, 715), (S, 649), (G, 609), (P, 603), (A, 508)

  • ESR1 -> (L, 18225), (S, 14142), (F, 10419), (I, 9754), (*, 8724)

  • AKT1 -> (G, 1211), (P, 1005), (L, 846), (A, 801), (R, 720)
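Lists of (amino acid, count) pairs like those above can be produced with `collections.Counter` from the Python standard library. This is a sketch under the assumption that each protein is available as a plain string of one-letter codes; the helper name `most_common_amino_acids` is hypothetical. A `*` in the output, as in the ESR1 row, denotes a stop codon in the translated sequence.

```python
from collections import Counter


def most_common_amino_acids(protein: str, n: int = 5) -> list[tuple[str, int]]:
    """Return the n most frequent symbols in a protein string with their counts."""
    return Counter(protein.upper()).most_common(n)


print(most_common_amino_acids("MLLSLSGPLLS", 3))
```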

Find similarity between them

Narrative

  • Similarity between APOE and IL6 is 78.54%
  • Similarity between TNF and APOE is 71%
  • Similarity between MTHFR and VEGFA is 70.9%
  • Similarity between TGFB1 and AKT1 is 67%
  • Similarity between TP53 and MTHFR is 66.5%
  • Similarity between TP53 and VEGFA is 0.35%

Similarity between EGFR and ESR1 over the first 35,000 positions is 63.82%
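A Hamming-distance similarity like the percentages above can be sketched as the share of positions at which two sequences carry the same symbol. The README does not say how sequences of different lengths were handled, so truncating both to the shorter one (or to an optional `limit`) is an assumption here, and the name `hamming_similarity` is hypothetical.

```python
from typing import Optional


def hamming_similarity(a: str, b: str, limit: Optional[int] = None) -> float:
    """Percentage of matching positions over the compared prefix of a and b."""
    n = min(len(a), len(b))  # assumption: compare only the shared prefix
    if limit is not None:
        n = min(n, limit)
    if n == 0:
        raise ValueError("nothing to compare")
    matches = sum(x == y for x, y in zip(a[:n], b[:n]))
    return 100.0 * matches / n


print(round(hamming_similarity("GATTACA", "GACTATA"), 2))
```

Because this compares position by position without inserting gaps, a single insertion or deletion shifts everything after it and can lower the score sharply; an alignment-based score does not have that limitation.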

3D Structure

The 3D structures can be viewed in the code.
