Updated README.md

EdRey05 · Jan 15, 2024 · eecb091 · eecb091
1 parent 55e6892
commit eecb091
Showing 1 changed file with 85 additions and 52 deletions.
diff --git a/README.md b/README.md
@@ -6,16 +6,15 @@
         <div class="left-column-35">
             <p><strong> Summary </strong></p>
             <p class="justify-text"> During my graduate studies, I helped two bachelor's students working in my research group to do a small 
-                project analyzing survival data from a breast cancer clinical trial. The aim was to evaluate a possible association between 
-                low <strong>gene expression</strong> of two genes at once with <strong>increased patient survival</strong>. <br><br>
+                project analyzing survival data from a breast cancer clinical trial. The aim was to assess any <strong>potential correlation</strong> 
+                between <strong>higher patient survival</strong> and <strong>low expression of two genes simultaneously</strong>. <br><br>
                 For this purpose, we retrieved data publicly available in <strong><a href="https://www.cbioportal.org/">cBioPortal</a></strong>, 
                 and generated <strong>Kaplan-Meier survival curves</strong>. We focused on the <strong>METABRIC</strong> dataset containing 
                 information for over <strong>2,500 patients</strong> and I generated tools in <strong>Jupyter/Colab notebooks</strong> and a 
-                <strong>Streamlit app (see GIF -->)</strong>) to automate the creation of these plots with <strong>Python</strong>. <br><br>
-                At the end of the project, we were able to identify <strong><10 gene pairs</strong> showing the behavior of interest. That 
+                <strong>Streamlit app (see GIF --></strong>) to automate the creation of these plots with <strong>Python</strong>. <br><br>
+                At the end of the project, we were able to identify <strong>less than 10 gene pairs</strong> showing the behavior of interest. That 
                 information was used in combination with other data from different techniques (in-silico and wet-lab) to prioritize further 
-                studies evaluating the effect of inhibtion of those genes in cancer cell models.
-            </p>
+                studies evaluating the effect of inhibtion of those genes in cancer cell models. </p>
         </div>
         <div class="right-column-65">
             <video width="100%" height="auto" autoplay loop muted><source src="Images_GIFs_Videos/Preview_003.mp4" type="video/mp4"></video> 
@@ -30,16 +29,17 @@
                                             </strong>). </li>
                     <li class="justify-text">Each plot required to divide the dataset into <strong>4 groups</strong> to generate <strong>4 survival 
                                             curves</strong> (expression: <strong>low-low, low-high, high-low, high-high</strong>). </li>
-                    <li class="justify-text">The clinical data (<strong>survival times and status</strong>) and the RNA Seq expression data were in
-                                            different datasets with different structure, so pre-processing to both was required before we could map 
-                                            the patient IDs. </li>
+                    <li class="justify-text">The clinical data (<strong>survival times and status</strong>) and the <strong>RNA Seq expression</strong> 
+                                            data were in different datasets that have different structure, so pre-processing to both of them was required 
+                                            before we could map the patient IDs. </li>
                     <li class="justify-text">We needed to screen all the plots generated but keep only the ones where the <strong>low-low curve</strong>
                                             was higher than the others, and retrieve relevant data such as <strong>CIs and time to 50% survival</strong> 
                                             to complement our analysis. </li>
                     <li class="justify-text">Since each clinical trial reports the data in a different way and not all have RNA Seq data, we chose the 
                                             best possible option for breast cancer (<strong><a href="https://www.cbioportal.org/study/summary?id=brca_metabric">METABRIC</a></strong>). </li> 
-                    <li class="justify-text">In order to reuse our code for other breast cancer datasets or even different cancer types, we needed to 
-                                            generalize the workflow as much as possible and make tools for reproducibility and automation. </li>
+                    <li class="justify-text">In order to reuse our code for other breast cancer datasets or even different cancer types, we <strong>needed 
+                                            to generalize the workflow</strong> as much as possible and <strong>make tools for reproducibility and 
+                                            automation</strong>. </li>
                 </ul>
         </div>
         <div class="right-column-50">
@@ -48,15 +48,17 @@
                 <li class="justify-text">I learned how to use the <strong>KaplanMeierFitter</strong> module from the <strong>lifelines</strong> python 
                                         library to generate KM plots. </li>
                 <li class="justify-text">I first generated a <strong>Google Colab notebook</strong> that was dataset-specific to produce batches of 
-                                        <strong>40-50</strong> plots. This exclusively makes 4 groups from the original dataset based on RET and one 
-                                        other gene, which required manual editing of the code to write down all 40-50 names of genes (<a href="https://user-images.githubusercontent.com/62916582/204424020-bae3613c-bf10-4a3b-9d50-beaf50ca8eee.gif" target="_blank">View tool</a>). </li>
+                                        <strong>40-50</strong> plots. This <strong>exclusively makes 4 groups</strong> from the original dataset based 
+                                        on the expression of RET and one other gene, which required to manually write in the code all 40-50 names of the
+                                        other gene (<a href="https://user-images.githubusercontent.com/62916582/204424020-bae3613c-bf10-4a3b-9d50-beaf50ca8eee.gif" target="_blank">View tool</a>). </li>
                 <li class="justify-text">Then, I found a way to generalize some steps and created a <strong>Jupyter notebook</strong> that used
-                                        <strong>ipywidgets</strong> to interactively get user inputs, allowing dynamic selection of any columns
-                                        to divide the dataset into 2 or more groups and re-plotting curves (<a href="https://github.com/EdRey05/Resources_for_Mulligan_Lab/blob/de82796fe821b96c18ab0709018c02c3b02aba92/Tutorials/Preview_Interactive_KM.gif" target="_blank">View tool</a>). </li>
-                <li class="justify-text">Finally, I discovered <strong>Streamlit</strong> and adapted my interactive notebook to a data app (GIF above)
-                                        that used a similar code approach but with more interactivy, improved outputs and better user experience. </li>
-                <li class="justify-text">Although the app works for several different datasets, I noticed high variability in the formatting of clinical
-                                        trial data, and constantly try to improve the app to generalize it more.
+                                        <strong>ipywidgets</strong> to <strong>interactively get user inputs</strong>, allowing dynamic selection of 
+                                        <strong>any measured variable</strong> to divide the dataset into <strong>2 or more groups</strong> and 
+                                        re-plotting curves easily (<a href="https://github.com/EdRey05/Resources_for_Mulligan_Lab/blob/de82796fe821b96c18ab0709018c02c3b02aba92/Tutorials/Preview_Interactive_KM.gif" target="_blank">View tool</a>). </li>
+                <li class="justify-text">Finally, I discovered <strong>Streamlit</strong> and adapted my interactive notebook to a <strong>data app</strong> 
+                                        (GIF above) that used a similar approach but has <strong>more interactivy, improved outputs and better user                                                        experience</strong>. </li>
+                <li class="justify-text">Although the app works well for several datasets, I noticed <strong>high variability in the formatting of clinical
+                                        trial data</strong>, and try to improve my app to generalize it more!. </li>
             </ul>
         </div>
     </div>
@@ -71,35 +73,51 @@
     <div class="two-columns">
         <div class="left-column-35">
             <p><strong> Summary </strong></p>
-            <p class="justify-text"> During my graduate studies, I performed numerous microscopy experiments. These experiments required to acquire sufficient
-                                    images (of cancer cells), process them and analyze them. When I was setting up a new technique in our research group
-                                    called <strong>Proximity Ligation Assay - PLA</strong> to evaluate protein-protein interactions, I was able to use a 
-                                    cell imager (EVOS M7000 - Thermo Fisher) that has automation capabilities. I automated the acquisition of hundreds of 
-                                    fields of view (big images with many cells), and wrote scripts in <code>Jython</code> (Python wrapper for Java) to 
-                                    automate the image pre-processing and analysis in <strong>ImageJ/Fiji</strong>. The final output of my series of scripts
-                                    was a csv file with the quantification result for each image of individual cells (big images where cropped into many 
-                                    smaller). Also, to validate the quantification results I designed a tool to merge all outputs into a summary Power Point
-                                    presentation. I was able to automate the creation of slides, define the layout and the items to insert into each slide
-                                    by using the <code>python-pptx</code> library. I created a tool in Jupyter/Colab notebook version and then a Streamilt 
-                                    app that does the exact same thing but provides a better user interface and additional information. 
+            <p class="justify-text"> During my graduate studies, I performed numerous <strong>fluorescence microscopy experiments</strong>. These 
+                experiments required to acquire sufficient images of individual cancer cells, process them and analyze them. Typical analysis of my 
+                experiments would involve <strong>co-localization</strong> between signals on different channels, or <strong>object/particle detection 
+                and counting</strong>. <br><br>
+                I carried out <strong>Proximity Ligation Assay (PLA)</strong> experiments to evaluate <strong>protein-protein interactions</strong> 
+                using a cell imager (<strong>EVOS M7000 - Thermo Fisher</strong>) that has automated imaging features. I automated the acquisition of 
+                hundreds of images, and wrote scripts in <strong>Jython</strong> (Python wrapper for Java) to automate the image pre-processing and 
+                analysis in <strong>ImageJ/Fiji</strong>. The final output of my series of scripts was a <strong>csv file with the quantification 
+                result for each individual cell</strong> and the <strong>cropped images with their object mask image</strong> (shows colored blobs 
+                which are particles both detected and counted). <br><br>
+                To <strong>validate the quantification results</strong> before proceeding to statistical analysis, I designed a tool to <strong>consolidate 
+                all the outputs</strong> into a summary Power Point presentation. I automated the creation of slides with a <strong>customized layout
+                </strong> and insertion of fluorescence + object mask <strong>image pairs with their names and quantification results</strong> using 
+                the <strong>python-pptx</strong> library. <br><br>
+                I created a tool first in the form of a <strong>Google Colab notebook</strong> and then as a <strong>Streamilt data app</strong> which
+                helped me analyze outputs for <strong>almost 10,000 images</strong>. With this tool, I was able to easily <strong>compare two quantification 
+                methods, test different pre-processing and object detection parameters, and fully optimize the whole workflow for each experiment</strong>.
             </p>
         </div>
         <div class="right-column-65">
             <img src="Images_GIFs_Videos/Preview_002.gif" alt="Streamlit Projects 002 GIF" />
         </div>
     </div>
-    <p class="justify-text"><strong> Problem </strong></p>
-    <p class="justify-text"> Some description here </p>
-    <p class="justify-text"><strong> Solution </strong></p>
-    <p class="justify-text"> Some description here </p>
     <div class="two-columns">
         <div class="left-column-50">
-            <p><strong> Sample cell images and figure </strong></p>
-            <img src="Images_GIFs_Videos/Preview_002_figure1.jpg" alt="002_Sample_data_figure" />
+            <p class="justify-text"><strong> Problem </strong></p>
+            <ul> 
+                <li class="justify-text"> A. </li>
+                <li class="justify-text"> B. </li>
+                <li class="justify-text"> C. </li>
+                <li class="justify-text"> D. </li>
+                <li class="justify-text"> E. </li>
+                <li class="justify-text"> F. </li>
+            </ul>
         </div>
         <div class="right-column-50">
-            <p><strong> First tool </strong></p>
-            <img src="https://user-images.githubusercontent.com/62916582/204415085-cc39bb7c-904e-487c-a16d-0d894c1e3249.gif" alt="002_first_pre_tool" />
+            <p class="justify-text"><strong> Solution </strong></p>
+            <ul> 
+                <li class="justify-text"> A. </li>
+                <li class="justify-text"> B. </li>
+                <li class="justify-text"> C. </li>
+                <li class="justify-text"> D. </li>
+                <li class="justify-text"> E. </li>
+                <li class="justify-text"> F. </li>
+            </ul>
         </div>
     </div>
     <p><strong> Read the instructions and watch another demo of the Streamlit app here: <a href="https://github.com/EdRey05/Streamlit_projects/tree/main/002_Automated_PPTX_PLA">Demo_PPTX_PLA</a></strong></p>
@@ -120,18 +138,28 @@
             <img src="Images_GIFs_Videos/Preview_001.gif" alt="Streamlit Projects 001 GIF" />
         </div>
     </div>
-    <p class="justify-text"><strong> Problem </strong></p>
-    <p class="justify-text"> Some description here </p>
-    <p class="justify-text"><strong> Solution </strong></p>
-    <p class="justify-text"> Some description here </p>
     <div class="two-columns">
         <div class="left-column-50">
-            <p><strong> DepMap website showing the constant updates to the datasets </strong></p>
-            <img src="Images_GIFs_Videos/Preview_001_DepMap_website.jpg" alt="001_DepMap_website" />
+            <p class="justify-text"><strong> Problem </strong></p>
+            <ul> 
+                <li class="justify-text"> A. </li>
+                <li class="justify-text"> B. </li>
+                <li class="justify-text"> C. </li>
+                <li class="justify-text"> D. </li>
+                <li class="justify-text"> E. </li>
+                <li class="justify-text"> F. </li>
+            </ul>
         </div>
         <div class="right-column-50">
-            <p><strong> First tool (used CCLE data from CBioPortals = DepMap 19Q1) </strong></p>
-            <img src="https://user-images.githubusercontent.com/62916582/204422004-47fe5726-d92d-4193-bc6a-ea30b3a93cc1.gif" alt="001_first_pre_tool" />
+            <p class="justify-text"><strong> Solution </strong></p>
+            <ul> 
+                <li class="justify-text"> A. </li>
+                <li class="justify-text"> B. </li>
+                <li class="justify-text"> C. </li>
+                <li class="justify-text"> D. </li>
+                <li class="justify-text"> E. </li>
+                <li class="justify-text"> F. </li>
+            </ul>
         </div>
     </div>
     <p><strong> Read the instructions and watch another demo of the Streamlit app here: <a href="https://github.com/EdRey05/Streamlit_projects/tree/main/001_RNA_expression_DepMap">Demo_RNA_DepMap</a></strong></p>
@@ -144,14 +172,19 @@
 <details><summary markdown="span"> Expand this to read more...</summary>
     <div class="two-columns">
         <div class="left-column-35">
-            <p class="justify-text"> If you have a Github account you can create a Codespace with all the requirements to 
-                run my apps. You only have to log into you account, click on the following button, create your Codespace (we 
-                all have 60h of free usage per month!), and follow the instructions. </p>
-            <a href="https://codespaces.new/EdRey05/Streamlit_projects?quickstart=1" target="_blank">
-            <img src="https://github.com/codespaces/badge.svg" alt="Open in GitHub Codespaces">
-            </a>
+            <br>
+            <p class="justify-text"> If you have a Github account, you can create a <strong>Github Codespace</strong> with all the requirements to 
+                run my apps. You only have to log into you account, click on the button below, create your Codespace (<strong>we all have 60h of 
+                free usage per month!</strong>), and follow the instructions in this video-->.<br>
+                *Note that due to size limits, I did everything quickly but added notes so pause, read and see where I clicked! <br><br></p>
+            <div class="center-text">
+                <a href="https://codespaces.new/EdRey05/Streamlit_projects?quickstart=1" target="_blank">
+                    <img src="https://github.com/codespaces/badge.svg" alt="Open in GitHub Codespaces">
+                </a>
+            </div>
         </div>
         <div class="right-column-65">
+            <br>
             <video width="100%" height="auto" controls><source src="Images_GIFs_Videos/Demo_Codespaces.mp4" type="video/mp4"></video>
         </div>
     </div>