Cristina Tuñí i Domínguez edited this page Sep 14, 2020 · 3 revisions

Welcome to my Master's Thesis wiki!

Pipeline

These files create a GUI that runs a tRNA alignment pipeline made by Marina Murillo. The GUI and the other functions were made by me, Cristina Tuñí.

Running it on Unix-based systems

Requirements

The pipeline is designed to automate the other requirements (e.g. installation of packages and programs). If the automation step fails, the required programs must be installed manually and added to the PATH.

This can be done by installing Anaconda; the three required programs can then be installed through Bioconda. R can be installed manually by following its official installation instructions.

The pysam package for Python is also required.

This automation step is carried out by anaconda_setup.sh. If one encounters problems with it, the commands in that script can be executed one by one to fulfill the requirements manually.

The installation and/or import of Python modules is carried out by modules.py. As in the previous paragraph, all of those Python packages can be installed manually with conda or with whatever installer the user prefers.
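The kind of check modules.py performs can be sketched as follows. This is a minimal illustration, assuming a pip fallback for missing packages; it is not necessarily what modules.py actually does:

```python
import importlib.util
import subprocess
import sys

def ensure_module(name):
    """Return True if `name` is importable, installing it with pip if needed."""
    if importlib.util.find_spec(name) is not None:
        return True
    # Not importable: try to install it (conda could be used here instead).
    result = subprocess.run([sys.executable, "-m", "pip", "install", name])
    return result.returncode == 0
```

For example, ensure_module("pysam") would install pysam only if it is not already importable.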

Running it in Windows

Requirements

First of all, it is important for the user to know that bowtie2, the aligner this pipeline is based on, is designed to work only on Unix-based systems like Ubuntu or macOS, so a workaround had to be found. Before running the pipeline, the user must install the Windows Subsystem for Linux (WSL from now on). WSL allows Unix commands to run on Windows, making it possible for a Windows user to run bowtie2 among other "Unix exclusive" programs.

The user must also install Python 3.x and R and add them to the PATH.

Once these two steps are fulfilled, the automation step should work the same way as on Unix-based systems. If it does not, the commands found in anaconda_setup.sh can be run step by step in the WSL window.
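On Windows 10, the wsl launcher forwards a command line to the default Linux distribution, which is how a "Unix exclusive" tool like bowtie2 can be invoked from Windows. A minimal sketch of wrapping a call this way (the GUI's actual mechanism may differ):

```python
import subprocess

def wsl_command(args):
    """Prefix a Unix command so it runs through WSL when invoked from Windows."""
    return ["wsl"] + list(args)

# Example (Windows only): run bowtie2 inside WSL.
# subprocess.run(wsl_command(["bowtie2", "--version"]))
```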

Optional steps

Before running the pipeline, the user can choose to fulfill some optional steps that are not part of the pipeline itself. Some of these steps are prerequisites for using the pipeline, but once completed they do not need to be re-run:

  • First, one can install all of the required packages by clicking the button "Download additional programs".

This will try to make sure automatically that all of the requirements are fulfilled. A console will open and the user must follow the instructions there. If Anaconda is not installed, its License Agreement will show up: press "ENTER" until the end of the agreement, write "yes" to accept it, press "ENTER" again to confirm the path where Anaconda will be installed, and we recommend writing "yes" again when prompted, which makes the conda installer available from the shell.

This will install Anaconda and the three programs required to run the pipeline. If Anaconda is already installed, this step may fail.

  • Users can also download the Human Genome and the bowtie2 indexes required to run the pipeline.

By clicking the button "Download Genome", one will automatically download the genome, the alignment index, and the annotation. ATTENTION: this file is approximately 14 GB when compressed. Make sure you have enough disk space and a steady internet connection.

  • Fastq files can also be downloaded from the program.

Users can input an accession code in the form SRRXXXXXXX (for example, SRR7216347). This will access the EBI-ENA FTP server and download a compressed fastq file. The user can choose to uncompress it (once downloaded, not before) by clicking the button "Untar and delete .gz file". This deletes the compressed file and keeps the usable .fastq file.
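As an illustration of what this download step involves, the FTP path for a run can be derived from the accession itself. The sketch below assumes ENA's documented directory layout for 10-character accessions, and a plain gunzip for the "Untar and delete .gz file" button; the GUI's actual implementation may differ:

```python
import gzip
import os
import shutil

def ena_fastq_url(accession):
    """Build the EBI-ENA FTP URL for a 10-character run accession (e.g. SRR7216347)."""
    prefix = accession[:6]          # e.g. SRR721
    subdir = "00" + accession[-1]   # e.g. 007
    return (f"ftp://ftp.sra.ebi.ac.uk/vol1/fastq/"
            f"{prefix}/{subdir}/{accession}/{accession}.fastq.gz")

def gunzip_and_delete(gz_path, out_path):
    """Uncompress a .fastq.gz file and remove the compressed copy."""
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(gz_path)
```

For SRR7216347 this yields a URL ending in /SRR721/007/SRR7216347/SRR7216347.fastq.gz.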

Launching it

In a terminal on Unix or in CMD on Windows, move to the folder where the code is stored (cd /path/to/TFM/Scripts) and write:

  • python3 GUI.py

Usage

Once the user has:

  • Fulfilled the requirements to run the pipeline.
  • Downloaded the Human genome.
  • Downloaded and uncompressed the fastq files to analyze.

The pipeline can finally be run! To do this, the user must click on the button "Choose folder" and select the folder where the fastq files are stored. Then, they must click the button "Choose file", and select the file to analyze.

Once this is done, clicking the "Submit" button at the bottom of the app window starts the pipeline. The pipeline can take up to several hours to complete and creates several large files, so we recommend two things: running it on a sufficiently powerful machine, and making sure there is enough disk space.

Once the pipeline has stopped running, it will have generated several result files. To make the analysis of these result files easier, an R analysis is implemented as well.

There is a Results folder in this repository that contains a sample of the data that can be analyzed with R.

Analyzing the results with R

First of all, the user must click on the button "Join results for R". This will move the counts results from their directory to a new one and then merge them into a single file. Next, the user must create a samples_data.txt file containing the names of the samples and whether or not they are Control samples. The layout of this file is the same as that of the file with the same name in the Results folder, provided there as an example.
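The exact layout is defined by the example file in the Results folder. As a rough sketch, assuming a simple tab-separated two-column format (the column names, separator, and sample names below are illustrative assumptions, not taken from the actual file):

```python
# Hypothetical sketch of generating samples_data.txt.
# Column names, separator, and sample names are assumptions: check the example
# file in the Results folder for the exact layout expected by DEG_analysis.r.
samples = [
    ("SRR7216347", "Control"),
    ("SRR7216348", "Treated"),
]

with open("samples_data.txt", "w") as fh:
    fh.write("sample\tcondition\n")
    for name, condition in samples:
        fh.write(f"{name}\t{condition}\n")
```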

Once this is done, the next step is different for Unix and Windows users:

  • For Unix users, just click on the button "DEG analysis with R". This will launch the R script and produce a series of tables and graphs in the Results folder.
  • For Windows users, close the GUI window and write Rscript DEG_analysis.r in the console.

If this step fails, make sure that the conditions of the experiment (control, treated, etc.) match those defined in line 80 of the R script.

If this step works, the user will obtain the series of tables and graphs, samples of which can be found in the Results folder. Once the pipeline is downloaded onto one's computer, the whole Results folder can be deleted, since the program will recreate it when it runs with user data.