Reactive astrocytes in ALS display dysregulated intron retention

This repository contains scripts to analyse the data and reproduce the figures from the paper:

Reactive astrocytes in ALS display diminished intron retention. Oliver J. Ziff, Doaa M. Taha, Hamish Crerar, Benjamin E. Clarke, Anob M. Chakrabarti, Gavin Kelly, Jacob Neeves, Giulia Tyzack, Nicholas M. Luscombe, Rickie Patani

The scripts are written as R Markdown documents for readability and are organised in the order of the figures in the paper.

All RNA sequencing data generated for this study are deposited at NCBI GEO under accession number GSE160133. Raw mass spectrometry data have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository with the dataset identifier PXD022604.

Previously published datasets from iPSC-derived astrocytes carrying ALS mutations are available at GSE142730 (C9orf72), and at GSE102902 and GSE99843 (SOD1 mutants and control, respectively). Data from cytokine-stimulated iPSC-derived astrocytes are available at syn21861181. Astrocyte-specific RNA-seq from TARDBP-deleted mouse spinal cord is available at GSE156542. Mouse SOD1 astrocyte TRAP-seq is available at GSE74724.

Each dataset was processed as follows:

  1. Quality control of raw reads, alignment, gene biotype, sample similarity and strand specificity with nf-core/rnaseq, reviewed using the MultiQC HTML output
  2. Alignment to the genome with STAR, followed by BAM indexing with SAMtools
  3. Read quantification with HTSeq
  4. Pseudo-alignment to the transcriptome with kallisto
  5. Differential expression analysis with DESeq2
  6. Intron retention analysis with IRFinder

STAR and SAMtools were run as follows:

STAR --runThreadN 1 --genomeDir $IDX --readFilesIn $READ1 $READ2 --outFileNamePrefix $OUT/${GROUP} --outFilterMultimapNmax 1 --outSAMtype BAM SortedByCoordinate --outReadsUnmapped Fastx --twopassMode Basic --outSAMstrandField intronMotif
samtools index $OUT/${GROUP}Aligned.sortedByCoord.out.bam

HTSeq was run in intersection-strict mode as follows:

htseq-count --format bam --order pos --mode intersection-strict --stranded reverse --minaqual 1 --type exon --idattr gene_id $FILE $GTF > $OUT/${SAMPLE}.tab
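For a quick look at the counts across samples, the per-sample `.tab` files written by htseq-count can be combined into one gene-by-sample table. The sketch below is one possible way to do this in the shell; it is not taken from this repository (DESeq2 can also read the per-sample files directly), and the helper name `merge_htseq_counts` and the output file name are hypothetical. It assumes all tables were produced with the same GTF.

```shell
# Hypothetical helper (not part of this repository): merge per-sample
# htseq-count tables into a single tab-separated gene-by-sample matrix.
# Usage: merge_htseq_counts sample1.tab sample2.tab ... > counts_matrix.tsv
merge_htseq_counts() {
  awk -F'\t' -v OFS='\t' '
    FNR == 1 {                       # first line of a new input file
      n++
      name = FILENAME
      sub(/.*\//, "", name)          # drop the directory part
      sub(/\.tab$/, "", name)        # drop the .tab extension
      header = header OFS name       # column name = sample name
    }
    /^__/ { next }                   # skip summary rows such as __no_feature
    {
      if (!($1 in seen)) { seen[$1] = 1; order[++g] = $1 }
      count[$1, n] = $2
    }
    END {
      print "gene_id" header
      for (i = 1; i <= g; i++) {
        row = order[i]
        for (j = 1; j <= n; j++) {
          key = order[i] SUBSEP j
          row = row OFS ((key in count) ? count[key] : 0)
        }
        print row
      }
    }
  ' "$@"
}
```

Genes are written in the order they first appear, and a gene missing from one table is filled with 0, which is a design choice of this sketch rather than htseq-count behaviour.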

Pseudo-alignment to the transcriptome with kallisto quant was run as follows:

kallisto quant -i $INDEX -o $OUT -b 100 --rf-stranded $FASTQ

Differential transcript and gene expression analysis was performed with DESeq2 as per the DESeq2_analysis.Rmd script (see figures folder).

The IRFinder reference was built as follows:

IRFinder -m BuildRef -r $REF/Human-hg38-release99 ftp://ftp.ensembl.org/pub/release-99/gtf/homo_sapiens/Homo_sapiens.GRCh38.99.gtf.gz

After adaptor trimming, IRFinder was run on the merged technical repeat FASTQ files (hence `-a none`, which skips IRFinder's own adaptor trimming) as follows:

IRFinder -r $IRFINDER_REFERENCE -a none -d $OUT/$GROUP $READS
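The merging of technical repeat FASTQ files mentioned above can be sketched as a simple concatenation, since gzip files joined with `cat` remain a valid gzip stream. This is a minimal illustration under assumed file names, not the exact command used in the study; `merge_fastq` is a hypothetical helper.

```shell
# Hypothetical helper (not part of this repository): concatenate the
# technical repeat FASTQ files of one sample into a single file.
# Usage: merge_fastq OUT.fastq.gz IN1.fastq.gz [IN2.fastq.gz ...]
merge_fastq() {
  if [ "$#" -lt 2 ]; then
    echo "usage: merge_fastq OUT.fastq.gz IN1.fastq.gz [IN2.fastq.gz ...]" >&2
    return 1
  fi
  out=$1
  shift
  # gzip members can be concatenated directly; no decompression is needed
  cat "$@" > "$out"
}
```

For paired-end data, the read 1 and read 2 files would each be merged separately, listing the repeats in the same order for both so that read pairs stay aligned.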

As we had two replicates, we used the Audic and Claverie test for differential IR. We first pooled replicates of the same condition using the IRFinder_pool_replicates scripts (see splicing folder). We then measured pooled IR in each condition using the IRFinder_differential script (see splicing folder).
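For orientation, the Audic and Claverie test in its general form gives the probability of observing a count $y$ in one library given a count $x$ in another, where $N_1$ and $N_2$ are the total counts of the two libraries. The symbols here are generic; exactly which read counts IRFinder substitutes for them is not specified in this README, so treat the mapping to intronic and spliced reads as an assumption to check against the IRFinder documentation.

```latex
p(y \mid x) = \left(\frac{N_2}{N_1}\right)^{y}
              \frac{(x+y)!}{x!\,y!\,\left(1+\frac{N_2}{N_1}\right)^{x+y+1}}
```

A small $p(y \mid x)$ indicates that the difference between the two pooled counts is unlikely to arise from sampling alone.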

Downstream analysis of DESeq2 and IRFinder outputs was then performed as per the figures_and_tables_resubmission.Rmd script.

Mass spectrometry datasets were analysed with DEP as per the DEP.Rmd script (see figures folder).

Schematics were created using BioRender.com and merged into figures with Adobe Illustrator.