Removal of tissue contaminations from RNA-seq data

Project details

Accurate identification and effective removal of unwanted variation are essential for deriving meaningful biological results from RNA-seq data, especially when the data come from large and complex studies. The main goal of RNA-seq normalisation is to remove unwanted variation that can compromise downstream analyses while preserving biological variation. It has recently been shown that tissue contamination affects many samples in widely used databases such as the Genotype-Tissue Expression (GTEx) project (Tim O. Nieuwenhuis et al., Nature Communications, 2020).

We aim to further develop and adapt our RUV-III-PRPS normalisation method (R. Molania et al., bioRxiv, 2022; accepted in Nature Biotechnology) so that it can remove tissue contamination from RNA-seq data. In this project the student will learn how to (a) work with large RNA-seq datasets, (b) identify and quantify different sources of unwanted variation and (c) normalise RNA-seq data and assess the performance of different normalisations.

About our research group

We have been developing and applying novel analytical methods for normalising and integrating large genomics datasets for many years. In our recent work, which has been accepted in Nature Biotechnology, we identified batch effects in the TCGA data, which span 33 cancer types and more than 11,000 RNA-seq samples. We developed the RUV-III-PRPS method, which can normalise RNA-seq data for library size, batch effects and tumour purity variation, as well as an R Shiny application and an R package for normalising the TCGA RNA-seq data. We are now developing a fast version of RUV-III-PRPS that can normalise scRNA-seq data with one million cells in less than two minutes.


Supervisors

Dr Ramyar Molania

Professor Tony Papenfuss
Laboratory Head; Leader, Computational Biology Theme

Project Type: