A new paradigm of machine learning-based structural variant detection

Project details

Structural variants (SVs) are large-scale chromosomal changes, which include insertions, deletions, duplications, inversions and translocations, as well as large-scale catastrophic rearrangements of chromosomes. These mutations can drive cancers and affect response to therapy. 

SV calling is challenging (e.g. Cameron, Nat Commun 2019, 10:3240). Many methods have been developed to identify SVs, but these were often hand-tuned on a small number of reference datasets and need further tuning by users for their own data. As a result, their performance is highly variable. Integrating results from different callers, and from short- and long-read technologies, is another roadblock.

This project will develop an exciting and potentially high-impact new approach to machine learning-based SV calling from short- and long-read sequencing data, which is being pioneered in the lab. We have prototyped this method and shown that it works well on single samples, but more research is needed to make it practical and to adapt it to the harder problems of somatic SV calling in cancer and of calling SVs from both short- and long-read data.

About our research group

The Papenfuss lab undertakes computational biology and bioinformatics research in the Bioinformatics division at WEHI. We develop and apply mathematical, statistical and computational approaches to make sense of different types of cancer omics data and to drive discoveries. A key focus of the lab is understanding the molecular changes in cancers as they develop and progress.

The lab has developed some of the leading SV calling methods (e.g. Schroeder, Bioinformatics 2014; Cameron, Genome Research 2017; Cameron, Genome Biology 2021) and applied them to fascinating datasets (e.g. Garsed, Cancer Cell 2014; Vergara, Nat Commun 2021). We have collected a high-quality set of reference datasets for methods development and lead the analysis of several unique patient cohorts.

Professor Tony Papenfuss
Laboratory Head; Leader, Computational Biology Theme
Dr Justin Bedo
Bioinformatics division

Project Type: