Molecular trans-regulatory programs comprising cell signalling, transcriptional, and translational networks are central to health and disease. To understand them, we combine machine learning and statistical methods to model trans-regulatory programs in stem and progenitor cells using large-scale omics data (trans-omics).
Our research vision is twofold.
We are a systems biology group cross-trained in computer science, statistics, and molecular biology. Our research lies at the interface of bioinformatics and systems biology. We develop computational and statistical models to reconstruct signalling cascades and epigenomic, transcriptional, and proteomic networks, and to characterise their cross-talk and trans-regulation across cellular processes, systems, and disease states.
By integrating heterogeneous omics data to generate testable hypotheses and predictions, our work contributes to a comprehensive understanding of the trans-omic networks that underlie cellular homeostasis, proliferation, differentiation, and cell-fate decisions, and of the malfunctions that lead to complex diseases.
This is a major initiative in our group to integrate trans-omics datasets generated during the differentiation of mouse embryonic stem cells (mESCs). The goal is to investigate the cross-talk among signalling cascades and epigenomic, transcriptomic, and proteomic regulation, together with their feedback regulation. The high-throughput data that we are seeking to integrate include time-course mass spectrometry-based proteomics and phosphoproteomics, and next-generation sequencing-based RNA-seq and ChIP-seq data.
We are working closely with Professor Jean Yang's group on mixture modelling of single-cell RNA-seq (scRNA-seq) data. This includes developing novel statistical models that capture characteristics unique to scRNA-seq data. We are applying these models to understand cell differentiation and tissue development in human and mouse.
We have recently developed an adaptive sampling approach (AdaSampling) for learning from positive-unlabeled data. We are extending this semi-supervised learning approach to identify transcription factor target genes in differentiating mouse embryonic stem cells (mESCs) by integrating transcriptomics, proteomics, and epigenomics data.