The production of diverse types of high-dimensional and high-throughput biological data has increased tremendously in the last decade, presenting novel opportunities to develop and apply computational and machine learning approaches to understand the genetics of human diseases. However, the high dimensionality of this data, whereby up to millions of diverse and heterogeneous “features” are measured in a single experiment, coupled with the prevalence of systematic confounding factors, presents significant challenges in disentangling bona fide associations that are informative of causal molecular events in disease.

The research program in my lab focuses on designing tailored computational models and algorithms for integrating multiple types of high-dimensional “omics” data, with the ultimate goal of disentangling meaningful molecular correlations for common diseases such as psychiatric disorders and cancers.


Genetic variants in Alzheimer disease — molecular and brain network approaches
Nature Reviews Neurology
Chris Gaiteri and Sara Mostafavi and Christopher J. Honey and Philip L. De Jager and David A. Bennett
DOI: 10.1038/nrneurol.2016.84

Allele-specific expression reveals interactions between genetic variation and environment
David A Knowles and Joe R Davis and Anil Raj and Xiaowei Zhu and James B Potash and Myrna M Weissman and Jianxin Shi and Douglas F Levinson and Sara Mostafavi and Stephen B Montgomery and Alexis Battle
DOI: 10.1101/025874

Sharing and Specificity of Co-expression Networks across 35 Human Tissues
PLOS Computational Biology
Emma Pierson and Daphne Koller and Alexis Battle and Sara Mostafavi and the GTEx Consortium
DOI: 10.1371/journal.pcbi.1004220

Sharing and specificity of co-expression networks across 35 human tissues
Emma Pierson and GTEx Consortium and Daphne Koller and Alexis Battle and Sara Mostafavi
DOI: 10.1101/010843

Normalizing RNA-Sequencing Data by Modeling Hidden Covariates with Prior Knowledge
Sara Mostafavi and Alexis Battle and Xiaowei Zhu and Alexander E. Urban and Douglas Levinson and Stephen B. Montgomery and Daphne Koller
DOI: 10.1371/journal.pone.0068141

Combining many interaction networks to predict gene function and analyze gene lists
Sara Mostafavi and Quaid Morris
DOI: 10.1002/pmic.201100607

Labeling Nodes Using Three Degrees of Propagation
Sara Mostafavi and Anna Goldenberg and Quaid Morris
DOI: 10.1371/journal.pone.0051947

Predicting Node Characteristics from Molecular Networks
Network Biology
Sara Mostafavi and Anna Goldenberg and Quaid Morris
DOI: 10.1007/978-1-61779-276-2_20


Current Projects
The research program in the Mostafavi lab focuses on developing and applying computational and machine learning approaches for integrating and interpreting high-dimensional genomics data. In particular, three ongoing research projects are summarized below.

(a) Integrating multiple data types for understanding the genetics of complex traits: Common diseases are multifactorial, with contributions from multiple genetic and environmental factors. The availability of multiple types of genomics data (e.g., genome, methylome, transcriptome, and proteome) now allows us to build a comprehensive understanding of the varied types of risk factors that underlie complex diseases. For example, the combination of genotyping and epigenomic data can summarize the effect of genetic factors, environmental factors, and interactions between the two at the cellular level. To this end, we are developing computational models that integrate multiple types of genomics data in the context of complex diseases, with the goal of disentangling meaningful, and likely causal, factors from merely correlated or downstream ones.

(b) Understanding the impact of genetic variation on cellular traits: In order to understand how genetic variation results in disease, we must first understand the impact of such variation on cellular traits. We are interested in developing predictive computational models for linking genetic variation to multiple types of cellular traits, including histone modifications and gene and protein expression levels. In particular, we are working to develop approaches that take into account tissue- and cell-specificity when making such predictions.

(c) Predicting gene function from heterogeneous data sources: A major goal in molecular biology is to determine the functional role of all genes and proteins in a cell. Our current knowledge of gene function is limited: the majority of human genes (or proteins) have not yet been associated with an informative function. With the availability of large and diverse types of genomic data, we can now make rapid progress in this domain. We are developing and applying computational approaches for integrating multiple types of genomics data in order to predict the function of uncharacterized genes in a genome-wide and context-specific manner.

Honours & Awards

CIFAR Award (2017)

Research Group Members

William Casazza, PhD Student
Bernard Ng, Research Associate
Mike Vermeulen, MSc
Yichen Zhang, Graduate Research Assistant