Speaker: Michael A. Beer, Department of Biomedical Engineering and McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University
Date: November 2, 2018
Location: Biomed 200
Time: 1:00-2:00pm
Abstract
How DNA sequence specifies cell-specific enhancer activity, which promoters these enhancers target, and how gene expression is controlled are all central questions in genomics. The large datasets being generated to address these problems make machine learning a natural approach. While deep neural networks (DNNs) and alternate methods have similar cross-fold validation rates, I will show that variable feature detection and importance lead to significantly different predictions for mutation impact. Since most human population variation arises from weak transcription factor binding site disruption, these differences between machine learning approaches can dramatically affect the accuracy of the predictions. I will also show how shared genomic features lead to dramatic overfitting in a popular machine learning setup for detecting enhancer-promoter interactions.