Hsin-Ta Wu successfully defended his thesis on Tuesday, May 10th, becoming the second student to be awarded a doctorate in the Computational Biology Ph.D. program.
Hsin-Ta was the first student accepted into the program during our first year, and we are proud to announce that he is now Dr. Hsin-Ta Wu. Congratulations, Hsin-Ta; we wish you success in all your future endeavors.
Cancer is caused largely by the accumulation of somatic mutations during the lifetime of an individual. Recent advances in next-generation sequencing (NGS) enable the measurement of somatic mutations in a cohort of samples. Large-scale cancer sequencing projects like The Cancer Genome Atlas (TCGA) have catalogued a large number of somatic mutations in thousands of tumors. This thesis addresses two challenges. The first challenge is to distinguish, in a cohort of samples, driver mutations that are responsible for cancer development from passenger mutations, which are random events that do not contribute to the cancer phenotype. This is a difficult problem because most somatic mutations measured in tumor samples are passenger mutations, and only a small fraction are driver mutations. The second challenge is to accurately identify larger genomic variants, also known as structural variants (SVs), a type of somatic mutation that alters normal gene function in tumors. Although several computational methods have been proposed to address this challenge using NGS technologies, they are limited by the underlying NGS data, so a significant number of SVs remain undetectable, particularly in highly repetitive regions of the genome.
To address the first challenge, we present two computational methods: one for discovering recurrent mutations and one for identifying combinations of mutations. First, we introduce a combinatorial approach to the problem of identifying independent and recurrent somatic copy number aberrations, which are gains and losses of large segments of the genome ranging in size from a few kilobases to whole chromosomes. We show that our method outperforms other methods on simulated data and also performs well on three cancer data sets from TCGA. Second, we introduce a statistical model to identify combinations of driver mutations de novo, that is, without any prior biological knowledge. Our model searches for combinations of mutations that exhibit mutual exclusivity, a pattern expected for mutations in the same pathway: few samples carry more than one mutation from the combination. Our model is more sensitive in detecting combinations of lower-frequency mutations and outperforms other methods on simulated and real data.
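To make the idea of mutual exclusivity concrete, the following minimal Python sketch scores a set of genes by how many samples it covers and how often its mutations co-occur in the same sample. It is only an illustration of the pattern, not the statistical model developed in the thesis; the function names, the scoring rule (coverage minus overlap), and the toy gene and sample labels are all assumptions made for this example.

```python
from itertools import combinations


def exclusivity_score(mutations, genes):
    """Return (coverage, overlap, score) for a gene set.

    mutations maps a gene name to the set of samples in which it is mutated.
    coverage = number of samples with at least one mutation in the gene set
    overlap  = number of "extra" mutations in samples hit more than once
    score    = coverage - overlap (illustrative scoring rule, an assumption)
    """
    sample_sets = [mutations[g] for g in genes]
    covered = set().union(*sample_sets)
    coverage = len(covered)
    overlap = sum(len(s) for s in sample_sets) - coverage
    return coverage, overlap, coverage - overlap


def best_gene_set(mutations, k):
    """Exhaustively pick the size-k gene set with the highest score."""
    return max(
        combinations(sorted(mutations), k),
        key=lambda genes: exclusivity_score(mutations, genes)[2],
    )


# Toy data (hypothetical genes and samples, not taken from TCGA).
mutations = {
    "GENE_A": {"s1", "s2", "s3"},
    "GENE_B": {"s4", "s5"},
    "GENE_C": {"s1", "s2", "s4"},
    "GENE_D": {"s6"},
}

if __name__ == "__main__":
    print(exclusivity_score(mutations, ("GENE_A", "GENE_B")))
    print(best_gene_set(mutations, 2))
```

In this toy data, GENE_A and GENE_B are never mutated in the same sample, so the pair has high coverage and no overlap, whereas GENE_A and GENE_C share two samples and score lower. Exhaustive search is feasible only for very small gene sets; it is used here purely to keep the sketch short.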