During my doctoral training, most of my research focused on developing novel statistical methods for genomics data, specifically for studying cancer development and tumor growth. This project inspired me to work on several interesting aspects of it, including hierarchical topic models, clustering, and interactive interfaces. Thanks to my advisor, Kimberly Siegmund, and my committee members, Paul Marjoram and Darryl Shibata.
Motivations
Topic models have been widely applied to extract topics from a wide range of documents and text collections, e.g., online customer reviews, medical records, scientific journals, legal documents, and books. They let us quickly understand the most prominent and commonly shared information embedded in the texts without reading through the entire collection. In addition, topic models allow us to assess the contribution of each topic and its representation across different documents. Human genomes are exposed to an assortment of mutational processes, each contributing a unique pattern of somatic mutations. What would happen if we applied the same concept to the somatic mutations obtained from cancer patients and looked for “topics” of mutations? What would these “topics” tell us about our health, genetics, and risk factors for cancer, and about anything else that slips under the radar?
News
Ever want to compare mutational signatures between different cancer types? Check out @zhiiiyang 's approach @thePeerJ https://t.co/TbCgXlmwlm with software available @Bioconductor #Bioinformatics #Genomics #Statistic #CancerResearch #USCBiostat
— Kim Siegmund (@KimSiegmund1) August 28, 2019
Six days after the paper being accepted, the package also got accepted to Bioconductor! I have to say the reviewer team truly made them much better. https://t.co/5lkhRlSXVn pic.twitter.com/FR3JPU9W8b
— Zhi Yang, PhD (@zhiiiyang) July 31, 2019
#IMAGEP01 investigators preparing for a great day of science @uscphs #KeckSOM #USC pic.twitter.com/DHxVxwYaqq
— USC Biostatistics (@USCBiostat) June 12, 2019