HiLDA - a statistical approach to investigate differences in mutational signatures

We propose a hierarchical latent Dirichlet allocation model (HiLDA) for characterizing somatic mutation data in cancer. The method allows us to infer mutational patterns and their relative frequencies in a set of tumor mutational catalogs and to compare the estimated frequencies between tumor sets.

How to make xaringan slides in R

A tutorial talk on how to make xaringan slides in R

What I've learned from rstudio::2019

A talk on what I've learned from attending the RStudio conference 2019

Learn how to make a header with R hex-stickers for LinkedIn and Twitter

Learn how to make a header with R hex-stickers for LinkedIn and Twitter.

Good practices in R

An example talk on developing good practices in R

Statistical approach for investigating change in mutational processes during cancer growth and development

An example talk using Academic's Markdown slides feature.

Data visualization

This project contains a list of data visualizations generated using R; see the GitHub tidytuesday repo. Tweets: It is a little bit tricky to do it. I have to make a duplicated row for the midnight hour (hr = 0) and name it as "hr = 24" so the time (0 6 12 18) will be aligned perfectly. https://t.co/poAzMwHNbX pic.twitter.com/eDr1AVtROt (Zhi Yang, PhD (@zhiiiyang), December 24, 2019) As promised, here are a Sankey diagram and a Twitter-logo-shaped word cloud of the most used hashtags among tweets posted and liked by me in 2019.

E-cig use among youth

This project studies e-cigarette product characteristics and the frequency of smoking among young adults. Previous research suggests that e-cigarette use is associated with cigarette initiation and with the frequency and heaviness of cigarette smoking. The goal of this research is to examine the impact of specific e-cigarette characteristics on cigarette smoking frequency. We used advanced count models to estimate the association between e-cigarette exposures and the outcome, the frequency of cigarette use in the past 30 days.

Colorectal cancer study

During my doctoral training, most of my research focused on developing novel statistical methods for genomics data, specifically for cancer development and tumor growth. This project has inspired me to work on several interesting aspects of it, including hierarchical topic models, clustering, and interactive interfaces. Thanks to my advisor Kimberly Siegmund and committee members Paul Marjoram and Darryl Shibata. Motivations Topic models have been widely applied to extract topics from a wide range of documents or collections of texts, i.