An interactive treemap of media franchise revenue by category

Background

😄 Hi, you are probably directed here because of my recent tweet. To practice my data visualization skills, I decided to enter the this week’s #TidyTuesday challenge, which is about how different media franchises stack up with their revenue streams. You can read more about the dataset here.

Data visualization

library(dplyr)
library(ggplot2)
library(d3treeR)
library(treemap)
library(RColorBrewer)
library(stringr)

dat <- read.csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-07-02/media_franchises.csv")

head(dat, 3)
##                                   franchise         revenue_category revenue
## 1 A Song of Ice and Fire /  Game of Thrones               Book sales   0.900
## 2 A Song of Ice and Fire /  Game of Thrones               Box Office   0.001
## 3 A Song of Ice and Fire /  Game of Thrones Home Video/Entertainment   0.280
##   year_created original_media            creators
## 1         1996          Novel George R. R. Martin
## 2         1996          Novel George R. R. Martin
## 3         1996          Novel George R. R. Martin
##                            owners
## 1 Random House WarnerMedia (AT&T)
## 2 Random House WarnerMedia (AT&T)
## 3 Random House WarnerMedia (AT&T)
dat2 <- dat[!duplicated(dat), ]

dat2 %>% group_by(revenue_category) %>% summarise(n())
## # A tibble: 8 x 2
##   revenue_category                `n()`
##   <fct>                           <int>
## 1 Book sales                          7
## 2 Box Office                         76
## 3 Comic or Manga                     27
## 4 Home Video/Entertainment           64
## 5 Merchandise, Licensing & Retail    73
## 6 Music                              12
## 7 TV                                 11
## 8 Video Games/Games                  51

By briefly looking at the data, I realized that it has several categorical variables, one of which might have too many distinct levels. Therefore, using the traditional barplot might not be a good idea here unless only a subset of the data is shown here. So, it occurred to me that the treemap might be a good idea. So far, treemap is very handy and straightforward to use compared to other similar R packages.

treemap(
   dat2,
   index=c("revenue_category", "franchise"),
   vSize="revenue",
   vColor="revenue",
   type="value",
)

Basically, you can get a very pretty treemap with a few line of codes. However, it doesn’t really differentiate each revenue category. Wouldn’t it be nice that each revenue category gets its own color that changes along with revenue? Intuitively, it doesn’t seem a difficult task to implement. But the answer is not trivial!

treemap(
   dat2,
   index=c("revenue_category", "franchise"),
   vSize="revenue",
   vColor="revenue",
   type="index",
)

If I changed type argument to be index, it seems that every category has its own color but the color doesn’t coordinate well with the revenue change. I also gained nothing by going through the source codes of treemap packages. If someone knows how the color changes with this setting, PLEASE LET ME KNOW! So, I am going to show you a chunk of not that elegant event a little bit daunting codes.

dat3 <- dat2 %>% arrange(revenue_category, revenue)  %>%
  group_by(revenue_category) %>%
  mutate(bin = cut(revenue, 
                   breaks = c(-Inf, 
                              quantile(revenue, 
                                       probs = seq(0.25, 0.75, 0.25)), 
                              Inf), 
                   labels = c(1, 2, 3, 4)))

dat3$newbin <- with(dat3, interaction(revenue_category,  bin)) 

dat3$newbin <- factor(dat3$newbin, as.character(unique(dat3$newbin)))

dat3 %>% group_by(revenue_category, bin) %>% select(newbin)
## Adding missing grouping variables: `revenue_category`, `bin`
## # A tibble: 321 x 3
## # Groups:   revenue_category, bin [31]
##    revenue_category bin   newbin      
##    <fct>            <fct> <fct>       
##  1 Book sales       1     Book sales.1
##  2 Book sales       1     Book sales.1
##  3 Book sales       2     Book sales.2
##  4 Book sales       2     Book sales.2
##  5 Book sales       2     Book sales.2
##  6 Book sales       4     Book sales.4
##  7 Book sales       4     Book sales.4
##  8 Box Office       1     Box Office.1
##  9 Box Office       1     Box Office.1
## 10 Box Office       1     Box Office.1
## # … with 311 more rows
counts <- dat3 %>% group_by(revenue_category) %>% 
   summarise(n = n_distinct(bin)) %>% pull(n)


palette <- sapply(1:n_distinct(dat3$revenue_category), 
        function(i) brewer.pal(counts[i], c("Greys", "Reds", "Oranges", 
                    "RdYlBu", "Blues", "Purples", "PuRd", "Greens")[i])) %>% 
           unlist()

Basically, we have to manually create a factor variable, bin, that categorizes the revenue by each category. Then, we combine bin and revenue_category to be a new factor variable, newbin, which is assigned with different color accordingly.

tree <- treemap(
   dat3,
   index=c("revenue_category", "franchise"),
   vSize="revenue",
   vColor="newbin",
   type="categorical",
   position.legend  ="none",
   palette = palette
)

Looks exactly what we wanted! You think it is one step away from making it into an interactive map? Nooooooo!

Interactive treemap

Apparently, d3tree function from the d3treeR package doesn’t take in any unusual characters like & or ō. Please don’t ask me how I found out. Therefore, we need to replace this two characters with something compatible with d3tree.

dat3$franchise <- str_replace_all(dat3$franchise, "[&]", "and")
dat3 <- dat3 %>% 
   mutate(franchise = 
            ifelse(is.na(str_match(franchise, "Jump Comics"))==FALSE, 
                   "ohonen Jump / Jump Comics", franchise))

dat3$revenue_category <- factor(dat3$revenue_category)

dat3$revenue_category <- recode(dat3$revenue_category, 
      `Merchandise, Licensing & Retail` = "Merchandise, Licensing and Retail")

Now, we are finally ready for making the interactive map. Simply copying the code from treemap into d3tree.

treenew <- treemap(
   dat3,
   index=c("revenue_category", "franchise"),
   vSize="revenue",
   vColor="newbin",
   type="categorical",
   position.legend  ="none",
   palette = palette
)

d3tree(treenew, rootname = "Revenue by category")

If you’d like to change the font size or style, please source this function style_widget from link.

style_widget <- function(hw=NULL, style="", addl_selector="") {
  stopifnot(!is.null(hw), inherits(hw, "htmlwidget"))

  # use current id of htmlwidget if already specified
  elementId <- hw$elementId
  if(is.null(elementId)) {
    # borrow htmlwidgets unique id creator
    elementId <- sprintf(
      'htmlwidget-%s',
      htmlwidgets:::createWidgetId()
    )
    hw$elementId <- elementId
  }

  htmlwidgets::prependContent(
    hw,
    htmltools::tags$style(
      sprintf(
        "#%s %s {%s}",
        elementId,
        addl_selector,
        style
      )
    )
  )
}

Currently, you can choose from three kinds of styles from d3tree, d3tree2, d3tree3.

style_widget(
  d3tree(treenew, rootname = "Revenue by category"),
  addl_selector="text",
  style="font-family:cursive; font-size:10px;"
)
style_widget(
  d3tree2(treenew, rootname = "Revenue by category"),
  addl_selector="text",
  style="font-family:cursive; font-size:10px;"
)
style_widget(
  d3tree3(treenew, rootname = "Revenue by category"),
  addl_selector="text",
  style="font-family:cursive; font-size:10px;"
)

I also have some news to share with you. Someone saw my last tweet of treemaps and reached out to me for a position at a healthcare consulting company. And I evetually got to the point that they would like to consider me for that position.

Avatar
Zhi Yang
PhD in Biostatistics

Related

comments powered by Disqus