My experience with using H2O Flow

A web-based interactive notebook-like computational environment.

I came across H2O several times in the past couple of years, but I never got a chance to try it myself until very recently. I attended the SatRday Los Angeles 2019 conference at UCLA, where Erin LeDell (@ledell) gave a keynote talk on scalable automatic machine learning with H2O. The elegant way she navigated the model-building process fascinated me. Just as importantly, H2O has a friendly web interface called H2O Flow that helps users familiarize themselves with the tool quickly. Here is the more detailed official documentation on how to use H2O Flow. Below, I’d like to highlight my favorite features.

Let’s get started

  1. First, go to the H2O website to download the zip file.

  2. Follow the instructions to unzip the file and run it from your terminal.

    cd Downloads
    cd h2o-
    java -jar h2o.jar
  3. Then type http://localhost:54321 in your browser.
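If the page does not load, it can help to check whether anything is listening on that port before digging further. Here is a minimal Python sketch using only the standard library. The helper names `flow_url` and `is_reachable` are made up for this illustration and are not part of H2O; the host and port are the defaults from the step above.

```python
import socket


def flow_url(host="localhost", port=54321):
    """Build the address that H2O Flow is expected to be served on."""
    return f"http://{host}:{port}"


def is_reachable(host="localhost", port=54321, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (e.g. connection refused) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_reachable()` after `java -jar h2o.jar` has finished starting should return `True`. If it returns `False`, the server is probably not running yet or is using a different port. This only checks that the port is open, not that the process behind it is H2O.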

Feature highlight

  1. Notebook layout

    The feature I appreciate the most is that the layout resembles a Jupyter notebook. By adopting the style of an existing tool, H2O Flow spares users the time of learning a new interface, so they can start using it right away.

  2. Examples

    Without much effort, new users can jump directly into a pipeline by opening one of the existing example projects, such as deep learning, GBM, and XGBoost.

    [Screenshots: Control Panel; Examples]
  3. Running time

    Another noticeable feature of Flow is that it gives users a better sense of the time cost of each component in the pipeline. The running time of each block is shown by default, which also makes working with the tool more engaging.

  4. Model diagnostics tools

    After the models are built and run, H2O Flow generates a wide range of output for users to choose from. Here are some examples from a project using GBM for airline classification.

    [Screenshots: Scoring history; ROC curve; Variable importances]
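For readers who prefer code to clicking, similar diagnostics can be pulled from the H2O Python package. The sketch below is only a rough illustration, not the Flow example itself. It assumes the `h2o` package and Java are installed, that `csv_path` points at an airlines-style file you supply, and that the file has the column names used here (`IsDepDelayed`, `Origin`, and so on); adjust them if your file differs. The function names `train_airlines_gbm` and `summarize_model` are made up for this post.

```python
def train_airlines_gbm(csv_path):
    """Train a small GBM classifier (requires the h2o package and Java)."""
    # Imported inside the function so the summary helper below
    # can be used and tested without h2o installed.
    import h2o
    from h2o.estimators import H2OGradientBoostingEstimator

    h2o.init()  # connects to a local H2O cluster, starting one if needed
    frame = h2o.import_file(csv_path)

    # Assumed column names; change these to match your own data.
    response = "IsDepDelayed"
    predictors = ["Origin", "Dest", "Month", "DayOfWeek",
                  "UniqueCarrier", "Distance"]
    frame[response] = frame[response].asfactor()  # treat target as categorical

    train, valid = frame.split_frame(ratios=[0.8], seed=42)
    model = H2OGradientBoostingEstimator(ntrees=50, max_depth=5, seed=42)
    model.train(x=predictors, y=response,
                training_frame=train, validation_frame=valid)
    return model


def summarize_model(model, top_n=5):
    """Collect the diagnostics highlighted above from a trained binomial model."""
    # With use_pandas=False, varimp() returns a list of tuples whose
    # first element is the variable name.
    importances = model.varimp(use_pandas=False)
    return {
        "auc_valid": model.auc(valid=True),
        "top_variables": [row[0] for row in importances[:top_n]],
        "scoring_history": model.scoring_history(),
    }
```

A typical call would be `summarize_model(train_airlines_gbm("path/to/airlines.csv"))`, where the path is a placeholder. The helper returns numbers and tables rather than plots, so it complements the charts Flow draws rather than replacing them.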

If you would like to know more, here is an article on Medium about democratising machine learning with H2O, which also introduces all the available H2O products, including Driverless AI. You can also find a list of publications using H2O here.

Zhi Yang
Senior Manager, Biostatistics