Background
I had come across H2O several times over the past couple of years but never had a chance to try it myself. That changed recently when I attended the SatRday Los Angeles 2019 conference at UCLA, where Erin LeDell (@ledell) gave a keynote talk on scalable automatic machine learning with H2O. The elegant way she navigated the model-building process fascinated me. Most importantly, H2O also has a friendly interface called H2O Flow that helps users quickly familiarize themselves with the tool. Here is the more detailed, official documentation on how to use H2O Flow. In this post, I’d like to highlight my favorite features.
Let’s get started
First, go to the H2O website to download the zip file.
Follow the instructions to unzip the file and run it from your terminal.
cd Downloads
unzip h2o-3.24.0.2.zip
cd h2o-3.24.0.2
java -jar h2o.jar
Then type http://localhost:54321 in your browser.
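If the page does not load, it can help to confirm that the server is actually listening before blaming the browser. Below is a minimal sketch using only the Python standard library. It assumes the default address shown above and that `/3/Cloud` is the cluster-status endpoint of the H2O-3 REST API; the function name `h2o_is_up` is made up for this illustration and is not part of H2O.

```python
import http.client
import json
import urllib.error
import urllib.request


def h2o_is_up(base_url="http://localhost:54321", timeout=2.0):
    """Return True if something that looks like an H2O server answers at base_url."""
    try:
        # Assumption: /3/Cloud is the cluster-status endpoint and replies with JSON.
        with urllib.request.urlopen(base_url + "/3/Cloud", timeout=timeout) as resp:
            json.load(resp)  # raises ValueError if the reply is not JSON
            return True
    except (urllib.error.URLError, OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error, or a non-JSON reply.
        return False


if __name__ == "__main__":
    print("H2O reachable:", h2o_is_up())
```

If this prints `False` right after `java -jar h2o.jar`, the server may still be starting up, so waiting a few seconds and trying again is a reasonable first step.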
Feature highlight
Notebook layout
The feature I appreciate most is the layout, which resembles a Jupyter notebook. By adopting the style of an existing tool, H2O Flow spares users the time they would otherwise spend learning a new interface, so they can start using the tool right away.
Examples
Without much effort, new users can jump straight into the pipeline by opening one of the existing example projects, e.g., deep learning, GBM, or XGBoost.
[Figures: Control Panel; Examples]
Running time
Another noticeable feature of Flow is that it gives users a better sense of the time cost of each component in the pipeline. The running time of each block is shown by default, which also makes the tool more engaging to use.
Model diagnostics tools
After the models are built and run, H2O Flow generates a wide range of output for users to choose from. Here are some examples from a project using GBM for airline classification.
[Figures: Scoring history; ROC curve; Variable importances]
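For readers less familiar with the ROC curve, here is a rough sketch of what such a plot represents: sweep a threshold over the predicted scores and record the false positive rate and true positive rate at each step. This is plain Python written for illustration, not how H2O computes its curve; the helper names `roc_points` and `auc` and the toy labels and scores are made up.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr) points from sweeping a threshold over the scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    # Rank examples from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Handle tied scores together so they produce a single point.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def auc(points):
    """Area under the ROC points, using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


if __name__ == "__main__":
    # Toy data, invented for this example: 1 = positive class, 0 = negative class.
    labels = [1, 1, 0, 1, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
    print(round(auc(roc_points(labels, scores)), 4))
```

A curve that hugs the top-left corner has an area close to 1, while a model that ranks examples at random lands near 0.5.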
If you would like to know more, here is an article on Medium about democratising machine learning with H2O, which also introduces all the available H2O products, including Driverless AI. You can also find a list of publications using H2O here.