Streaming data into H2O


Streaming data into H2O doesn't appear to be widely publicised, which in my view is strange given the real-time world we live in these days. However, after some web searching I found a relevant article on the H2O World 2015 Training site.

  • Databricks streaming
  • Diving into Spark Streaming’s Execution Model
  • Improvements to Kafka integration of Spark Streaming
  • How-to: Build a Machine-Learning App Using Sparkling Water and Apache Spark

Real-time Predictions With H2O on Storm isn't quite what I was looking for, as I'm more interested in Apache Kafka. However, it's close enough to the problem to be useful 🙂

Source code can be found here. TestH2ODataSpout.java generates the fake real-time data. H2OStormStarter.java is the interesting code: it consumes the real-time data and pushes results out via the collector's emit() and ack() functions.
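To make the spout/bolt roles concrete, here is a minimal, self-contained Java sketch of the same flow. It deliberately does not use Storm: Collector, FakeDataSpout and ScoringBolt are hypothetical stand-ins made up for illustration (Storm's real equivalents are its spout and bolt interfaces plus OutputCollector), and the "score" is a placeholder average rather than an H2O prediction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Minimal in-memory model of the spout -> bolt flow. These types are
// hypothetical stand-ins, NOT Storm's real spout/bolt/OutputCollector API.
public class SpoutBoltSketch {

    // Hypothetical collector: records emitted tuples and acknowledged ids.
    static class Collector {
        final List<double[]> emitted = new ArrayList<>();
        final List<Long> acked = new ArrayList<>();

        void emit(double[] values) { emitted.add(values); }
        void ack(long tupleId) { acked.add(tupleId); }
    }

    // Plays the role of TestH2ODataSpout: produces fake "real-time" rows.
    static class FakeDataSpout {
        private final Random random = new Random(42);
        private long nextId = 0;

        // Fills the row with random values and returns the tuple id.
        long nextTuple(double[] out) {
            for (int i = 0; i < out.length; i++) {
                out[i] = random.nextDouble();
            }
            return nextId++;
        }
    }

    // Plays the role of a bolt: processes a row, emits a result, then acks.
    static class ScoringBolt {
        void execute(long tupleId, double[] row, Collector collector) {
            double sum = 0;
            for (double v : row) sum += v;
            collector.emit(new double[] {sum / row.length}); // placeholder "score"
            collector.ack(tupleId); // signal that this tuple was fully handled
        }
    }

    public static void main(String[] args) {
        FakeDataSpout spout = new FakeDataSpout();
        ScoringBolt bolt = new ScoringBolt();
        Collector collector = new Collector();

        for (int i = 0; i < 5; i++) {
            double[] row = new double[3];
            long id = spout.nextTuple(row);
            bolt.execute(id, row, collector);
        }
        System.out.println("emitted=" + collector.emitted.size()
                + " acked=" + collector.acked);
    }
}
```

Running main prints emitted=5 acked=[0, 1, 2, 3, 4]. Each tuple is acked only after the bolt has emitted its result, which is the ordering Storm's at-least-once processing relies on.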

PredictionBolt emits tuples via the collector to ClassifierBolt, which writes the data to a file (out) and also emits the classification result to the collector.
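A similarly hedged sketch of the PredictionBolt to ClassifierBolt hand-off follows. Collector and ClassifierBolt here are hypothetical stand-ins rather than the classes in the original source, and the 0.5 threshold plus the "positive"/"negative" labels are my own assumptions. It shows the two side effects described above: appending the result to an "out" file and emitting it downstream.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the PredictionBolt -> ClassifierBolt hand-off.
public class ClassifierSketch {

    // Stand-in for the collector the classifier emits its result to.
    static class Collector {
        final List<String> emitted = new ArrayList<>();
        void emit(String label) { emitted.add(label); }
    }

    // Stand-in for ClassifierBolt: turns a probability into a label,
    // appends it to the "out" file, and emits it downstream.
    static class ClassifierBolt {
        private final Path outFile;
        private final double threshold; // assumed cut-off, not from the original code

        ClassifierBolt(Path outFile, double threshold) {
            this.outFile = outFile;
            this.threshold = threshold;
        }

        void execute(double probability, Collector collector) throws IOException {
            String label = probability >= threshold ? "positive" : "negative";
            // Append one line per result; this is the file the JS front end would poll.
            Files.writeString(outFile, label + System.lineSeparator(),
                    StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            collector.emit(label);
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("out", ".txt");
        ClassifierBolt bolt = new ClassifierBolt(out, 0.5);
        Collector collector = new Collector();

        // Probabilities as a PredictionBolt might emit them.
        for (double p : new double[] {0.9, 0.2, 0.5}) {
            bolt.execute(p, collector);
        }
        System.out.println(collector.emitted);      // [positive, negative, positive]
        System.out.println(Files.readAllLines(out)); // same labels, read back from disk
    }
}
```

Files.writeString needs Java 11 or later.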

The cheat is that ClassifierBolt writes to a file (out), which is then read by the JavaScript code. In reality, the results from ClassifierBolt should probably be pushed to the HTML user interface via a websocket.

Net out, whether you use Storm, Kafka or some other technology is irrelevant; the key is exporting the model as a Java POJO from R using h2o.download_pojo().
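Because the exported POJO is just a Java class, any streaming framework can call it once per record. The sketch below assumes a binomial model. ExportedModel is a hypothetical stand-in written so the example runs on its own, with made-up weights; its score0 method only loosely mirrors the shape of the generated class (a real POJO from h2o.download_pojo() is generated code that compiles against h2o-genmodel.jar, and is often used through that library's wrapper classes instead).

```java
import java.util.Arrays;

// Self-contained sketch of scoring with an exported model inside a streaming
// component. ExportedModel is a HYPOTHETICAL stand-in, not a generated H2O POJO.
public class PojoScoringSketch {

    // Stand-in whose score0 loosely mirrors a binomial POJO's output layout:
    // preds[0] = predicted label index, preds[1..2] = class probabilities.
    static class ExportedModel {
        private final double[] weights = {0.8, -0.5}; // made-up coefficients
        private final double bias = 0.1;

        double[] score0(double[] data, double[] preds) {
            double z = bias;
            for (int i = 0; i < weights.length; i++) z += weights[i] * data[i];
            double p1 = 1.0 / (1.0 + Math.exp(-z)); // logistic link
            preds[1] = 1.0 - p1;
            preds[2] = p1;
            preds[0] = p1 >= 0.5 ? 1 : 0;
            return preds;
        }
    }

    // What a Storm bolt or a Kafka consumer loop would call per record.
    static int classify(ExportedModel model, double[] row) {
        double[] preds = model.score0(row, new double[3]);
        return (int) preds[0];
    }

    public static void main(String[] args) {
        ExportedModel model = new ExportedModel();
        double[][] stream = {{1.0, 0.0}, {0.0, 2.0}};
        for (double[] row : stream) {
            System.out.println(Arrays.toString(row) + " -> " + classify(model, row));
        }
    }
}
```

This prints "[1.0, 0.0] -> 1" and "[0.0, 2.0] -> 0". Swapping Storm for a Kafka consumer would only change where classify() is called from, which is exactly why the transport choice matters less than the exported model.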

~ by mdavey on April 20, 2016.
