Streaming data into H2O

Streaming data into H2O is not something that appears to be widely publicised, which in my view is strange given the real-time world we live in these days. After some web searching, though, I found a relevant article on the H2O World 2015 Training site.

  • Databricks streaming
  • Diving into Spark Streaming’s Execution Model
  • Improvements to Kafka integration of Spark Streaming
  • How-to: Build a Machine-Learning App Using Sparkling Water and Apache Spark

Real-time Predictions With H2O on Storm isn’t quite what I was looking for, as I’m more interested in Apache Kafka. However, it’s close enough to the problem to be useful 🙂

Source code can be found here. One source file provides the fake real-time data. Another holds the interesting code, which consumes the real-time data and pushes results out via the collector's emit() and ack() functions.

PredictionBolt emits tuples via the collector to ClassifierBolt, which writes the data to a file (out) and also emits the classification result to the collector.
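To make that flow concrete, here is a rough plain-Java sketch of the two stages. It deliberately avoids the Storm API: the Collector class, the stage methods, the placeholder scoring rule and the 0.5 threshold are all my own stand-ins for illustration, and the final stage collects labels into a list where the tutorial's ClassifierBolt writes to a file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java stand-in for the two-bolt flow; none of these types are Storm's API.
class BoltPipelineSketch {

    // Hypothetical stand-in for an output collector: emit passes a value
    // downstream, ack records that the input was fully processed.
    static final class Collector<T> {
        private final Consumer<T> downstream;
        int acked = 0;

        Collector(Consumer<T> downstream) { this.downstream = downstream; }

        void emit(T value) { downstream.accept(value); }
        void ack() { acked++; }
    }

    // Stage 1 (PredictionBolt's role): score a raw row and emit a probability.
    // The scoring rule is a placeholder, not a trained H2O model.
    static void predictionStage(double[] row, Collector<Double> out) {
        double sum = 0.0;
        for (double v : row) sum += v;
        double probability = 1.0 / (1.0 + Math.exp(-sum));
        out.emit(probability);
        out.ack();
    }

    // Stage 2 (ClassifierBolt's role): turn the probability into a label and emit it.
    static void classifierStage(double probability, double threshold, Collector<String> out) {
        out.emit(probability >= threshold ? "positive" : "negative");
        out.ack();
    }

    // Wires the two stages together and returns the labels in arrival order.
    static List<String> run(List<double[]> rows, double threshold) {
        List<String> labels = new ArrayList<>();
        Collector<String> sink = new Collector<>(labels::add);
        Collector<Double> middle = new Collector<>(p -> classifierStage(p, threshold, sink));
        for (double[] row : rows) predictionStage(row, middle);
        return labels;
    }

    public static void main(String[] args) {
        List<double[]> rows = List.of(new double[] {2.0, 1.0}, new double[] {-3.0, 0.5});
        System.out.println(run(rows, 0.5));
    }
}
```

In a real topology the wiring in run() would be done by Storm itself, and each stage would be a bolt class rather than a static method.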

The cheat is that ClassifierBolt writes to a file (out), which is read by the JS code. In reality the results from ClassifierBolt should probably go via a websocket to the HTML user interface.

Net out: whether you use Storm, Kafka or some other technology is irrelevant. The key is exporting the model as a Java POJO from R via h2o.download_pojo().
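As a rough illustration of why the transport doesn't matter, here is a small Java sketch in which the scoring code only ever sees an iterator of rows. The RowScorer interface and FAKE_MODEL are hypothetical stand-ins I made up so the example runs without any H2O libraries. A real exported POJO has its own generated class and depends on H2O's genmodel jar, so treat this as the shape of the idea and not the actual API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class PojoScoringSketch {

    // Hypothetical stand-in for the exported model class. A real exported POJO
    // depends on H2O's genmodel library, which this sketch deliberately avoids.
    interface RowScorer {
        double[] score(double[] row); // returns {p(class 0), p(class 1)}
    }

    // Placeholder "model": a fixed logistic rule, not anything trained by H2O.
    static final RowScorer FAKE_MODEL = row -> {
        double z = 0.8 * row[0] - 0.5 * row[1];
        double p1 = 1.0 / (1.0 + Math.exp(-z));
        return new double[] {1.0 - p1, p1};
    };

    // Scores whatever the source yields. The source could be fed by a Kafka
    // consumer loop, a Storm bolt, or a test list: the scoring code is the same.
    static List<Integer> scoreAll(Iterator<double[]> source, RowScorer model) {
        List<Integer> predictedClasses = new ArrayList<>();
        while (source.hasNext()) {
            double[] probs = model.score(source.next());
            predictedClasses.add(probs[1] >= probs[0] ? 1 : 0);
        }
        return predictedClasses;
    }

    public static void main(String[] args) {
        List<double[]> batch = List.of(new double[] {3.0, 1.0}, new double[] {0.0, 2.0});
        System.out.println(scoreAll(batch.iterator(), FAKE_MODEL));
    }
}
```

Swapping the list for a Kafka-backed iterator would change only how the source is built, which is the point: the model is just a Java object you call.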


~ by mdavey on April 20, 2016.
