Machine Learning on Streaming Data – Samza and Flink

Based on a few comments, coupled with various web reading, I get the impression that Spark and Storm are no longer the latest option for a streaming data Machine Learning platform, though maybe I’m wrong. Apache Samza and Flink appear to be the new kids on the block. There are a few comparisons of the various streaming engines: one here, and another here.

Samza is very interesting, since it uses a technology I like a lot, Apache Kafka 🙂  Flink, however, appears to be the newest kid on the block 🙂 and, based on this simple code comparison, offers a clean API.

The dataArtisans article “Kafka + Flink: A practical, how-to guide” offers some direction on connecting Kafka and Flink, which in many ways might be the approach to take for running Machine Learning models against streaming data.
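To make the shape of that idea concrete, here is a minimal sketch in plain Python. It is not the Flink or Kafka API: `fake_topic` is a hypothetical stand-in for a Kafka topic, `linear_model` is a made-up pre-trained model with invented weights, and `score_stream` plays the role a map operator might play in a streaming job. The only point is the pattern of source, per-record scoring, and sink.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, float]


def fake_topic() -> Iterator[Record]:
    """Hypothetical stand-in for a Kafka topic: yields records one at a time."""
    events = [
        {"amount": 12.0, "velocity": 1.0},
        {"amount": 950.0, "velocity": 7.0},
        {"amount": 40.0, "velocity": 2.0},
    ]
    for event in events:
        yield event


def linear_model(record: Record) -> float:
    """Hypothetical pre-trained model: a fixed linear score over two features."""
    weights = {"amount": 0.001, "velocity": 0.1}  # invented for illustration
    return sum(weights[name] * record[name] for name in weights)


def score_stream(
    source: Iterable[Record],
    model: Callable[[Record], float],
    threshold: float,
) -> Iterator[Dict[str, object]]:
    """Score each record as it arrives, without waiting for the whole stream."""
    for record in source:
        score = model(record)
        yield {**record, "score": score, "flagged": score > threshold}


def run_pipeline() -> List[Dict[str, object]]:
    """Wire source -> model -> sink; the sink here is just a list."""
    return list(score_stream(fake_topic(), linear_model, threshold=0.5))


if __name__ == "__main__":
    for result in run_pipeline():
        print(result)
```

In a real deployment the generator would presumably be replaced by a Kafka consumer and the scoring function by a streaming operator, with the model trained offline and loaded at start-up, but those details depend on the engine and are not covered here.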

Finally, although old, the “Apache Flink: API, runtime, and project roadmap” slides give a view of the roadmap for Flink to integrate with Machine Learning libraries: see slides 62 and 67, with H2O mentioned on slide 71.

The next bus: Apache Kudu?


~ by mdavey on April 28, 2016.
