Machine Learning on Streaming Data – Samza and Flink
Based on a few comments, coupled with various web reading, I get the impression that Spark and Storm are no longer the latest solutions to use in a Streaming Data Machine Learning platform – maybe I’m wrong? Apache Samza and Flink appear to be the new kids on the block. There are a few comparisons of the various streaming engines – one here, and another here.
Samza is very interesting, since it uses a technology I like a lot, Apache Kafka 🙂 Flink, however, appears to be the newest kid on the block 🙂 and, based on this simple code comparison, offers a clean API.
The dataArtisans article “Kafka + Flink: A practical, how-to guide” offers some direction on connecting Kafka and Flink, which in many ways might be the approach to take for running Machine Learning models against streaming data.
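To make the idea of running a model against a stream a bit more concrete, here is a minimal sketch in plain Python. It is not Flink or Kafka code: the generator `fake_topic` is a hypothetical stand-in for a Kafka topic, `score_stream` plays the role a map operator would play in a streaming job, and the weights, field names and events are made-up values for illustration only.

```python
import json
import math
from typing import Dict, Iterable, Iterator

# Hypothetical pre-trained logistic regression parameters (illustrative values only).
WEIGHTS = {"clicks": 0.8, "dwell_seconds": 0.05}
BIAS = -2.0


def fake_topic() -> Iterator[str]:
    """Stand-in for a Kafka topic: yields JSON-encoded messages one at a time."""
    events = [
        {"user": "a", "clicks": 1, "dwell_seconds": 10},
        {"user": "b", "clicks": 5, "dwell_seconds": 40},
        {"user": "c", "clicks": 0, "dwell_seconds": 2},
    ]
    for event in events:
        yield json.dumps(event)


def score(event: Dict[str, float]) -> float:
    """Apply the logistic model to one event and return a probability in (0, 1)."""
    z = BIAS + sum(w * event.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def score_stream(messages: Iterable[str]) -> Iterator[Dict]:
    """Decode each message and emit a scored record, one record at a time."""
    for raw in messages:
        event = json.loads(raw)
        yield {"user": event["user"], "score": score(event)}


if __name__ == "__main__":
    for result in score_stream(fake_topic()):
        print(result)
```

In a real deployment the generator would presumably be replaced by a Kafka source in the streaming engine, and the scoring function would run inside one of its operators; the point here is only that the model is applied record by record as data arrives, rather than to a stored batch.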
Finally, although old, the “Apache Flink: API, runtime, and project roadmap” slides provide a view of the roadmap for Flink to integrate with Machine Learning libraries: see slide 62 and slide 67, with H2O mentioned on slide 71.
The next bus: Apache Kudu?