Real-time Data Refinery

First, I must give credit to Hortonworks for the cool “Data Refinery” buzzword. The Hortonworks article “Storm and Kafka Together: A Real-time Data Refinery” provides a great overview of the data processing involved, and of why Storm and Kafka work so well together:

Apache Storm is a distributed real-time computation engine that reliably processes unbounded streams of data. While Storm processes stream data at scale, Apache Kafka processes messages at scale. Kafka is a distributed pub-sub real-time messaging system that provides strong durability and fault tolerance guarantees.
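
As a rough mental model of that division of labour, here is a toy, in-memory Python sketch. It assumes a single-partition, append-only log with per-group read offsets standing in for Kafka, and a plain function standing in for a Storm bolt. The names `CommitLog`, `publish`, `poll` and `word_count_bolt` are made up for illustration and are not the Kafka or Storm APIs; nothing here is distributed, durable or fault tolerant.

```python
from collections import defaultdict


class CommitLog:
    """Toy stand-in for one topic partition: an append-only list of messages."""

    def __init__(self):
        self._records = []
        # Each consumer group tracks its own position (next offset to read).
        self._offsets = defaultdict(int)

    def publish(self, message):
        """Append a message and return the offset it was stored at."""
        self._records.append(message)
        return len(self._records) - 1

    def poll(self, group, max_records=10):
        """Return the next unread messages for this group and advance its offset."""
        start = self._offsets[group]
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch


def word_count_bolt(messages, counts):
    """Toy stand-in for a processing step: update running word counts."""
    for message in messages:
        for word in message.lower().split():
            counts[word] += 1
    return counts


log = CommitLog()
for line in ["storm processes streams", "kafka processes messages"]:
    log.publish(line)

# The "refinery" group reads everything published so far and processes it.
counts = word_count_bolt(log.poll("refinery"), defaultdict(int))
```

Because messages stay in the log after being read, a second consumer group (say, a hypothetical "audit" group) can replay the same stream from offset 0 independently of the first. That replay property is what makes the log a useful buffer in front of a stream processor.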

A tutorial is available here to get your hands dirty with Storm and Kafka.

Along similar lines, “Putting Apache Kafka To Use: A Practical Guide to Building a Stream Data Platform” provides further reading material.

Finally, “Real Time Streaming with Apache Storm and Apache Kafka” offers the classic Twitter stream sentiment analysis example.

~ by mdavey on January 21, 2016.

Posted in Data

One Response to “Real-time Data Refinery”

[…] on from an earlier posting on Data Refineries, I’m now beginning to consider connecting Kafka (my distributed commit-log) to H2O. H2O […]

Feeding a Data Refinery with Kafka | Tales from a Trading Desk said this on March 11, 2016 at 1:55 pm
