Real-time Data Refinery


First, I must give credit to Hortonworks for “Data Refinery”, a cool data buzzword. Hortonworks' “Storm and Kafka Together: A Real-time Data Refinery” article provides a great overview of the data processing involved, and of why Storm and Kafka work so well together:

Apache Storm is a distributed real-time computation engine that reliably processes unbounded streams of data. While Storm processes stream data at scale, Apache Kafka processes messages at scale. Kafka is a distributed pub-sub real-time messaging system that provides strong durability and fault tolerance guarantees.
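To make that division of labour a little more concrete, here is a toy sketch in plain Python. It uses neither the Kafka nor the Storm API: `ToyLog`, `split_words` and `count_words` are names I made up for illustration. The log stands in for a single Kafka topic partition (append-only, read by offset), and the two functions stand in for Storm-style processing steps chained over the stream.

```python
from collections import Counter
from typing import Iterator, List


class ToyLog:
    """In-memory stand-in for one topic partition (hypothetical, not the Kafka API)."""

    def __init__(self) -> None:
        self._messages: List[str] = []

    def publish(self, message: str) -> int:
        """Append a message to the end of the log and return its offset."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read_from(self, offset: int) -> Iterator[str]:
        """Yield every message at or after the given offset, in order."""
        yield from self._messages[offset:]


def split_words(lines: Iterator[str]) -> Iterator[str]:
    """Processing step 1: turn each line into a stream of lowercase words."""
    for line in lines:
        yield from line.lower().split()


def count_words(words: Iterator[str]) -> Counter:
    """Processing step 2: aggregate the word stream into counts."""
    return Counter(words)


log = ToyLog()
for line in ["Storm processes streams", "Kafka processes messages"]:
    log.publish(line)

# A consumer starting at offset 0 sees the whole log.
counts = count_words(split_words(log.read_from(0)))

# Because the log keeps its messages, another consumer can start from a later offset.
replayed = count_words(split_words(log.read_from(1)))
```

The point of the sketch is only the shape: the log stores messages and lets each consumer pick its own starting offset, while the processing steps know nothing about storage. Real Kafka and Storm add distribution, durability and fault tolerance on top of that shape.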

A tutorial is available here to get your hands dirty with Storm and Kafka.

Along similar lines, “Putting Apache Kafka To Use: A Practical Guide to Building a Stream Data Platform” provides further reading material.

Finally, “Real Time Streaming with Apache Storm and Apache Kafka” offers the classic Twitter Stream Sentiment Analysis.

~ by mdavey on January 21, 2016.

One Response to “Real-time Data Refinery”

  1. […] on from an earlier posting on Data Refineries, I’m now beginning to consider connecting Kafka (my distributed commit-log) to H2O.  H2O […]
