Feeding a Data Refinery with Kafka

Following on from an earlier posting on Data Refineries, I’m now beginning to consider connecting Kafka (my distributed commit log) to H2O.  H2O looks interesting based on using it from R-Studio for a short while, coupled with the breadth of analytics available and the general uptake of the platform – am I wrong?

I can’t find a lot using Google on connecting Kafka directly to H2O.  Hence I’m wondering if I should instead look at the road of connecting Kafka to Spark, and thus use Sparkling Water.  Spark Streaming offers two options for connecting to Kafka:

  1. Receiver-based Approach
  2. Direct Approach (No Receivers)

The Direct (no receivers) approach appears to be the preferred road, with the benefit of zero data loss without needing a write-ahead log, plus exactly-once semantics – always a good thing 🙂
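As a sketch of what the Direct Approach might look like, here is a minimal PySpark consumer. To be clear about assumptions: the `spark-streaming-kafka` package on the classpath, a broker at `localhost:9092`, and an `events` topic are all mine for illustration – none come from the posting.

```python
# Sketch only: assumes Spark 1.x with the spark-streaming-kafka package,
# a Kafka broker at localhost:9092, and a hypothetical "events" topic.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="DataRefineryFeed")
ssc = StreamingContext(sc, batchDuration=10)  # 10-second micro-batches

# Direct (no receivers) approach: Spark tracks Kafka offsets itself,
# so there is no receiver to crash and no write-ahead log to maintain.
stream = KafkaUtils.createDirectStream(
    ssc,
    topics=["events"],
    kafkaParams={"metadata.broker.list": "localhost:9092"})

# Records arrive as (key, value) pairs; keep the message payloads.
values = stream.map(lambda kv: kv[1])
values.pprint()

ssc.start()
ssc.awaitTermination()
```

From there, Sparkling Water’s `H2OContext` can convert each micro-batch into an H2OFrame for modelling, which would close the Kafka-to-H2O loop.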

Taking this a step further, I wonder what predictions would be achievable if I took the skill cloud, married with other interesting “data”, as per the Cloud Skill postings.



~ by mdavey on March 11, 2016.

3 Responses to “Feeding a Data Refinery with Kafka”

  1. Kafka is a good fit in my experience; I got inspiration from this PyData NYC 2015 talk https://www.youtube.com/watch?v=5XB-T4hzV00

  2. Spark and Kafka are a …
