Feeding a Data Refinery with Kafka
Following on from an earlier posting on Data Refineries, I’m now beginning to consider connecting Kafka (my distributed commit log) to H2O. H2O looks interesting, based on using it from RStudio for a short while, coupled with the availability of analytics and the general uptake of the platform. Am I wrong?
I can’t find a lot via Google about connecting Kafka to H2O directly. Hence I’m wondering if I should go down the road of connecting Kafka to Spark, and then use Sparkling Water (H2O’s integration with Spark)? Spark Streaming offers two options for connecting to Kafka:
- Receiver-based Approach
- Direct Approach (No Receivers)
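The No Receivers approach appears to be the preferred road: as I understand it, each batch reads an explicit range of offsets straight from Kafka, so a failed batch can be re-read from the log, giving zero data loss without a separate write-ahead log. Always a good thing 🙂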
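To make the idea behind the direct approach concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark or Kafka code: the in-memory `log` dictionary stands in for Kafka partitions, and `OffsetRange`, `plan_batch`, `read_range` and `run_batch` are hypothetical names of my own. What it tries to show is the core idea as I understand it: each batch is planned as a fixed offset range per partition, and the committed offsets only advance once the whole batch has been processed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class OffsetRange:
    """A slice of one partition: from_offset inclusive, until_offset exclusive."""
    partition: int
    from_offset: int
    until_offset: int


def plan_batch(committed: Dict[int, int],
               latest: Dict[int, int],
               max_per_partition: int) -> List[OffsetRange]:
    """Decide up front which offsets the next batch will read."""
    ranges = []
    for partition in sorted(latest):
        start = committed.get(partition, 0)
        end = min(latest[partition], start + max_per_partition)
        if end > start:
            ranges.append(OffsetRange(partition, start, end))
    return ranges


def read_range(log: Dict[int, List[str]], r: OffsetRange) -> List[str]:
    """Read exactly the planned range from the (toy) commit log."""
    return log[r.partition][r.from_offset:r.until_offset]


def run_batch(log: Dict[int, List[str]],
              committed: Dict[int, int],
              max_per_partition: int,
              process: Callable[[List[str]], None]) -> Dict[int, int]:
    """Process one batch and return the new committed offsets.

    If process() raises, nothing is returned, so the caller keeps its old
    offsets and the same ranges are planned again on the next attempt.
    """
    latest = {p: len(msgs) for p, msgs in log.items()}
    ranges = plan_batch(committed, latest, max_per_partition)
    for r in ranges:
        process(read_range(log, r))
    # Advance offsets only after every range in the batch has been processed.
    new_committed = dict(committed)
    for r in ranges:
        new_committed[r.partition] = r.until_offset
    return new_committed
```

Because the log itself is replayable, there is no receiver buffering records in memory that could be lost. A batch that fails is simply re-read from the same offsets, which means downstream processing has to tolerate seeing a record more than once unless the output is idempotent or transactional.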
Taking this a step further, I wonder what predictions would be achievable if I took the skill cloud and married it with other interesting “data”, as per the Cloud Skill postings.