Machine Learning: Spark vs Flink

Interesting slide deck from Capital One comparing Flink vs Spark.  H2O has Spark integration through Sparkling Water, which is all well and good, but Flink looks more interesting 🙂

I haven't done much work with TensorFlow, but I see it has its own cluster for distribution.  However, Databricks has integrated it with Spark.

Deeplearning4j provides an interesting comparison of a number of ML libraries, with mention of Spark, but no mention of Flink.

There is some benchmarking of Spark and Flink here, with outcomes that are in many ways expected:

Apache Flink outperforms Apache Spark in processing machine learning & graph algorithms and relational queries but not in batch processing!

Some searching, and we get to Full Metal Data Lake.  Interesting, but not quite what I’m looking for.  However, it does point me at Apache Drill.

“Data Lake Architecture Considerations & Composition” describes a Data Lake architecture composed of three layers and three tiers.  Extremely helpful, and one of the better articles I’ve found on data lakes.  Probably also worth a read is “2nd Version of Data Lake vs. Data Warehouse”.

Back to Flink, Zalando’s next generation data integration and distribution platform Saiki provides some thoughts on architecture.  Nice to see Saiki’s unified log uses Apache Kafka to feed the data lake – great choice 🙂

~ by mdavey on April 26, 2016.

