Get a Data Lake – ELT not ETL

Data lakes are one of the buzzwords that have been going around for some time in the “big data” era.  Many companies and people have figured out what a data lake is, have created one, and are using it to great effect.  Others are still confused or unsure.

There are many articles and blog posts these days which provide clarity on data lakes.  Here’s one definition:

A Data Lake is a data store used for storing and processing large volumes of data. They are often used to collect raw data in native format before datasets are used for analytics purposes

Which leads to, in many ways, a pivotal line, “ELT not ETL” – thanks to James Serra’s posting.

ELT instead of ETL (loading the data into the data lake and then processing it). This can speed up transformations as the data lake is usually in a Hadoop cluster that can transform data much faster than an ETL tool
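To make the ordering concrete, here is a minimal sketch of ELT in Python, using only the standard library. It is not from James Serra's post: the lake is just a temporary directory standing in for a real store such as a Hadoop cluster, and the names `load_raw`, `transform_trades`, `run_demo`, the `raw`/`curated` folder layout, and the sample trade data are all assumptions made for illustration.

```python
import csv
import json
import shutil
import tempfile
from pathlib import Path


def load_raw(source_file: Path, lake_root: Path, source_name: str) -> Path:
    """Load step: copy the file into the lake unchanged, in its native format."""
    target_dir = lake_root / "raw" / source_name
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source_file.name
    shutil.copy2(source_file, target)
    return target


def transform_trades(raw_file: Path, lake_root: Path) -> Path:
    """Transform step: runs after loading, reading the raw copy inside the lake."""
    totals: dict[str, float] = {}
    with raw_file.open(newline="") as handle:
        for row in csv.DictReader(handle):
            symbol = row["symbol"]
            totals[symbol] = totals.get(symbol, 0.0) + float(row["quantity"])
    out_dir = lake_root / "curated"
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / "quantity_by_symbol.json"
    out_file.write_text(json.dumps(totals, sort_keys=True))
    return out_file


def run_demo() -> dict[str, float]:
    """Extract a hypothetical CSV, load it raw, then transform it in the lake."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        # Hypothetical source extract; a real one would come from a source system.
        source = tmp_path / "trades.csv"
        source.write_text("symbol,quantity\nABC,10\nXYZ,5\nABC,2.5\n")

        lake = tmp_path / "lake"
        raw_copy = load_raw(source, lake, "trading_desk")
        curated = transform_trades(raw_copy, lake)
        return json.loads(curated.read_text())


if __name__ == "__main__":
    print(run_demo())
```

The point of the split is that `load_raw` never inspects or reshapes the data, so the raw copy stays available for transformations nobody has thought of yet.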

Which then leads to identifying all the data sources in your organisation, and deciding how best to extract and load the data from those sources – spreadsheets, REST services, relational databases, etc.
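One way to start that inventory is a simple catalogue of sources with an extraction approach per source type. The sketch below is only an illustration: the `DataSource` class, the `EXTRACTION_PLAN` choices, the `raw/<kind>/<name>` layout, and the example locations are all hypothetical, and a real organisation would pick its own approaches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    name: str
    kind: str      # "spreadsheet", "rest", or "relational"
    location: str  # file path, URL, or connection string (placeholders here)


# Hypothetical decisions on how each kind of source is extracted in raw form.
EXTRACTION_PLAN = {
    "spreadsheet": "copy the workbook file as-is",
    "rest": "save each response body unmodified",
    "relational": "dump each table to a delimited file",
}


def plan_loads(sources):
    """Return (source name, lake path, extraction approach) for every source."""
    plan = []
    for source in sources:
        if source.kind not in EXTRACTION_PLAN:
            raise ValueError(f"no extraction approach decided for {source.kind!r}")
        lake_path = f"raw/{source.kind}/{source.name}"
        plan.append((source.name, lake_path, EXTRACTION_PLAN[source.kind]))
    return plan


if __name__ == "__main__":
    inventory = [
        DataSource("budget", "spreadsheet", "budget.xlsx"),
        DataSource("prices", "rest", "https://example.com/prices"),
        DataSource("orders", "relational", "orders-db"),
    ]
    for entry in plan_loads(inventory):
        print(entry)
```

Raising an error for an unknown source type is deliberate: it forces the "how do we extract this?" decision to be made explicitly before anything is loaded.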

~ by mdavey on April 24, 2016.
