Get a Data Lake – ELT not ETL


“Data lake” is one of the buzzwords that has been going around for some time in the “big data” era.  Many companies and people have figured out what a data lake is, have created one, and are using it to great effect.  Others are still confused or unsure.

There are many articles and blog posts these days which provide clarity on data lakes.  Here’s one definition:

A Data Lake is a data store used for storing and processing large volumes of data. They are often used to collect raw data in native format before datasets are used for analytics purposes

Which leads to what is, in many ways, a pivotal line: “ELT not ETL” (thanks to James Serra’s post).

ELT instead of ETL (loading the data into the data lake and then processing it). This can speed up transformations as the data lake is usually in a Hadoop cluster that can transform data much faster than an ETL tool
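To make the ordering concrete, here is a minimal sketch of "load first, transform later". It assumes a local directory stands in for the lake; the `raw/` and `curated/` folder names, the `orders` source and the function names are all hypothetical, and a real lake would more likely sit on HDFS or an object store.

```python
import csv
import json
import tempfile
from pathlib import Path


def load_raw(lake: Path, source: str, name: str, payload: bytes) -> Path:
    """Load step: land the bytes exactly as received, with no parsing."""
    target = lake / "raw" / source / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target


def transform_orders(lake: Path) -> Path:
    """Transform step: runs later, reading raw files already in the lake."""
    totals: dict[str, float] = {}
    for raw_file in sorted((lake / "raw" / "orders").glob("*.csv")):
        with raw_file.open(newline="") as handle:
            for row in csv.DictReader(handle):
                customer = row["customer"]
                totals[customer] = totals.get(customer, 0.0) + float(row["amount"])
    out = lake / "curated" / "order_totals.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(totals, sort_keys=True))
    return out


def run_demo() -> dict:
    """Land one hypothetical CSV extract, then transform it inside the lake."""
    with tempfile.TemporaryDirectory() as tmp:
        lake = Path(tmp)
        payload = b"customer,amount\nacme,10.5\nglobex,4\nacme,2\n"
        raw_path = load_raw(lake, "orders", "2016-04-24.csv", payload)
        # The raw copy is byte-for-byte what the source sent.
        assert raw_path.read_bytes() == payload
        return json.loads(transform_orders(lake).read_text())
```

The point of the split is that `load_raw` never needs to understand the data, so new transformations can be written later against the same untouched raw files.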

Which then leads to identifying all the data sources in your organisation, and deciding how best to extract and load the data from those sources: spreadsheets, REST services, relational databases, etc.
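One way to start is a simple source inventory that records each source and how to pull it. The sketch below is only an illustration under stated assumptions: the `Source` record, the `kind` values and the `snapshot.bin` file name are hypothetical, a CSV export stands in for a spreadsheet, SQLite stands in for a relational database, and REST services are left out so the example needs nothing beyond the standard library.

```python
import sqlite3
import tempfile
from contextlib import closing
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Source:
    name: str      # hypothetical label used as the folder name in the lake
    kind: str      # "file" or "sqlite" in this sketch
    location: str  # path to the exported file or the database


def extract(source: Source) -> bytes:
    """Pull raw bytes from a source without reshaping them."""
    if source.kind == "file":
        return Path(source.location).read_bytes()
    if source.kind == "sqlite":
        with closing(sqlite3.connect(source.location)) as conn:
            # iterdump() yields the database contents as SQL text.
            return "\n".join(conn.iterdump()).encode("utf-8")
    raise ValueError(f"no extractor for kind {source.kind!r}")


def land_all(sources: list[Source], lake: Path) -> list[Path]:
    """Extract every inventoried source and land it under raw/<name>/."""
    landed = []
    for source in sources:
        target = lake / "raw" / source.name / "snapshot.bin"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(extract(source))
        landed.append(target)
    return landed


def run_inventory_demo() -> list[str]:
    """Build two throwaway sources and land both in a temporary lake."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        sheet = root / "budget.csv"
        sheet.write_text("team,budget\nfx,100\n")
        db = root / "trades.db"
        with closing(sqlite3.connect(db)) as conn:
            conn.execute("CREATE TABLE trades (id INTEGER, symbol TEXT)")
            conn.execute("INSERT INTO trades VALUES (1, 'ABC')")
            conn.commit()
        sources = [
            Source("budget_sheet", "file", str(sheet)),
            Source("trades_db", "sqlite", str(db)),
        ]
        landed = land_all(sources, root / "lake")
        return [path.parent.name for path in landed if path.stat().st_size > 0]
```

Adding a new kind of source then means adding one branch to `extract` and one row to the inventory, which keeps the "what do we have?" question separate from the "how do we pull it?" question.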

~ by mdavey on April 24, 2016.
