Spark SQL: Unified Data Access


Spark has gained momentum over the years. When married to H2O, it offers considerable power in the AI world of business transformation. One nice feature of Spark is the ability to access various data sources in a consistent way through DataFrames and Spark SQL. Among other data sources, JDBC and Hive are supported. A fuller list of Spark Data Source packages is available here. Even CSV files are supported 🙂
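To make the "consistent way" concrete, here is a minimal PySpark sketch of the same reader call pointed at different sources. It assumes Spark 2.0 or later (where CSV is built in) and a JDBC driver on the classpath. The file paths, JDBC URL, table names, and the `SOURCES`, `load` and `main` helpers are all hypothetical placeholders of mine, not something from a real project.

```python
# Hypothetical source descriptions: every path, URL and table name is a placeholder.
SOURCES = {
    "csv": {
        "format": "csv",
        "path": "data/trades.csv",
        "options": {"header": "true", "inferSchema": "true"},
    },
    "json": {
        "format": "json",
        "path": "data/events.json",
        "options": {},
    },
    "jdbc": {
        "format": "jdbc",
        "path": None,  # JDBC sources are addressed by options, not by a path
        "options": {
            "url": "jdbc:postgresql://localhost/demo",
            "dbtable": "public.orders",
        },
    },
}


def load(spark, name):
    """Load one of the SOURCES through the generic DataFrameReader calls."""
    spec = SOURCES[name]
    reader = spark.read.format(spec["format"]).options(**spec["options"])
    if spec["path"]:
        return reader.load(spec["path"])
    return reader.load()


def main():
    # Imported here so the helpers above can be read without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("unified-access").getOrCreate()

    trades = load(spark, "csv")
    orders = load(spark, "jdbc")

    # Once loaded, every source is just a DataFrame that Spark SQL can query.
    trades.createOrReplaceTempView("trades")
    orders.createOrReplaceTempView("orders")
    spark.sql("SELECT COUNT(*) AS n FROM trades").show()
    spark.stop()
```

Calling `main()` needs a working Spark installation and real data behind the placeholders. The point is only that the CSV, JSON and JDBC cases differ in their format name and options, not in the code that reads them.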

One data source that I thought would be supported, but doesn’t seem to be, is JIRA.  Spark supports JSON data files, but retrieving JSON from a REST endpoint (which is probably the way to go with JIRA) doesn’t seem to be supported out of the box.
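One possible workaround is to fetch the JSON yourself and hand Spark the resulting strings. The sketch below is untested against a real JIRA server and rests on several assumptions: that the instance exposes the `/rest/api/2/search` endpoint, that the response carries an `issues` list whose entries have `key`, `fields.summary` and `fields.status.name`, and that no authentication is needed (in practice it almost certainly is). The function names are my own.

```python
import json
import urllib.parse
import urllib.request


def fetch_search_page(base_url, jql, start_at=0, max_results=50):
    """Fetch one page of search results as a dict (assumed endpoint, no auth)."""
    query = urllib.parse.urlencode(
        {"jql": jql, "startAt": start_at, "maxResults": max_results}
    )
    with urllib.request.urlopen(f"{base_url}/rest/api/2/search?{query}") as resp:
        return json.load(resp)


def issues_to_json_lines(payload):
    """Flatten each issue into a one-line JSON string with a few chosen fields."""
    lines = []
    for issue in payload.get("issues", []):
        fields = issue.get("fields") or {}
        status = fields.get("status") or {}
        row = {
            "key": issue.get("key"),
            "summary": fields.get("summary"),
            "status": status.get("name"),
        }
        lines.append(json.dumps(row, sort_keys=True))
    return lines


def to_dataframe(spark, lines):
    """Let Spark infer a schema from an RDD of JSON strings."""
    return spark.read.json(spark.sparkContext.parallelize(lines))
```

With a `SparkSession` in hand, something like `to_dataframe(spark, issues_to_json_lines(fetch_search_page(base_url, "project = DEMO")))` should give a DataFrame that can be queried like any other; `base_url` and the JQL string here are placeholders. This pulls a single page onto the driver, so it only suits modest result sets; a larger extract would need to loop over `startAt`.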

Update:  “Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structured Data” slide 16 has an example of loading JIRA data and storing it in Hive.

~ by mdavey on June 2, 2016.
