Spark SQL: Unified Data Access
Spark has gained momentum over the years. Combined with H2O, it offers considerable power for AI-driven business transformation. One nice feature of Spark is the ability to access many different data sources in a consistent way: through DataFrames and Spark SQL. Among other data sources, JDBC and Hive are supported. A fuller list of Spark Data Source packages is available here. Even CSV files are supported 🙂
One data source that I expected to be supported, but which doesn't seem to be, is JIRA. Spark can read JSON data files, but it doesn't seem to support retrieving JSON from a REST endpoint (which is probably the way to go with JIRA) out of the box.
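One workaround is to fetch the JSON yourself and hand Spark plain rows. The sketch below is a minimal illustration, not a Spark data source: it assumes JIRA's REST search endpoint lives at `/rest/api/2/search` and returns an `issues` array whose items carry `key` and `fields` (with `summary` and `status.name`). The helper names `jira_search_url`, `fetch_json` and `issues_to_rows` are hypothetical, and authentication and pagination are deliberately left out.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def jira_search_url(base_url, jql, start_at=0, max_results=50):
    """Build a search URL; the /rest/api/2/search path is an assumption about the JIRA instance."""
    query = urlencode({"jql": jql, "startAt": start_at, "maxResults": max_results})
    return f"{base_url.rstrip('/')}/rest/api/2/search?{query}"


def fetch_json(url):
    """GET the URL and decode the JSON body (no authentication or retries handled)."""
    with urlopen(Request(url, headers={"Accept": "application/json"})) as resp:
        return json.load(resp)


def issues_to_rows(payload):
    """Flatten the assumed 'issues' array into (key, summary, status) tuples."""
    rows = []
    for issue in payload.get("issues", []):
        fields = issue.get("fields", {})
        status = fields.get("status") or {}  # status may be missing or null
        rows.append((issue.get("key"), fields.get("summary"), status.get("name")))
    return rows
```

With a `SparkSession` named `spark`, the resulting rows could then be loaded with something like `spark.createDataFrame(rows, ["key", "summary", "status"])` and queried with Spark SQL like any other source. Flattening to tuples up front keeps the schema explicit instead of relying on Spark to infer one from nested JSON.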
Update: “Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structured Data” slide 16 has an example of loading JIRA data and storing it in Hive.