SPARQL Data Platform
Given the recent postings on SPARQL, I thought it worth noting down the data platform options I’ve considered:
- For pure PoC’ing, MySQL using the file import facility in MySQLWorkbench, running D2RQ to provide SPARQL access. Simple, and easy to set up.
- For more of a Hadoop platform, HBase with Apache Phoenix offering a JDBC driver, again allowing D2RQ to be used as the SPARQL access layer.
- Apache Marmotta, in many ways an improvement on Option 1 above, since it sits on top of standard database technology.
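For Options 1 and 2, the glue is a D2RQ mapping file that tells the server how to expose relational tables as RDF. A minimal sketch might look like the following — the database name, table, and vocabulary URIs are hypothetical placeholders, not from any real schema:

```turtle
@prefix d2rq:  <http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#> .
@prefix map:   <#> .
@prefix vocab: <http://example.com/vocab/> .

# Connection to the MySQL database (hypothetical DSN and credentials)
map:database a d2rq:Database ;
    d2rq:jdbcDriver "com.mysql.jdbc.Driver" ;
    d2rq:jdbcDSN    "jdbc:mysql://localhost/corpdata" ;
    d2rq:username   "user" ;
    d2rq:password   "secret" .

# Map each row of a hypothetical Customers table to an RDF resource
map:Customer a d2rq:ClassMap ;
    d2rq:dataStorage map:database ;
    d2rq:uriPattern  "customer/@@Customers.id@@" ;
    d2rq:class       vocab:Customer .

# Expose the name column as a property of that resource
map:customerName a d2rq:PropertyBridge ;
    d2rq:belongsToClassMap map:Customer ;
    d2rq:property          vocab:name ;
    d2rq:column            "Customers.name" .
```

For Option 2, the same approach applies, with the JDBC driver and DSN swapped for Apache Phoenix’s.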
Option 1 is probably the quickest to move forwards with, once you’ve become annoyed with accessing corporate data that is spread across n systems, and you’re still in Machine Learning Discovery land 🙂 If you’ve used Apache Marmotta, or have time to set it up and learn the platform, Option 3 may be a better bet.
Option 2 is probably the production version, or at least a stab in the right direction, as it offers improvements in scaling, coupled with Hadoopness 🙂
Where’s all this going? “SPARQL with R in less than 5 minutes” provides a quick and interesting read on the power of SPARQL. If you’re building a data lake without a foundation (ontology), you may be missing a trick.
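The payoff is being able to query across those n systems in one go. A sketch of what that might look like, using a made-up vocabulary for illustration:

```sparql
PREFIX vocab: <http://example.com/vocab/>

# Join customers and orders that originally lived in separate systems
SELECT ?name ?orderDate
WHERE {
  ?customer a vocab:Customer ;
            vocab:name ?name .
  ?order vocab:placedBy ?customer ;
         vocab:date ?orderDate .
}
LIMIT 10
```

Once the mappings are in place, a query like this runs the same whether the data sits in MySQL, HBase, or Marmotta’s store — which is rather the point.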
Interested to hear anyone else’s options.