Semantic Data Model
“When Hadoop Simply Isn’t Enough: How to Purpose-Build Architecture for Industrial Data” offers an interesting read, but makes no mention of RDF. Elasticsearch and Hadoop are mentioned as part of the solution, but I’m not clear on how the data is linked. Am I missing something?
“The Data Lake Concept Is Maturing”, however, provides a more interesting read, with the Apache Hadoop Distributed File System called out for storage, coupled with:
when selecting a NoSQL database with which to work with their Hadoop clusters. MongoDB, he said, is typically used for department-level cache applications, Apache Cassandra for highly distributed interactive applications, and Apache Hbase for analytic applications which Bodkin said “can tolerate a bit more latency, having a smaller number of places where machine-learned models sit right next to your compute cluster in Hadoop.”
From a graph database perspective, there is an interesting article on Neo4j and RDF, “Importing ttl (Turtle) ontologies in Neo4j”.
Sempala research is interesting, but I don’t see the code anywhere.
Finally, on to Spark and SPARQL: “RDF Graphs and GraphX”.
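Loading RDF into a graph store such as Neo4j or GraphX generally means turning triples into a property graph. Below is a minimal sketch of one common mapping, not the code from either article: triples whose object is a literal become properties on the subject node, and triples whose object is another resource become edges. The `ex:` names, the sample triples, and the `to_property_graph` helper are all made up for illustration, and real importers also deal with datatypes, language tags, and blank nodes.

```python
# Hypothetical sample data: resources are prefixed names, literals are
# written with surrounding double quotes (a simplification of Turtle).
TRIPLES = [
    ("ex:pump1", "rdf:type", "ex:Pump"),
    ("ex:pump1", "ex:label", '"Feed pump"'),
    ("ex:pump1", "ex:feeds", "ex:tank7"),
    ("ex:tank7", "ex:label", '"Storage tank"'),
]


def is_literal(term):
    """Treat any term that starts with a double quote as a literal."""
    return term.startswith('"')


def to_property_graph(triples):
    """Split triples into node properties and edges.

    Returns (nodes, edges) where nodes maps a resource to a dict of
    properties and edges is a list of (source, predicate, target).
    """
    nodes = {}
    edges = []
    for subject, predicate, obj in triples:
        props = nodes.setdefault(subject, {})
        if is_literal(obj):
            # Literal object: store it as a property on the subject node.
            props[predicate] = obj.strip('"')
        else:
            # Resource object: make sure the target node exists, add an edge.
            nodes.setdefault(obj, {})
            edges.append((subject, predicate, obj))
    return nodes, edges


nodes, edges = to_property_graph(TRIPLES)
for name, props in nodes.items():
    print(name, props)
for edge in edges:
    print(edge)
```

In this sketch `rdf:type` is kept as an ordinary edge; an importer could just as well turn it into a node label, which is a design choice rather than something RDF dictates.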
Which still leaves the question: what is the latest data lake software stack? Is the road HBase to hold the data and/or the RDF, as with Jena-HBase? Or is HBase paired with a separate graph database offering SPARQL queries?