Semantic Data Model

“When Hadoop Simply Isn’t Enough: How to Purpose-Build Architecture for Industrial Data” offers an interesting read, but makes no mention of RDF. ElasticSearch and Hadoop are mentioned as part of the solution, but I’m not clear on how the data is actually linked. Am I missing something?

“The Data Lake Concept Is Maturing”, however, provides a more interesting read, with the Apache Hadoop Distributed File System (HDFS) called out for storage, coupled with:

when selecting a NoSQL database with which to work with their Hadoop clusters. MongoDB, he said, is typically used for department-level cache applications, Apache Cassandra for highly distributed interactive applications, and Apache Hbase for analytic applications which Bodkin said “can tolerate a bit more latency, having a smaller number of places where machine-learned models sit right next to your compute cluster in Hadoop.”

From a graph database perspective, there is an interesting article on Neo4j and RDF: “Importing ttl (Turtle) ontologies in Neo4j”.

The Sempala research is interesting, but I can’t find the code anywhere.

Finally, on Spark and SPARQL: “RDF Graphs and GraphX”.

All of which still leaves the question: what is the current data lake software stack? Is the way forward HBase holding the data and/or the RDF triples (e.g. Jena-HBase)? Or is HBase paired with a separate graph database that offers SPARQL queries?
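To make the “HBase holding the RDF” option a little more concrete, here is a minimal in-memory sketch, assuming a hypothetical wide-column layout: row key = subject URI, column qualifier = predicate URI, cell values = objects. This is not Jena-HBase’s actual schema, and the `ex:` URIs and the `TripleTable` class are made up for illustration. It shows how linkage falls out of RDF: an object URI in one row is simply the row key of another, so a query can hop from row to row.

```python
from collections import defaultdict


class TripleTable:
    """Hypothetical HBase-style layout for RDF triples (illustration only).

    row key -> column qualifier (predicate) -> set of objects
    """

    def __init__(self):
        self.rows = defaultdict(lambda: defaultdict(set))

    def put(self, s, p, o):
        # Analogous to an HBase Put of value o into (row s, column p)
        self.rows[s][p].add(o)

    def get(self, s, p):
        # Point lookup, analogous to an HBase Get on (row, column);
        # returns a copy so callers cannot mutate the stored cell
        return set(self.rows.get(s, {}).get(p, set()))

    def follow(self, s, *path):
        # Walk a chain of predicates: each object URI is treated as the
        # row key for the next hop, which is where the "linkage" comes from
        frontier = {s}
        for p in path:
            frontier = {o for node in frontier for o in self.get(node, p)}
        return frontier


if __name__ == "__main__":
    t = TripleTable()
    t.put("ex:sensor42", "ex:locatedIn", "ex:plantA")
    t.put("ex:sensor42", "ex:reading", "17.5")
    t.put("ex:plantA", "ex:operatedBy", "ex:acme")
    print(t.follow("ex:sensor42", "ex:locatedIn", "ex:operatedBy"))  # {'ex:acme'}
```

A subject-keyed table like this only answers “start from a known subject” queries cheaply; patterns that bind the predicate or object first need additional index tables, which is why triple stores commonly keep several index permutations, and part of why pairing HBase with a dedicated SPARQL engine is tempting.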

~ by mdavey on May 9, 2016.
