Neo4j Node Data Storage

Neo4j, a graph database, is a cool piece of technology. Graph databases need to be used appropriately, though: like blockchain and the other technologies in the sweet store, they are not silver bullets in their own right.

When considering what to store in a graph database, it’s easy to leverage the nodes and the links between them (relationships) to build a graph and traverse it. What is unclear, or not well documented, is how much data you should store on nodes. For example, Christophe Willemsen’s post on GraphAware about using Neo4j to model blogs with postings that are followed by users has clear graph benefits. What isn’t clear is whether the postings themselves, which could be lengthy in terms of text and may include images, should also be stored on the post node as a property.

The neo4j-contrib/neo4j-faq repository on GitHub offers the following, which may be relevant:

Neo4j is currently not suitable for storing BLOBs/CLOBs. Nodes, relationships, and properties are not co-located on disk. This might be introduced in the future.

The 1st edition (unfortunately I don’t have the latest version) of Graph Databases by Ian Robinson, Jim Webber & Emil Eifrem provides the following:

The node store file stores node records

Like most of the Neo4j store files, the node store is a fixed-size record store, where each record is 9 bytes in length

Correspondingly, relationships are stored in the relationship store file.

The relationship store consists of fixed-size records—in this case each record is 33 bytes long.

In addition to the node and relationship stores, which contain the graph structure, we have the property store files. These store the user’s key-value pairs. Recall that Neo4j, being a property graph database, allows properties—name-value pairs—to be attached to both nodes and relationships. The property stores, therefore, are referenced from both node and relationship records.

Neo4j supports store optimizations, whereby it inlines some properties into the property store file directly. This happens when property data can be encoded to fit in one or more of a record’s four property blocks. In practice this means that data like phone numbers and zip codes can be inlined in the property store file directly, rather than being pushed out to the dynamic stores.
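
One practical consequence of the fixed-size records quoted above is that a record's position in its store file can be computed from its ID rather than searched for. Here is a minimal sketch of that arithmetic. The record sizes are the ones quoted from the 1st edition and may differ in other Neo4j versions, and the function name is my own, not part of any Neo4j API:

```python
# Record sizes as quoted from the 1st edition of Graph Databases.
# Assumption: other Neo4j versions may use different sizes.
NODE_RECORD_SIZE = 9           # bytes per node record
RELATIONSHIP_RECORD_SIZE = 33  # bytes per relationship record


def record_offset(record_id: int, record_size: int) -> int:
    """Return the byte offset of a record in a fixed-size record store.

    Hypothetical helper for illustration only: with fixed-size records,
    the offset is simply the record ID multiplied by the record size.
    """
    if record_id < 0:
        raise ValueError("record_id must be non-negative")
    return record_id * record_size


# The node with ID 100 starts 900 bytes into the node store file.
print(record_offset(100, NODE_RECORD_SIZE))
```

Large strings cannot fit in records like these, which is why the quoted text talks about them being pushed out to the dynamic stores.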

Back to the Christophe Willemsen article: should the postings be stored in Neo4j, or outside it with a relevant ID that lets Neo4j link to them?
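
To make the second option concrete, here is a rough sketch of what "outside with a relevant ID" could look like. Everything in it is hypothetical: `ExternalContentStore` is an in-memory stand-in for whatever document store, object store or file system you would actually use, and the dictionary returned by `make_post_properties` only represents the small property map that might be placed on a post node. No Neo4j driver calls are shown.

```python
import uuid


class ExternalContentStore:
    """Hypothetical stand-in for a document store, object store or file system."""

    def __init__(self):
        self._blobs = {}

    def put(self, content: bytes) -> str:
        # Generate an opaque ID that the graph can hold instead of the content.
        content_id = str(uuid.uuid4())
        self._blobs[content_id] = content
        return content_id

    def get(self, content_id: str) -> bytes:
        return self._blobs[content_id]


def make_post_properties(title: str, body: str, store: ExternalContentStore) -> dict:
    """Build the small property map for a post node (illustrative only).

    The lengthy body goes to the external store; the node keeps just a
    short title and the ID needed to fetch the body later.
    """
    content_id = store.put(body.encode("utf-8"))
    return {"title": title, "content_id": content_id}


store = ExternalContentStore()
post = make_post_properties("Neo4j Node Data Storage", "A long post body. " * 1000, store)

# Traversals only touch the small properties; the body is fetched on demand.
body = store.get(post["content_id"]).decode("utf-8")
```

The trade-off is an extra lookup whenever the full text is needed, in exchange for keeping the node properties small.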


~ by mdavey on March 14, 2016.

One Response to “Neo4j Node Data Storage”

  1. Having many properties on a node or storing large strings will definitely impact performance. I would store any information not directly relevant to the nodes and/or relationships externally and reference via an id
