Category: Technology

NEO4J: Union Sets

There are a lot of places where cypher queries don’t seem to have an “OR” type of functionality … but you can union two returned sets provided they have the same properties list
MATCH z=(:TOMATO {name: ‘Black Krim’})
RETURN z
UNION
MATCH z=(:TOMATO {name: ‘Cherokee Purple’})
RETURN z;

NEO4J: Debugging a Cypher Query

It is somewhat ironic that I continue to use print statements as my debugging tool of choice when programming but spent a decent bit of time trying to find a cypher query debugger. Just use a print statement — or, in this case, return.

When my query returned an error indicating that the variable isn’t defined even though I copy/pasted the variable name from whence I defined it:

I could just omit the component of the query with the error and try returning this variable

And performing operations on null values may not get me anywhere. Adding a replacement command to drop the commas produces integer values:

Using Git Submodules

Contents

Using Git Submodules

What Are Submodules?

Before You Start

Recursion

Adding Submodules

Performing Operations on All Submodules

Cloning a Repository That Contains Submodules

Making Changes to a Submodule

Pulling Changes Made to Submodules

Including Submodules in ADO Pipelines

Git Submodule Quick Reference Guide

What Are Submodules?

A submodule is a link – it is a pointer at a specific path in the parent repository’s directory structure that references a specific commit in another git repository. The actual git repository is no different than any other repository – what makes it a ‘submodule’ is someone using it inside a parent repository. If I create a really cool PHP function – an Excel parsing utility – I can publish it to a git repository to share it. This repository is not a submodule for me. Someone who wants to use my utility may add my repository as a submodule in their code repo. In their repo, my repository that houses the Excel parsing utility is a submodule. You cannot tell, by looking at a repository, if it is included elsewhere as a submodule.

The example provided above – one individual looking to include someone else’s code in their project – is the typical use case for submodules. But there are other scenarios where isolating different parts of code make sense. In our organization, we cannot deploy code until it passes a security scan. This means problems in one section of the code can hold up development on everything else. Additionally, our code scanning is charged per repository. While we may have a dozen independent tools that happen to reside in a parent folder, it is not cost effective to have dozens of repositories being scanned. Using submodules will allow us to roll up independent tools into a single scanned repository.

The parent repository does not actually track the changes in a submodule. Instead, the parent contains a pointer to a commit identifier in the submodule repository. All changes in the submodule are tracked in its repository.

Before You Start

Before working with submodules, ensure you have an up to date version of git. If, as you interact with remote repositories, you see an error indicating you are using an older version of git:

You are apt to see fatal errors like this:

Recursion

You can have submodules inside of submodules inside of submodules – because of this, most command that interact with submodules have a –recursive flag that will recursively apply the command to any submodules contained within submodules within the parent repository.

Adding Submodules

Begin with a git repository that will serve as the parent – in this example, I am starting with a blank repo that only contains a .gitignore and a readme.md file, but you can start with a repository that’s been under development for years.

We link in submodules using “git submodule add” followed by the repo URL. If you want the folder to have a different name, use “git submodule add URLHERE folderNameForSubmodule”

Once the Tool1 submodule has been added, files from Tool1 are on disk.

Check out a branch in the submodule, or use “git submodule foreach” to check each submodule out to the desired branch (main in this case)

If you use git diff with the –submodule switch, you can see that “Tool1” is being tracked as a submodule.

Add and commit the change – since this is both an entry in the .gitmodules file and the submodule reference to a commit hash, I am using “git add *”

Then push the changes to ADO

Looking in ADO, you will see each submodule folder is represented as a file. The file contents is the hash of the commit to which the submodule points.

Because the “folder” is reflected as a file in the repository, it will not sort with the other folders. You’ll need to look in the file listing to find the submodule file.

Performing Operations on All Submodules

The “git submodule foreach” command allows you to run commands on each submodule in a repository.

You can run any git command in this fromeach loop – as an example, I will checkout a specific branch in all submodules using

git submodule foreach –recursive git checkout main

Cloning a Repository That Contains Submodules

Cloning with –recurse-submodules flag

When cloning a repository with submodules, add –recurse-submodules to your git clone command to pull all of the submodules. To clone my ParentRepo, I use:

git clone –recurse-submodules https://Windstream@dev.azure.com/e0082643/e0082643/_git/ParentRepo

You will see that git checks out the parent project and all included submodules.

Use “git submodule foreach –recursive git checkout main” to check each submodule out to the desired branch (in this case, main).

Cloning without –recurse-submodules

If you forget to include –recurse-submodules in your clone command, or if the submodules have been added to a repository that you have already cloned, you will have empty folders instead of each submodule’s files. A parent repository only has pointers to commits for submodules – until the submodules are initialized, those pointers go nowhere.

Use git submodule update –init –recursive to pull files from the registered submodules.

Making Changes to a Submodule

Working within a submodule’s folder will perform operations in the submodule’s git repository. Working in other folders will perform operations in the parent’s git repository.

To make a change to Tool3, I first need to change into the Tool3 directory.

Make whatever changes are needed. Then add the changes, create a commit, and push the commit to the submodule’s repository. You can create branches, merge branches, and such all within the submodule’s folder to interact with the submodule’s git repository.

If you look at the parent repository, you will see that no changes have been made to it – committing changes to a submodule repository will not kick off the pipeline code scan.

To update the parent repository to point the submodule to your most recent commit, you will need to commit the submodule folder in the parent folder. Change directory into the parent repo – add, commit, and push the changes.

And push the references to the remote using “git submodule update –recursive –remote”

This commit updates the contents of the file representing the repo’s folder – the folder’s file used to reference a commit ending in 6d95 and now it references a commit ending in 6292

Which you can see in the git log of the submodule repository:

Because we have made changes to the parent repository, the pipeline that initiates our code scanning should execute.

Pulling Changes Made to Submodules

To sync changes made by others (or that you’ve made in other locations), you will need to pull the submodules, you need to pull the submodule as well as the parent repo.

To pull (fetch and merge) changes from all upstream submodules, use:

git submodule update –recursive –remote –merge

Using “git submodule status” to view the submodules, you can see the submodule now points to the commit hash from the change we made

Including Submodules in ADO Pipelines

Veracode scanning is initiated through a pipeline. Since our goal is to include files from all of the submodules when scanning the parent repository, we need to ensure those files get bundled into the zip file that is submitted for scanning.

To do so, we need to add a checkout step to the pipeline and set “submodules” to ‘true’

When the submodules are all part of the same ADO project, you do not need to supply additional credentials in the pipeline. Where a different set of credentials are required, you can check out the submodule by passing an extra header with an authorization token.

Git Submodule Quick Reference Guide

Clone repo and all submodules:

git clone –recurse-submodules REPO_URL

cd RepoFolder

git submodule update –init –recursive

git submodule foreach –recursive git checkout main

Add a submodule to a repo:

git submodule add REPO_URL /path/to/folderForSubmodule

git submodule update –init –recursive

git submodule foreach –recursive git checkout main

git add .

git commit -m “Adding submodules”

git push

git submodule update –recursive –remote

Check Out a Branch in All Submodules

git submodule foreach –recursive git checkout main

Committing Change to a Submodule

cd .\submodule_folder

# Make some changes!

git add .

git commit –author=”Lisa Rushworth <Lisa.Rushworth@windstream.com>” -m “Commit Message”

git push

cd ..

git add submodule_folder

git commit -m “Updated submodule”

git push

Pull Updates into All Submodules:

git submodule update –recursive –remote –merge

 

NEO4J: Installing Plugins

Since I have the /plugins directory persisted, I can use wget <URL> to grab a JAR file and place it into the plugins folder. Restart the container and the plugin is active.

Example installing APOC extended plugin:

cd /docker/neo4j/plugins;wget https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/download/5.11.0/apoc-5.11.0-extended.jar;docker stop neo4j;docker start neo4j

NEO4J: APOC Plugin in Docker Container

I wanted to test some of the Awesome Procedures on Cypher (APOC) functions, but first I needed to install the plugins. I needed to modify my docker run statement slightly to persist the /plugins directory and install APOC core:

 

docker run -dit -p 7474:7474 -p 7687:7687 --volume=/docker/neo4j/data:/data --volume=/docker/neo4j/conf:/var/lib/neo4j/conf --volume=/docker/neo4j/plugins:/plugins -e NEO4J_AUTH=none -e NEO4J_apoc_export_file_enabled=true -e NEO4J_apoc_import_file_enabled=true -e NEO4J_apoc_import_file_use__neo4j__config=true -e NEO4J_PLUGINS=\[\"apoc\"\] --name neo4j neo4j

Maintaining an /etc/hosts record

I encountered an oddity at work — there’s a server on an internally located public IP space. Because it’s public space, it is not allowed to communicate with the internal interface of some of our security group’s servers. It has to use their public interface (not technically, just a policy on which they will not budge). I cannot just use a DNS server that resolves the public copy of our zone because then we’d lose access to everything else, so we are stuck making an /etc/hosts entry. Except this thing changes IPs fairly regularly (hey, we’re moving from AWS to Azure; hey, let’s try CloudFlare; nope, that is expensive so change it back) and the service it provides is application authentication so not something you want randomly falling over every couple of months.

So I’ve come up with a quick script to maintain the /etc/hosts record for the endpoint.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
# requires: dnspython, subprocess
 
import dns.resolver
import subprocess
 
strHostToCheck = 'hostname.example.com' # PingID endpoint for authentication
strDNSServer = "8.8.8.8"         # Google's public DNS server
listStrIPs = []
 
# Get current assignement from hosts file
listCurrentAssignment = [ line for line in open('/etc/hosts') if strHostToCheck in line]
 
if len(listCurrentAssignment) >= 1:
        strCurrentAssignment = listCurrentAssignment[0].split("\t")[0]
 
        # Get actual assignment from DNS
        objResolver = dns.resolver.Resolver()
        objResolver.nameservers = [strDNSServer]
        objHostResolution = objResolver.query(strHostToCheck)
 
        for objARecord in objHostResolution:
                listStrIPs.append(objARecord.to_text())
 
        if len(listStrIPs) >= 1:
                # Fix /etc/hosts if the assignment there doesn't match DNS
                if strCurrentAssignment in listStrIPs:
                        print(f"Nothing to do -- hosts file record {strCurrentAssignment} is in {listStrIPs}")
                else:
                        print(f"I do not find {strCurrentAssignment} here, so now fix it!")
                        subprocess.call([f"sed -i -e 's/{strCurrentAssignment}\t{strHostToCheck}/{listStrIPs[0]}\t{strHostToCheck}/g' /etc/hosts"], shell=True)
        else:
                print("No resolution from DNS ... that's not great")
else:
        print("No assignment found in /etc/hosts ... that's not great either")

NEO4J: Adding Indices

Like SQL-based databases, you can create indices in Neo4j to optimize frequently performed searches. In my particular case, I am usually searching by the plant name. Adding an index on the plant’s name, therefore, makes sense:

CREATE INDEX index_tomato_name IF NOT EXISTS
FOR (n:TOMATO)
ON (n.name)

I am also interested in mapping the genetic lineage, so I’ve added indices for the male and female parent plants:

CREATE INDEX index_tomato_parentm IF NOT EXISTS
for ()-[r:PARENT_M]-()
on (r.name)

CREATE INDEX index_tomato_parentf IF NOT EXISTS
for ()-[r:PARENT_F]-()
on (r.name)

 

Hypothetically, you can use the procedure CALL db.indexes(); to view all of the indices in a database, but SHOW PROCEDURES; shows me that procedure isn’t registered in the community edition.

╒═════════════════════════════════════════════════════════╤══════════════════════════════════════════════════════════════════════╤═══════╤═════════════╕
│name                                                     │description                                                           │mode   │worksOnSystem│
╞═════════════════════════════════════════════════════════╪══════════════════════════════════════════════════════════════════════╪═══════╪═════════════╡
│"db.awaitIndex"                                          │"Wait for an index to come online (for example: CALL db.awaitIndex(\"M│"READ" │true         │
│                                                         │yIndex\", 300))."                                                     │       │             │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"db.awaitIndexes"                                        │"Wait for all indexes to come online (for example: CALL db.awaitIndexe│"READ" │true         │
│                                                         │s(300))."                                                             │       │             │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"db.clearQueryCaches"                                    │"Clears all query caches."                                            │"DBMS" │true         │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"db.createLabel"                                         │"Create a label"                                                      │"WRITE"│false        │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"db.createProperty"                                      │"Create a Property"                                                   │"WRITE"│false        │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"db.createRelationshipType"                              │"Create a RelationshipType"                                           │"WRITE"│false        │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"db.index.fulltext.awaitEventuallyConsistentIndexRefresh"│"Wait for the updates from recently committed transactions to be appli│"READ" │true         │
│                                                         │ed to any eventually-consistent full-text indexes."                   │       │             │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"db.index.fulltext.listAvailableAnalyzers"               │"List the available analyzers that the full-text indexes can be config│"READ" │true         │
│                                                         │ured with."                                                           │       │             │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"db.index.fulltext.queryNodes"                           │"Query the given full-text index. Returns the matching nodes, and thei│"READ" │true         │
│                                                         │r Lucene query score, ordered by score. Valid keys for the options map│       │             │
│                                                         │ are: 'skip' to skip the top N results; 'limit' to limit the number of│       │             │
│                                                         │ results returned; 'analyzer' to use the specified analyzer as search │       │             │
│                                                         │analyzer for this query."                                             │       │             │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"db.index.fulltext.queryRelationships"                   │"Query the given full-text index. Returns the matching relationships, │"READ" │true         │
│                                                         │and their Lucene query score, ordered by score. Valid keys for the opt│       │             │
│                                                         │ions map are: 'skip' to skip the top N results; 'limit' to limit the n│       │             │
│                                                         │umber of results returned; 'analyzer' to use the specified analyzer as│       │             │
│                                                         │ search analyzer for this query."                                     │       │             │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"db.info"                                                │"Provides information regarding the database."                        │"READ" │true         │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"db.labels"                                              │"List all available labels in the database."                          │"READ" │true         │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"db.ping"                                                │"This procedure can be used by client side tooling to test whether the│"READ" │true         │
│                                                         │y are correctly connected to a database. The procedure is available in│       │             │
│                                                         │ all databases and always returns true. A faulty connection can be det│       │             │
│                                                         │ected by not being able to call this procedure."                      │       │             │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"db.prepareForReplanning"                                │"Triggers an index resample and waits for it to complete, and after th│"READ" │true         │
│                                                         │at clears query caches. After this procedure has finished queries will│       │             │
│                                                         │ be planned using the latest database statistics."                    │       │             │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"db.propertyKeys"                                        │"List all property keys in the database."                             │"READ" │true         │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"db.relationshipTypes"                                   │"List all available relationship types in the database."              │"READ" │true         │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"db.resampleIndex"                                       │"Schedule resampling of an index (for example: CALL db.resampleIndex(\│"READ" │true         │
│                                                         │"MyIndex\"))."                                                        │       │             │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"db.resampleOutdatedIndexes"                             │"Schedule resampling of all outdated indexes."                        │"READ" │true         │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"db.schema.nodeTypeProperties"                           │"Show the derived property schema of the nodes in tabular form."      │"READ" │true         │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"db.schema.relTypeProperties"                            │"Show the derived property schema of the relationships in tabular form│"READ" │true         │
│                                                         │."                                                                    │       │             │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"db.schema.visualization"                                │"Visualizes the schema of the data based on available statistics. A ne│"READ" │true         │
│                                                         │w node is returned for each label. The properties represented on the n│       │             │
│                                                         │ode include: `name` (label name), `indexes` (list of indexes), and `co│       │             │
│                                                         │nstraints` (list of constraints). A relationship of a given type is re│       │             │
│                                                         │turned for all possible combinations of start and end nodes. The prope│       │             │
│                                                         │rties represented on the relationship include: `name` (type name). Not│       │             │
│                                                         │e that this may include additional relationships that do not exist in │       │             │
│                                                         │the data due to the information available in the count store. "       │       │             │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"db.stats.clear"                                         │"Clear collected data of a given data section. Valid sections are 'QUE│"READ" │true         │
│                                                         │RIES'"                                                                │       │             │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"db.stats.collect"                                       │"Start data collection of a given data section. Valid sections are 'QU│"READ" │true         │
│                                                         │ERIES'"                                                               │       │             │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"db.stats.retrieve"                                      │"Retrieve statistical data about the current database. Valid sections │"READ" │true         │
│                                                         │are 'GRAPH COUNTS', 'TOKENS', 'QUERIES', 'META'"                      │       │             │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"db.stats.retrieveAllAnonymized"                         │"Retrieve all available statistical data about the current database, i│"READ" │true         │
│                                                         │n an anonymized form."                                                │       │             │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"db.stats.status"                                        │"Retrieve the status of all available collector daemons, for this data│"READ" │true         │
│                                                         │base."                                                                │       │             │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"db.stats.stop"                                          │"Stop data collection of a given data section. Valid sections are 'QUE│"READ" │true         │
│                                                         │RIES'"                                                                │       │             │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"dbms.cluster.routing.getRoutingTable"                   │"Returns the advertised bolt capable endpoints for a given database, d│"DBMS" │true         │
│                                                         │ivided by each endpoint's capabilities. For example an endpoint may se│       │             │
│                                                         │rve read queries, write queries and/or future getRoutingTable requests│       │             │
│                                                         │."                                                                    │       │             │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"dbms.components"                                        │"List DBMS components and their versions."                            │"DBMS" │true         │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"dbms.info"                                              │"Provides information regarding the DBMS."                            │"DBMS" │true         │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"dbms.killConnection"                                    │"Kill network connection with the given connection id."               │"DBMS" │true         │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"dbms.killConnections"                                   │"Kill all network connections with the given connection ids."         │"DBMS" │true         │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"dbms.listCapabilities"                                  │"List capabilities"                                                   │"DBMS" │true         │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"dbms.listConfig"                                        │"List the currently active config of Neo4j."                          │"DBMS" │true         │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"dbms.listConnections"                                   │"List all accepted network connections at this instance that are visib│"DBMS" │true         │
│                                                         │le to the user."                                                      │       │             │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"dbms.queryJmx"                                          │"Query JMX management data by domain and name. For instance, \"*:*\"" │"DBMS" │true         │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"dbms.routing.getRoutingTable"                           │"Returns the advertised bolt capable endpoints for a given database, d│"DBMS" │true         │
│                                                         │ivided by each endpoint's capabilities. For example an endpoint may se│       │             │
│                                                         │rve read queries, write queries and/or future getRoutingTable requests│       │             │
│                                                         │."                                                                    │       │             │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"dbms.upgrade"                                           │"Upgrade the system database schema if it is not the current schema." │"WRITE"│true         │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"dbms.upgradeStatus"                                     │"Report the current status of the system database sub-graph schema."  │"READ" │true         │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"tx.getMetaData"                                         │"Provides attached transaction metadata."                             │"DBMS" │true         │
├─────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────────────────────┼───────┼─────────────┤
│"tx.setMetaData"                                         │"Attaches a map of data to the transaction. The data will be printed w│"DBMS" │false        │
│                                                         │hen listing queries, and inserted into the query log."                │       │             │
└─────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────────────────────┴

NEO4J: Searching

Returning data from NEO4J is matching — we can match a node using MATCH p=(a:TOMATO {name: 'Black Krim'}) return p;

where the object “p” becomes a set of nodes labeled with ‘TOMATO’ where the value of ‘name’ is ‘Black Krim’

More advanced matches set the return object to a set of nodes and relationships — here we set p to the set of items starting at the node labeled TOMATO with name ‘PLANT0’ with relationships and nodes until you get to a node labeled TOMATO with name ‘Tomato0000007’

MATCH p=(a:TOMATO {name: 'PLANT0'})-[*]->(b:TOMATO {name: 'Tomato0000007'}) RETURN p

NEO4J: Relational Databases

I always found it odd that relational databases didn’t really have relationships as part of the stored data. Technically, they could if you had a view or stored procedure with a bunch of JOIN’s in there. But relational databases stored data about which you could build relationships. And people using the data may not even know about relationships that other people saw within that data. Some of our databases at work have amazing documentation — a hundred meg of PDF files detailing the relationships that the database creator wanted people to use. Other databases? No such luck!

To me, that’s the big advantage of a Graph database — the relationships are stored within the data, so anyone viewing it for the first time doesn’t need to poke around, see where values could be correlated across tables, and create their own JOIN statements to build out the relationship. Recording data in the database means defining those relationships. It would still be good to document a graph model (we have ‘people’ nodes who ‘act in’, ‘cast’, ‘produce’, or ‘direct’ ‘movie’ nodes), but you could figure all of those things out by perusing the database content.

NEO4J: Applications

I remember a friend of mine who taught introductory computer programming classes in University. One test question he always used was essentially ‘do something that people can do easily’. The exact details would change year to year, but the important thing is that the instructions simply stated “Sort the following list in alphabetical order”. Not write a program that sorts an arbitrary list into alphabetical order — sort this list. And, sure, you could write a program to do it. A program that would successfully accomplish the goal got an A grade. But so did someone who just took the list, sorted it alphabetically in their head, and wrote the list in alphabetical order. That answer? Would get extra credit. Because knowing when not to program is really important too. From a business standpoint, it’s a waste of money for someone to write a program to perform some easy one-off task that is never going to be done again.

I think about that a lot when I see “new” technologies getting picked up. I call it the “CIO magazine” approach to technology adoption. You see some new thing, have a very basic understanding of how it works, and decide we need to use one of these. And fail to consider if your use case is reasonable or if you’re willing to do the extra work for the “new” thing. The biggest example I experience of “not reasonable” is the prevalence of Java programming. If I am selling software, cross-platform compilation is A Very Good Thing. If I can maintain a single pipeline and a single release that all of my customers can use? Score! Internally developed software, though? We requisition specific platforms. Our Linux servers are not going to be Windows servers next Tuesday. I can write code, compile it for Linux, and be fine even if it doesn’t run under Windows. Because it doesn’t need to run under Windows. The extra work — our internal support groups adopted Agile. Except no one was willing to prioritize the ticket queue — so there’s no prioritized list of work to select from. Everyone has two “issues” for their sprint — “incident support” and “admin time”. A few people might get involved in a specific project and add “dns decom” or “nextgen vpn testing”. But sprint planning is repetitive (I’ve got incident support & admin time, too!), daily standups were a joke (I did tickets yesterday & had a meeting), and techs still didn’t know what ticket was the priority that should be pulled next. Which doesn’t make Java or Agile a bad idea — it just makes them overly complicated for the situation in which they are being used.

I think of all of this when I see Graph databases being implemented — step #0 is does a graph database make sense for my data? What would that mean? It would mean that the data elements are interconnected somehow — if you’d be using a lot of JOIN queries to interact with the stored data, then a graph model might make sense. If you’re just selecting items from individual tables where values are whatever? Then a graph database is a complex way of storing and accessing that data.