Category: Technology

Using Graph Database to Track Plant Hybridization

Graph databases are designed to both store data and record relationships between data elements. I wondered if this would be useful in tracking cross-breeding projects – essentially building “family trees” of the entity being cross-bred. The data model would have nodes with the hybrid with notes on it. Relationships for PARENT_M and PARENT_F (male and female parent of the hybrid) would be used to associate nodes.

Graph databases have a concept of pathing – what nodes do we need to traverse to get from A to B – but to create a lineage for the plants, you need to know the starting point. Which is great if you want to play six degrees of separation and find a path between two known people, but not great if I just want to know what the lineage is of Tomato #198. To make pathing possible, I needed to add a common root node to all of heirloom seedstocks – PLANT0

This allows me to take any plant and find the paths from PLANT0 to it

MATCH
p=(Tomato0:TOMATO {name: 'PLANT0'})-[*]->(Tomato8:TOMATO{name: 'Tomato0000008'})
RETURN p

And visualize the genetic heritage of the hybrid.

 

Neo4J — Setting Up and Basic Record Management

Setting up a Neo4J Database

Ostensibly, you can create a new database using “create database somethingorother”. However, that is if you are using the enterprise edition. Running the community edition, you can only run one database. Attempting to create a new database will produce an error indicating Neo.ClientError.Statement.UnsupportedAdministrationCommand

To use a database with a custom name, I need to edit neo4j.conf and set initial.dbms.default_database

Then create a Docker container – I am mapping /data to an external directory to persist my data and /var/lib/neo4j/conf to an external directory to persist configuration

docker run -dit --name neo4j --publish=7474:7474 --publish=7687:7687 --env=NEO4J_AUTH=none --volume=/docker/neo4j/data:/data --volume=/docker/neo4j/conf:/var/lib/neo4j/conf neo4j

Listing the databases using “show databases” will show my custom database name

Switch to our database with the “:use” instruction

Create single nodes

CREATE (:PLANTS {name: ‘Black Krim’, year: 1856, color: ‘deep red’, flavor: ‘sweet’, notes: ‘Heirloom seedstock’})

Note: After I started using my data, I realized that “PLANTS” is a silly label to use since they will all be plants. I recreated all of my data with nodes labeled “TOMATO” so I can also track peppers, daffodils, and any other plants we start hybridizing.

Load of all records:

CREATE(:TOMATO {flavor: "air",notes: "hypothetical",color: "invisible",year: "1",name: "PLANT0"});
CREATE(:TOMATO {flavor: "acidic",notes: "Heirloom seedstock",color: "purple",year: 1890,name: "Cherokee Purple"});
CREATE(:TOMATO {flavor: "sweet",notes: "Heirloom seedstock",color: "bright red",name: "Whittemore"});
CREATE(:TOMATO {flavor: "sweet",notes: "Heirloom seedstock",color: "deep red",year: 1856,name: "Black Krim"});
CREATE(:TOMATO {name: 'Kellogg', color: 'bright red', flavor: 'sweet', year: '1900', notes: 'beautiful and tasty'});
CREATE(:TOMATO {name: 'Brandywine', color: 'bright red', flavor: 'sweet', year: '1900', notes: 'very tasty'});
CREATE(:TOMATO {name: 'Japanese Trifele Black', color: 'dark purple red', flavor: 'sweet', year: '1900', notes: 'nice acidic flavor'});
CREATE(:TOMATO {name: 'Sweet Apertif', color: 'bright red', flavor: 'sweet', year: '1900', notes: 'cherry'});
CREATE(:TOMATO {name: 'Eva Purple', color: 'dark purple red', flavor: 'sweet', year: '1900', notes: 'did not grow well'});
CREATE(:TOMATO {name: 'Mortgage Lifter', color: 'bright red', flavor: 'sweet', year: '1900', notes: 'huge but lacking flavor and lots of bad spots'});
CREATE(:TOMATO {flavor: "sweet",notes: "",color: "deep red",year: "2021",name: "Tomato0000001"});
CREATE(:TOMATO {flavor: "bland",notes: "small tomatoes with little flavor",color: "red",year: "2021",name: "Tomato0000002"});
CREATE(:TOMATO {flavor: "watery",notes: "not much acid",color: "pinkish",year: "2022",name: "Tomato0000003"});
CREATE(:TOMATO {flavor: "sweet",notes: "sweet, slightly acidic",color: "red",year: "2022",name: "Tomato0000004"}) ;
CREATE (:TOMATO {name: 'Tomato0000005', color: 'bright red', flavor: 'sweet', year: '2022', notes: 'amazing'});
CREATE (:TOMATO {name: 'Tomato0000006', color: 'bright red', flavor: 'sweet', year: '2023', notes: 'beautiful and tasty but no bigger than parent'});
CREATE (:TOMATO {name: 'Tomato0000007', color: 'bright red', flavor: 'sweet', year: '2023', notes: 'beautiful and tasty but no bigger than parent'});
CREATE (:TOMATO {name: 'Tomato0000008', color: 'bright red', flavor: 'sweet', year: '2023', notes: 'beautiful and tasty, slightly larger than parent'});

Show records with MATCH

The search starts with the verb “MATCH”. In parenthesis, we add the matching rule. This begins with an object name variable – you can have anonymous nodes (no variable names assigned) by omitting this string and just typing the colon. This is followed by the label that we want to match – basically the type of node we are looking for. Then, in curly braces, a filter – in this case, I am looking for nodes where the “name” field has the value “Black Krim”. Finally, there’s a return statement that indicates that we want to output the matched results.

You can include relationships in the query – parenthesis around nodes and square brackets around relationships.

(placeholdername:nodes)-[:RELATIONSHIP_CONNECTION_TYPE]->(anotherplaceholdername:otherNodes)

This is what makes graph databases interesting for tracking hybridization – we can easily produce the lineage of the plants we develop.

Deleting a record

Deleting a record is used in conjunction with match — use the DELETE verb on the collections of objects returned into your variable name. Here, the variable is ‘x’:

MATCH (x:PLANTS{name: 'Black Krim'})
DELETE x

Deleting a record and relationships by ID

When deleting a record, you can include relationship matches:

MATCH (p:PLANTS) where ID(p)=1
OPTIONAL MATCH (p)-[r]-()
DELETE r,p

Create a relationship

To create a relationship, we first need to match two objects – here I am finding a plant named PLANT0 and all of the “heirloom seedstock” plants to which I assigned year 1900 – and create parent/child relationships. Since there is both a male and female parent, that is included in the relationship name:

MATCH (a:TOMATO), (b:TOMATO)
WHERE a.name = 'PLANT0' AND b.year = '1900'
CREATE (a)-[r:PARENT_M]->(b)

MATCH (a:TOMATO), (b:TOMATO)
WHERE a.name = 'PLANT0' AND b.year = '1900'
CREATE (a)-[r:PARENT_F]->(b)

Create records with parent/child relationships

You can create records and relationships in a single command, too:

CREATE p = (:PLANTS {name: 'Black Krim', year: 1856, color: 'deep red', flavor: 'sweet', notes: 'Heirloom seedstock'})-[:PARENT_M]->(:PLANTS {name: 'Tomato0000001', color: 'deep red', flavor: 'sweet', year: '2023', notes: ''})<-[:PARENT_F]-(:PLANTS {name: 'Cherokee Purple', color: 'purple', flavor: 'acidic', year: 1890, notes: 'Heirloom seedstock'})

RETURN p

Viewing Records with Relationships

When you match records, you will also get their relationships:

Bulk importing data

LOAD CSV WITH HEADERS FROM ‘https://www.rushworth.us/lisa/ljr_plant_history.csv’

 

NEO4J: Basics

When I first encountered the idea of labeled property graph data storage, I thought about the Dirk Gently Holistic Detective Agency series — where a belief in the fundamental interconnectedness of things plays a central role to the plot. Traditional SQL storage stored information and relationships were defined by cross-referencing a field (or more) between tables. The creators of graph databases sought to store the relationships along with data elements.

A labeled property graph database store nodes – the circles below – and define relationships – the lines – between nodes. Relationships are a way to show the interconnectedness of all things — people who purchased items, individuals who acted or directed films. I am looking at graph databases to maintain plant hybridization records — relationships build the family tree.

Nodes can have a label – essentially a classification, here I have “Candidate” nodes and “Election” nodes. A node can have multiple labels — maybe a political party affiliation as well as candidate. Nodes can include relationships to themselves (I’ve met me?)

Relationships have a type – WON or LOST here. Relationships are also directional – an election didn’t win a dude, a dude won an election.

Both nodes and relationships can have properties – additional information about the entity. In this example, each election node stores a year in which the election took place along with the vote totals both popular vote and electoral college results.

 

Tableau Query — Data Sources and the Workbooks Where They Are Used

I have found Tableau’s views of data sources to be … lacking. To provide a report of data sources, the database type, and where it is being used, I put together a query that locates all data sources (or filters by database type — specifically, I was trying to see who was using Snowflake) and lists the site, project, and workbook using the data source.

-- Data sources and what workbook they are used in
SELECT system_users.email , datasources.id, datasources.name, datasources.created_at,  datasources.updated_at, datasources.db_class, datasources.db_name
, datasources.site_id, sites.name AS SiteName, projects.name AS ProjectName, workbooks.name AS WorkbookName
FROM datasources 
LEFT OUTER JOIN users ON users.id = datasources.owner_id
LEFT OUTER JOIN system_users ON users.system_user_id = system_users.id
LEFT OUTER JOIN sites ON datasources.site_id = sites.id
LEFT OUTER JOIN projects ON datasources.project_id = projects.id
LEFT OUTER JOIN workbooks ON datasources.parent_workbook_id = workbooks.id
-- WHERE datasources.db_class = 'snowflake' 
ORDER BY datasources.created_at;


Postgresql SPLIT_PART and TRANSLATE

We have a database where there’s a single field, args, into which the vendor has glommed quite a few different things. Unfortunately, I need one of those numbers.

"---
- Workbook
- 4477
- Sample Report
- 18116
- null
"

You can use split_part to break a column into elements and only use one of those elements split_part(column_to_split, delimiter, ColumnToKeep)

As an example:
SPLIT_PART(b.args, E'\n', 3)AS task_workbook_id

In this case, I subsequently needed to eliminate the dash and space that prefixed the line. Using TRANSLATE, I am removing the ‘- ‘ with ”:
TRANSLATE ( SPLIT_PART(b.args, E'\n', 3), '- ','') AS task_workbook_id

And now I’ve just got 4477

Web Redirection Based on Typed URL

I have no idea why I am so pleased with this simple HTML code, but I am! My current project is to move all of our Tableau servers to different servers running a newer version of Windows. When I first got involved with the project, it seemed rather difficult (there was talk of manually recreating all of the permissions on each item!!) … but, some review of the vendors documentation let me to believe one could build a same-version server elsewhere (newer Windows, out in magic cloudy land, but the same Tableau version), back up the data from the old server, restore it to the new one, and be done. It’s not quite that simple — I had to clear out the SAML config & manually reconfigure it so the right elements get added into the Java keystore, access to the local Postgresql database needed to be manually configured, a whole bunch of database drivers needed to be installed, and the Windows registry of ODBC connections needed to be exported/imported. But the whole process was a lot easier than what I was first presented.

Upgrading the first production server was mostly seamless — except users appear to have had the server’s actual name. Instead of accessing https://tableau.example.com, they were typing abcwxy129.example.com. And passing that link around as “the link” to their dashboard. And, upon stopping the Tableau services on the old server … those links started to fail. Now, I could have just CNAMED abcwxy129 over to tableau and left it at that. But letting users continue to do the wrong thing always seems to come back and haunt you (if nothing else, the OS folks own the namespace of servers & are allowed to re-use or delete those hostnames at will). So I wanted something that would take whatever https://tableau.example.com/#/site/DepartMent/workbooks/3851/Views kind of URL a user provided and give them the right address. And, since this was Windows, to do so with IIS without the hassle of integrating PHP or building a C# project. Basically, I wanted to do it within basic HTML. Which meant JavaScript.

And I did it — using such a basic IIS installation that the file is named something like iisstart.htm so I didn’t have to change the default page name. I also redirected 404 to / so any path under the old server’s name will return the redirection landing page.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
	<head>
		<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
		<title>This Tableau server has moved</title>
		
		<style type="text/css">
			<!--
			body {
				color:#000000;
				background-color:#eeeeee;
				margin:0;
			}
			-->
		</style>
	</head>
	<body>
		<P><ul>
		<h2>The Tableau server has moved. </h2>
		<P>The location you accessed, <span style="white-space: nowrap" id="oldurl"></span>, is no longer available.<br><br> Please update your link to use <span style="white-space: nowrap"  id="newurl"></span></p>
		</ul></P>
	
		<script>
			let strOldURL = window.location.href;

			let strNewURL = strOldURL.replace(/hostname.example.com/i,"tableau.example.com");
			strNewURL = strNewURL.replace(/otherhostname.example.com/i,"tableau.example.com");

			document.getElementById("oldurl").innerHTML = window.location.href;
			document.getElementById("newurl").innerHTML = "<a href=" + strNewURL + ">" + strNewURL + "</a>";
		</script>
	
	</body>
</html>

Exchange Disaster Recovery

I guess with everyone moving to magic cloudy pay-per-month Exchange, this isn’t such a concern anymore … but for those still running on-premise Exchange:

(1) Before you can restore your AD system state, you’ve got to build a server & bring up a temporary domain. There’s a “System Configuration” program that lets you select to restart in safe mode / directory services restore mode without having to time the F8 key or anything.

(2) The system state backup of a domain controller backs up a lot of stuff — including the registry which tells the server what software is installed and services. This means it is not possible to just run the Exchange setup.exe with the disaster recovery option. Fortunately, I was able to copy the Exchange folder from program files off of a backup. Unfortunately, the Exchange services wouldn’t start because DLLs couldn’t register. Did a diff between old server backup & new one — copied any missing stuff from c:\windows\system32 and c:\windows\syswow64 and, voila, Exchange is starting. Couldn’t mount the ebd file, though …

(3) Which brings me to eseutil an attempt to replay the transaction logs (eseutil /r) and then repair the database as much as possible (eseutil /p) got me an EDB file that the Exchange server could mount.

Grafana — SSO With PingID (OAuth)

I enabled SSO in our development Grafana system today. There’s not a great user experience with SSO enabled because there is a local ‘admin’ user that has extra special rights that aren’t given to users put into the admin role. If you just enable SSO, there is a new button added under the logon dialogue that users can use to initiate an SSO authentication. That’s not great, though, since most users really should be using the SSO workflow. And people are absolutely going to be putting their login information into that really obvious set of text input fields.

Grafana has a configuration to bypass the logon form and just always go down the OAUTH authentication:

# Set to true to attempt login with OAuth automatically, skipping the login screen.
# This setting is ignored if multiple OAuth providers are configured.
oauth_auto_login = true

Except, now, the rare occasion we need to use the local admin account requires us to set this to false, restart the service, do our thing, change the setting back, and restart the service again. Which is what we’ll do … but it’s not a great solution either.

 

Config to authenticate Grafana to PingID using OAUTH

#################################### Generic OAuth ##########################
[auth.generic_oauth]
name = PingID
enabled = true
allow_sign_up = true
client_id = 12345678-1234-4567-abcd-123456789abc
client_secret = abcdeFgHijKLMnopqRstuvWxyZabcdeFgHijKLMnopqRstuvWxyZ
scopes = openid profile email
email_attribute_name = email:primary
email_attribute_path =
login_attribute_path = user
role_attribute_path =
id_token_attribute_name =
auth_url = https://login.example.com/as/authorization.oauth2
token_url = https://login.example.com/as/token.oauth2
api_url = https://login.example.com/idp/userinfo.openid
allowed_domains =
team_ids =
allowed_organizations =
tls_skip_verify_insecure = true
tls_client_cert =
tls_client_key =
tls_client_ca =

Mounting DD Raw Image File

And a final note from my disaster recovery adventure — I had to use ddrescue to copy as much data from a corrupted drive as possible (ddrescue /dev/sdb /mnt/vms/rescue/backup.raw –try-again –force –verbose) — once I had the image, what do you do with it? Fortunately, you can mount a dd file and copy data from it.

# Mounting DD image
2023-04-17 23:54:01 [root@fedora /]# kpartx -l backup.raw
loop0p1 : 0 716800 /dev/loop0 2048
loop0p2 : 0 438835200 /dev/loop0 718848

2023-04-17 23:55:08 [root@fedora /]# mount /dev/mapper/loop0p2 /mnt/recovery/ -o loop,ro
mount: /mnt/recovery: cannot mount /dev/loop1 read-only.
       dmesg(1) may have more information after failed mount system call.

2023-04-17 23:55:10 [root@fedora /]# mount /dev/mapper/loop0p2 /mnt/recovery/ -o loop,ro,norecovery

2023-04-18 00:01:03 [root@fedora /]# ll /mnt/recovery/
total 205G
drwxr-xr-x  2 root root  213 Jul 14  2021 .
drwxr-xr-x. 8 root root  123 Apr 17 22:38 ..
-rw-r--r--. 1 root root 127G Apr 17 20:35 ExchangeServer.qcow2
-rw-r--r--. 1 qemu qemu  10G Apr 17 21:42 Fedora.qcow2
-rw-r--r--. 1 qemu qemu  15G Apr 17 14:05 FedoraVarMountPoint.qcow2