Category: Database

MongoDB: Changing Host in Replica Set

When we get replacement servers at work, they frequently build a new server with a temporary name and IP address with the plan of swapping the host name and IP with the decommed server. So my Server123 gets turned down, Server123-Temp gets renamed to Server123, and the IP from the old server is configured on the replacement. Everything is operating exactly as it was before even if the direct host name or IP address were used — great for not needing to update firewall rules and vpn profiles, but I encountered a bit of a problem with the MongoDB cluster.

When I initiated the replica set, I did not have to specify a host name. It pulled the host name from the system — which happily provided that temporary name that doesn’t really exist (it’s not in DNS). Which was fine — I could add the temporary name to /etc/hosts along with the future name that I’ve remapped to the current IP so my “new” VMs all talk to each other and the old servers don’t get mucked up.

But, eventually, I’d like the replica set to have the right names. Had I known about this ahead of time, I’d simply have changed the host name value on the box to be the permanent name, initialized the replica set, and returned the temporary name to the box. But I didn’t, and I didn’t really want to start from 0 with the database I’d restored. Luckily, it turns out there’s a process for re-creating the replica set without destroying the data.

First, edit the mongo.conf file and comment out the replica set block. Restart each server with the new config. Then delete the “local” database from each MongoDB server using mongo local --eval "db.dropDatabase()"

Uncomment the replica set block in mongo.conf and restart MongoDB again — initialize the replica set again (make sure the server “knows” it’s proper name first!)

MongoDB: Increasing Log Level

We had a problem with an application accessing our MongoDB cluster, and the log files didn’t provide much useful information. I could see the client connect and disconnect … but nothing in between. I discovered that the default logging level is very low. Good for disk utilization and I/O, but not great for troubleshooting.

db.runCommand({getParameter: 1, logLevel: 1}) # Get the current logging level
db.setLogLevel(3) # Fairly robust logging
db.setLogLevel(5) # don't try this is prod huge stream of text super logging
db.setLogLevel(0) # and set logging back to a low level once you are done troubleshooting

You can also increase the log level for individual components of MongoDB to minimize logging I/O:

db.setLogLevel(2, "accessControl" )

 

MongoDB: Setting Up a Replica Set

On one server create a key file. Copy this key file to all other servers that will participate in the replica set

mkdir -p /opt/mongodb/keys/
openssl rand -base64 756 > /opt/mongodb/keys/$(date '+%Y-%m-%d').key
chmod 400 /opt/mongodb/keys/$(date '+%Y-%m-%d').key
chown -R mongodb:mongodb /opt/mongodb/keys/$(date '+%Y-%m-%d').key

On each server, edit /etc/mongo.conf and add the keyfile to the security section and define a replica set

security:
 authorization: enabled
 keyFile:  /etc/mongodb/keys/mongo-key
#replication:
replication:
  replSetName: "myReplicaSet"

Restart MongoDB on each node.

On one server, use mongosh to enter the MongDB shell.

rs.initiate(
{
_id: "myReplicaSet",
members: [
{ _id: 0, host: "mongohost1.example.net" },
{ _id: 1, host: "mongohost2.example.net" },
{ _id: 2, host: "mongohost3.example.net" }
]
})

Use rs.status() to view the status of the replica set. If it is stuck in STARTING … check connectivity. If the port is open, I ran into a snag with some replacement servers. They’ve got temporary hostnames. But you cannot add a host on itself — it ignores that you typed mongohost1.example.net … and it takes it’s hostname value. And then sends that value to the other servers in the replica set. If you cannot change the hostname to match what you want, there is a process to change the hostname in a replicaset.

MongoDB: Where is my shell?!?

We are upgrading servers from really old MongoDB (4.2.15) to much new MongoDB (6.something). I am used to getting into the MongoDB shell using:

mongoserver:~ # mongo -u $STRMONGOUSER -p $STRMONGOPASS
MongoDB shell version v4.2.15
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("5658a72f-fea0-4316-aa97-4b0c0ffab7ff") }
MongoDB server version: 4.2.15

Except the new server says there’s no such file. And there isn’t. A bit of research later, I learn that the shell is now called mongosh … which is a more reasonable name. It works the same way: mongosh -u $STRMONGOUSER -p $STRMONGOPASS gets me there, and all of the commands I know work.

Backing up (and restoring) *All* Data in MongoDB

The documentation on Mongo’s website tells you to use mongodump with a username, password, destination, and which database you want to back up. Except I wanted to back up and restore everything. Users, multiple databases, I don’t really know what else is in there hence I want everything instead of enumerating the things I want.

Turns out you can just omit the database name and it dumps everything

mongodump --uri="mongodb://<host URL/IP>:<Port>" -u $STRMONGODBUSER -p $STRMONGODBPASS

And restore with

mongorestore --uri="mongodb://<host URL/IP>:<Port>"

Since it’s a blank slate with no authentication or users defined yet.

MongoDB: Basics

We inherited a system that uses MongoDB, and I managed to get the sandbox online without actually learning anything about Mongo. The other environments, though, have data people care about set up in a replicated cluster of database servers. That seems like the sort of thing that’s going to require knowing more than “it’s a NoSQL database of some sort”.

It is a NoSQL database — documents are organized into ‘collections’ within the database. You can have multiple databases hosted on a server, too. A document is a group of key/value pairs with dynamic schema (i.e. you can just make up keys as you go).

There are GUI clients and a command-line shell … of course I’m going with the shell 🙂 There is a db function for basic CRUD operations using db.nameOfCollection then the operation type:

db.collectionName.insert({"key1": "string1", "key2" : false, "key3": 12345})
db.collectionName.find({key3 : {$gt : 10000} })
db.collectionName.update({key1 : "string1"}, {$set: {key3: 100}})
db.collectionName.remove({key1: "string1"});

CRUD operations can also be performed with NodeJS code — create a file with the script you want to run, then run “node myfile.js”

Create a document in a collection

var objMongoClient = require('mongodb').MongoClient;
var strMongoDBURI = "mongodb://mongodb.example.com:27017/";
  
objMongoClient.connect(strMongoDBURI, function(err, db) {
  if (err) throw err;
    var dbo = db.db("dbNameToSelect");
    var objRecord = { key1: "String Value1", key2: false };
    dbo.collection("collectionName").insertOne(objRecord, function(err, res) {
         if (err) throw err;
         console.log("document inserted");
         db.close();
    });
}); 

Read a document in a collection

var objMongoClient = require('mongodb').MongoClient;
var strMongoDBURI = "mongodb://mongodb.example.com:27017/";

objMongoClient.connect(strMongoDBURI, function(err, db) {
  if (err) throw err;
    var dbo = db.db("dbNameToSelect");
    var objQuery = { key1: "String Value 1" };
    dbo.collection("collectionName").find(objQuery).toArray(function(err, result) {
     if (err) throw err;
     console.log(result);
     db.close();
  });
}); 

Update a document in a collection

var objMongoClient = require('mongodb').MongoClient;
var strMongoDBURI = "mongodb://mongodb.example.com:27017/";

objMongoClient.connect(strMongoDBURI, function(err, db) {
if (err) throw err;
  var dbo = db.db("dbNameToSelect");
  var objRecord= { key1: "String Value 1" };
  dbo.collection("collectionName").deleteOne(objRecord, function(err, obj) {
    if (err) throw err;
    console.log("Record deleted");
    db.close();
});
}); 

Delete a document in a collection

var objMongoClient = require('mongodb').MongoClient;
var strMongoDBURI = "mongodb://mongodb.example.com:27017/";

objMongoClient.connect(strMongoDBURI, function(err, db) {
if (err) throw err;
  var dbo = db.db("dbNameToSelect");
  var objQuery = { key1: "String Value 1" };
  var objNewValues = { $set: {key3: 12345, key4: "Another string value" } };
  dbo.collection("collectionName").updateOne(objQuery, objNewValues , function(err, res) {
    if (err) throw err;
    console.log("Record updated");
    db.close();
   });
}); 

NEO4J: Exploring the Data

Since I seem to frequently acquire orphaned platforms with no documentation, I figure it would be good to figure out how to investigate an unknown NEO4J platform to see what it’s got. “SHOW” is very useful in these cases. The full list of SHOW commands is:

"ALIAS"
"ALIASES"
"ALL"
"BTREE"
"BUILT"
"CONSTRAINT"
"CONSTRAINTS"
"CURRENT"
"DATABASE"
"DATABASES"
"DEFAULT"
"EXIST"
"EXISTENCE"
"EXISTS"
"FULLTEXT"
"FUNCTION"
"FUNCTIONS"
"HOME"
"INDEX"
"INDEXES"
"KEY"
"LOOKUP"
"NODE"
"POINT"
"POPULATED"
"PRIVILEGE"
"PRIVILEGES"
"PROCEDURE"
"PROCEDURES"
"PROPERTY"
"RANGE"
"REL"
"RELATIONSHIP"
"ROLE"
"ROLES"
"SERVER"
"SERVERS"
"SETTING"
"SETTINGS"
"SUPPORTED"
"TEXT"
"TRANSACTION"
"TRANSACTIONS"
"UNIQUE"
"UNIQUENESS"
"USER"
"USERS"

The most useful ones for figuring out what you’ve got … the “show databases” command I know from MySQL/MariaDB does what I expected – you can also include a specific database name, but “show database ljrtest” doesn’t appear to list any more information than the generic show databases command. .

There’s also a “show users” command that outputs the users in the database – although I’m using the community edition without authentication, so there isn’t much interesting information being output here.

And roles, if roles are being used, should be output with “SHOW ALL ROLES” … but mine just says “Unsupported administration command”

Once you know what databases you’ve got and who can log in and do stuff, we’d want to look at the data. There are some built-in functions that will help us out here. The db.labels() function will list the labels in the selected database.

You can also return a distinct list of labels along with the count of nodes with that label:

Since a node can have multiple, comparing that total with a count of nodes would give you an idea if there are many more labels than nodes. In my case, either view shows 156 … so I know there are few (if any) nodes with multiple labels.

To view the types of relationships defined in the selected database, use “CALL db.relationshipTypes()”

Similarly, you can return the relationship types along with counts

There is a function to list the property keys used within the data – interesting to note that keys that were used but the nodes using them were subsequently deleted … they still show up as property keys. Basically, anything that is there is on this list but some things on this list may not be there anymore. In this example, ‘parentm’ and ‘parentf’ were labels I used to build relationships programmatically.

I’ve found db.schema.nodeTypeProperties to be more useful in this regard – it does not appear to list properties that are not in use, and the output includes field types

To see if there are any custom procedures or functions registered, look on the server(s). Use ps -efww to view the running command – there will be some folders listed after “-cp” … you could find procedures or plugins in any of those folders. In my case, the plugins are in /plugins

And the only “custom” things registered are the APOC and APOC-Extended jar’s

 

Neo4J – The Importance of the Data Model

While I am certain table-based SQL databases required planning to establish a reasonable data model – optimizing storage, defining foreign keys, indexing … I have found it more challenging to create a good data model in Neo4j. Maybe that’s because I normally populate SQL tables with custom scripts that can be modified to handle all sorts of edge cases. Maybe I’m still thinking in tables, but there seems to be more trial and error in defining the data model than I’ve ever had in SQL databases.

In the import-from-html-table example, a candidate often is associated with multiple elections. Storing candidates as nodes and elections as other nodes that contain results (electoral college votes for winner & loser and popular votes for winner & loser) then associating candidates with elections allowed me to store data about US elections in the Graph. I know who ran, who won and who lost, and what the results were for each election.

Associating the results with candidates didn’t work because Franklin D Roosevelt only has one property for “EC_VOTES” … which election does that reflect? I could also have added the vote totals to the relationship but that would either separate the data (the loser’s votes are stored on LOST relationships and the winner’s are stored on WON relationships) or data duplication (both WON and LOST relationships contain the same vote numbers).

Query used to populate the data:

CALL apoc.load.html("https://www.iweblists.com/us/government/PresidentialElectionResults.html",
{electionyear: "#listtable tbody tr td:eq(0)"
, winner: "#listtable tbody tr td:eq(1)"
, loser: "#listtable tbody tr td:eq(2)"
, electoral_win: "#listtable tbody tr td:eq(3)"
, electoral_lose: "#listtable tbody tr td:eq(4)"
, popular_win: "#listtable tbody tr td:eq(5)"
, popular_delta: "#listtable tbody tr td:eq(6)" }) yield value
WITH value, size(value.electionyear) as rangeup

UNWIND range(0,rangeup) as i WITH value.electionyear[i].text as ElectionYear
, value.winner[i].text as Winner, value.loser[i].text as Loser
, value.electoral_win[i].text as EC_Winner, value.electoral_lose[i].text as EC_Loser
, value.popular_win[i].text as Pop_Vote_Winner
, value.popular_delta[i].text as Pop_Vote_Margin

MERGE (election:Election {year: coalesce(ElectionYear,"Unknown")})
SET election.EC_Votes_Winner = coalesce(EC_Winner,"Unknown")
SET election.EC_Votes_Loser = coalesce(EC_Loser,"Unknown")
SET election.Pop_Votes_Winner = apoc.text.replace(Pop_Vote_Winner, ",", "")
SET election.Pop_Votes_Loser = apoc.number.exact.sub(apoc.text.replace(Pop_Vote_Winner, ",", ""), apoc.text.replace(Pop_Vote_Margin, ",", ""))

MERGE (ew:CANDIDATE {name: coalesce(Winner,"Unknown")})
MERGE (el:CANDIDATE {name: coalesce(Loser,"Unknown")})

MERGE (ew)-[:WON]->(election) MERGE (el)-[:LOST]->(election);

 

NEO4J: More Cypher Queries

To get a count of returned records, Cypher uses COUNT pretty much the same way as SQL does

Interestingly, there are other aggregation functions that remind me of using the ELK API — I can get averages, min/max, and standard deviation.

 

Chaining MATCH statements functions similarly to a SQL JOIN — get the items with this label, add in some other stuff. And, just like an INNER JOIN, this means no data is returned when one of the conditions has no matches — Bill Clinton never lost an election, so we get a null data set here:

The equivalent of an outer join is an OPTIONAL MATCH — here, the records from the first MATCH will be returned even if there is no corresponding record matching the second MATCH

ORDER BY also works in the same way it does in SQL. Multiple order parameters are separated by a comma and add DESC to do a DESCENDING ORDER

WHERE can be used to create the equivalent of a LIKE query — the where =~ operator uses regular expression syntax, so you don’t just use % or * as a wildcard. Regex wildcards like .* (match any character zero or more times) are used.

NEO4J: Using APOC To Load HTML Table Data

I’ve been playing around with loading neo4j data from random tables on the web using apoc.load.html from the extended APOC library. The first trick to it is knowing how to use jquery to find elements of a webpage — the table named “listtable” then the path down to the data elements (tbody tr td) and column numbers.

Once you have extracted the data, you can then manipulate it, map it into fields, create relationships, etc.

UNWIND is used as a “for each” loop that allows us to iterate through the result set.

MERGE creates or updates records (which, in this case, means I have a poor data model … someone could well have run in multiple elections and I am not really accommodating those cases well. Since I don’t actually want a database of presidential elections but was really just testing some new-to-me functionality … we’re going to ignore these logic problems)

SET adds (or updates) properties of the node.

CALL apoc.load.html("https://www.iweblists.com/us/government/PresidentialElectionResults.html",
{electionyear: "#listtable tbody tr td:eq(0)"
	, winner: "#listtable tbody tr td:eq(1)"
	, loser: "#listtable tbody tr td:eq(2)"
	, electoral_win: "#listtable tbody tr td:eq(3)"
	, electoral_lose: "#listtable tbody tr td:eq(4)"
	, popular_win: "#listtable tbody tr td:eq(5)"
	, popular_delta: "#listtable tbody tr td:eq(6)" }) yield value 

WITH value, size(value.electionyear) as rangeup

UNWIND range(0,rangeup) as i WITH value.electionyear[i].text as ElectionYear
	, value.winner[i].text as Winner
	, value.loser[i].text as Loser
	, value.electoral_win[i].text as EC_Winner
	, value.electoral_lose[i].text as EC_Loser
	, value.popular_win[i].text as Pop_Vote_Winner
	, value.popular_delta[i].text as Pop_Vote_Delta

MERGE (ew:Candidate {name: coalesce(Winner,"Unknown")}) 
MERGE (el:Record {name: coalesce(Loser,"Unknown")}) 
SET ew.EC_Votes = coalesce(EC_Winner,"Unknown") 
SET el.EC_Votes = coalesce(EC_Loser,"Unknown")
SET ew.Year = ElectionYear
SET el.Year = ElectionYear

WITH *, replace(Pop_Vote_Delta,",","") as Pop_Vote_Delta_Int, replace(Pop_Vote_Winner,",","") as Pop_Winner_Int

SET ew.Pop_Votes = Pop_Winner_Int
SET el.Pop_Votes = apoc.number.exact.sub(Pop_Winner_Int, Pop_Vote_Delta_Int)

MERGE (ew)-[:DEFEATED]->(el);