Tag: replication

MongoDB: Changing Host in Replica Set

When we get replacement servers at work, they frequently build a new server with a temporary name and IP address with the plan of swapping the host name and IP with the decommed server. So my Server123 gets turned down, Server123-Temp gets renamed to Server123, and the IP from the old server is configured on the replacement. Everything is operating exactly as it was before even if the direct host name or IP address were used — great for not needing to update firewall rules and vpn profiles, but I encountered a bit of a problem with the MongoDB cluster.

When I initiated the replica set, I did not have to specify a host name. It pulled the host name from the system — which happily provided that temporary name that doesn’t really exist (it’s not in DNS). Which was fine — I could add the temporary name to /etc/hosts along with the future name that I’ve remapped to the current IP so my “new” VMs all talk to each other and the old servers don’t get mucked up.

But, eventually, I’d like the replica set to have the right names. Had I known about this ahead of time, I’d simply have changed the host name value on the box to be the permanent name, initialized the replica set, and returned the temporary name to the box. But I didn’t, and I didn’t really want to start from 0 with the database I’d restored. Luckily, it turns out there’s a process for re-creating the replica set without destroying the data.

First, edit the mongo.conf file and comment out the replica set block. Restart each server with the new config. Then delete the “local” database from each MongoDB server using mongo local --eval "db.dropDatabase()"

Uncomment the replica set block in mongo.conf and restart MongoDB again — initialize the replica set again (make sure the server “knows” it’s proper name first!)

Postgresql Replication Lag

Replication involves sending records from the master, receiving the record on the remote replica, writing the record on the remote replica, and flushing the record to persistent storage, and finally replaying the record. Replication lag can occur when a large amount of data hasn’t been fully replayed into the remote replica. Identifying ​where​ the lag occurs can help in rectifying the underlying problem.

select client_addr, usename, application_name, state, sync_state, (pg_wal_lsn_diff(pg_current_wal_lsn(),sent_lsn) / 1024)::bigint as PendingLag, (pg_wal_lsn_diff(sent_lsn,write_lsn) / 1024)::bigint as WriteLag, (pg_wal_lsn_diff(write_lsn,flush_lsn) / 1024)::bigint as FlushLag, (pg_wal_lsn_diff(flush_lsn,replay_lsn) / 1024)::bigint as ReplayLag, (pg_wal_lsn_diff(pg_current_wal_lsn(),replay_lsn))::bigint / 1024 as TotalLag FROM pg_stat_replication;

Commented to explain what each column means:

select client_addr , usename , application_name, state, sync_state,
    (pg_wal_lsn_diff(pg_current_wal_lsn(),sent_lsn) / 1024)::bigint as PendingLag,    -- The amount of WAL data that hasn't been sent ... check network stuff if lag persists
    (pg_wal_lsn_diff(sent_lsn,write_lsn) / 1024)::bigint as WriteLag,                 -- The amount of replayed log data that isn't applied  ... check iostat stuff if lag persists
    (pg_wal_lsn_diff(write_lsn,flush_lsn) / 1024)::bigint as FlushLag,                -- similar to write lag, and often these two numbers are high in conjunction
    (pg_wal_lsn_diff(flush_lsn,replay_lsn) / 1024)::bigint as ReplayLag,              -- The amount of log data that is waiting to be replayed ... check iostat stuff but could also be high CPU or memory utilization
    (pg_wal_lsn_diff(pg_current_wal_lsn(),replay_lsn))::bigint / 1024 as TotalLag     -- Basically a sum of the previous values
FROM pg_stat_replication;