Month: March 2022

Analyzing Postgresql Tmp Files

Postgresql stores temporary files for in-flight queries — these don’t normally hang around for long, but sorting a large amount of data or building a large hash can create a lot of temp files. A dead query that was sorting a large amount of data or …. well, we’ve gotten terabytes of temp files associated with multiple backend process IDs. The file names are algorithmic — a string “pgsql_tmp followed by the backend PID, a period, and then some other number. Thus, I can extract the PID from each file name and provide a summary of the processes associated with temp files.

To view a summary of the temp files within the pgsql_tmp folder, run the following command to print a count then a PID number:
ls /path/to/pgdata/base/pgsql_tmp | sed -nr 's/pgsql_tmp([0-9]*)\.[0-9]*/\1/p' | sort | uniq -c

A slightly longer command can be used to reverse the columns – producing a list of process IDs followed by the count of files for that PID – too:
ls /path/to/pgdata/base/pgsql_tmp | sed -nr 's/pgsql_tmp([0-9]*)\.[0-9]*/\1/p' | sort | uniq -c | sort -k2nr | awk '{printf("%s\t%s\n",$2,$1)}END{print}'

 

Tracking Down Which Pod is Exhausting IP Connections

We’ve been seeing an error that prevents clients from connecting to Postgresql servers – basically that all available connections are in use and the remaining connections are reserved for superuser and replication activity.

First, we need to determine what the connection limit is

SELECT setting, source, sourcefile, sourceline FROM pg_settings WHERE name = 'max_connections';

And if there are any per-user connection limits – a limit of -1 means unlimited connections are allowed.

SELECT rolname, rolconnlimit FROM pg_roles

The next step is to identify what connections are exhausting available connections – are there a lot of long-running queries? Are there just more active queries than anticipated? Are there a bunch of idle connections?

SELECT pid, usename, client_addr, client_port 
 ,to_char(pg_stat_activity.query_start, 'YYYY-MM-DD HH:MI:SS') as query_start
 , state, query 
FROM pg_stat_activity
-- where state = 'idle'
-- and usename = 'app_user'
order by query_start;

In our case, there were over 100 idle connections using up about 77% of the available connections. Auto-vacuum, client read operations, and replication easily filled up the remaining available connections.

Because the clients keeping these idle connections open are an app running in a Kubernetes cluster, there’s an extra layer of complexity identifying where the connection is actually sourced. When you view the list of connections from the Postgresql server’s perspective, “client_addr” is the worker hosting the pod.

On the worker server, use conntrack to identify the actual source of the connection – the IP address in “-d” is the IP address of the Postgresql server. To isolate a specific connection, select a “client_port” from the list of connections (37900 in this case) and grep for the port. You will see the src IP of the individual POD.

lhost1750:~ # conntrack -L -f ipv4 -d 10.24.29.140 -o extended | grep 37900
ipv4 2 tcp 6 86394 ESTABLISHED src=10.244.4.80 dst=10.24.29.140 sport=37900 dport=5432 src=10.24.29.140 dst=10.24.29.155 sport=5432 dport=37900 [ASSURED] mark=0 use=1
conntrack v1.4.4 (conntrack-tools): 27 flow entries have been shown.

Then use kubeadm to identify which pod is assigned that address:

lhost1745:~ # kubectl get po --all-namespaces -o wide | grep "10.244.4.80"
kstreams kafka-stream-app-deployment-1336-d8f7d7456-2n24x 2/2 Running 0 10d 10.244.4.80 lhost0.example.net <none> <none>

In this case, we’ve got an application automatically scaling up that can have 25 connections help open and idle … so there isn’t really a solution other than increasing the number of available connections to a number that’s appropriate given the number of client connections we plan on leaving open. I also want to enact a connection limit on the individual account – if there are 250 connections available on the Postgresql server, then limit the application to 200 of those connections.

 

2>/dev/null

A few times now, I’ve encountered individuals with cron jobs or bash scripts where a command execution ends in 2>/dev/null … and the individual is stymied by the fact it’s not working but there’s no clue as to why. The error output is being sent into a big black hole never to escape!

The trick here is to understand file descriptors — 1 is basically a shortcut name for STDOUT and 2 is basically a shortcut name for STDERR (0 is STDIN, although that’s not particularly relevant here).  So 2>/dev/null says “take all of the STDERR stuff and redirect it to /dev/null”.

Sometimes you’ll see both STDERR and STDOUT being redirected either to a file or to /dev/null — in that case you will see 2>&1 where the ampersand prior to the “1” indicates the stream is being redirected to a file descriptor (2>1 would direct STDOUT to a file named “1”) — so >/dev/null 2>&1 is the normal way you’d see it written. Functionally, >/dev/null 1>&2 would be the same thing … but redirecting all output into error is, conceptually, a little odd.

To visualize all of this, use a command that will output something to both STDERR and STDOUT — for clarify, I’ve used “1>/dev/null” (redirect STDOUT to /devnull) in conjunction with 2>&1 (redirect STDERR to STDOUT). As written in the text above, the number 1 is generally omitted and just >/dev/null is written.

 

 

SSL Connection Failure from Docker Image

We have a script that’s used to securely retrieve passwords … a script which failed when run from a Docker container.

* could not load PEM client certificate, OpenSSL error error:140AB18E:SSL routines:SSL_CTX_use_certificate:ca md too weak, (no key found, wrong pass phrase, or wrong file format?)

Appears root of issue is tied to Debian OS that’s used in the python:3.7-slim container that’s being used. Newer iterations of some Linux OS’s have a default setting in the openssl config that provide a setting for SSL_CTX_set_security_level that precludes communication with password server.

Remediating this at the server end is not a reasonable approach, so client config needs to be changed to allow connection to be established. Setting security level to 1 allows connection to proceed, so proposed including additional instruction in Dockerfile that uses sed to update the configuration parameter.

sed -i 's/DEFAULT@SECLEVEL=2/DEFAULT@SECLEVEL=1/' /etc/ssl/openssl.cnf

Once that setting was updated, the script worked perfectly as it does on our physical and VM servers.

Duck Egg Challah

We’ve got a lot of duck eggs, so I’ve been trying to find recipes where the extra rich egg … and I thought of challah because most recipes I’ve seen call for an extra yolk or two. Using duck eggs instead of the chicken eggs produced a really delicious, buttery loaf of bread.

4 1/2 cups all purpose flour
1 Tbsp yeast
2 tsp salt
3/4 cups warm water
6 Tbsp olive oil
4 Tbsp maple syrup
3 duck eggs (one was a double yolk)

Put the yeast in the warm water and let it sit until it becomes frothy.

Mix the dry ingredients, add the oil, maple, and yeast water and knead until a smooth dough ball forms.

Allow to raise until doubled in size, gently punch down. Shape and allow to raise again until doubled. Mix one egg with a tablespoon or two of water and brush over the dough.

I was making rolls for some fancy sausages we got from the farmers market — so I formed them into oval rolls and baked them at 375F for about 20 minutes. You can make ropes and braid a normal challah loaf — that would bake at 350F for about 35 minutes.

Reporting Last Patch Dates on Fedora / RedHat / CentOS Systems

I needed to verify the last time a bunch of servers were patched — basically to ensure compliance with the stated quarterly patching interval. This python script pulls the list of installed packages and the date for each package, sorts the info by date DESC, and then reports the latest date on any packages — as well as the number of packages updated on that date. If there’s only one … the system still might bear some investigation. But if a couple of dozen packages were updated in the past quarter … we don’t need to be too worried about turning up on the out-of-compliance report.

import subprocess
import re
import datetime
from collections import OrderedDict

def getFirstElement(odictInput):
    '''
        This function returns the first element from an ordered collection (an arbitrary element if an unordered collection is passed in)
        Input -- odictInput -- ordered collection
        Output -- type varies -- first element of ordered collection, arbitrary element of unordered collection

    '''
    return next(iter(odictInput))

listHosts = ['host01.example.com', 'host02.example.com', 'host03.example.com','host04.example.com','host05.example.com']

for strHost in listHosts:
        dictPatchDates = {}

        objResults = subprocess.Popen(['ssh', strHost, 'rpm', '-qa', '--last'],stdout=subprocess.PIPE)
        for strLine in objResults.stdout:
                strPackageInfo  = strLine.decode('utf-8').rstrip()
                listPackageInfo = re.split(r'\s*([a-zA-Z]{3,}\s[0-9]{2,}\s[a-zA-Z]{3,}\s[0-9]{2,})',strPackageInfo)
                strUpdateDate = listPackageInfo[1]
                dateUpdateDate = datetime.datetime.strptime(strUpdateDate, "%a %d %b %Y").date()
                if dictPatchDates.get(dateUpdateDate) is not None:
                        dictPatchDates[dateUpdateDate] = dictPatchDates[dateUpdateDate] + 1
                else:
                        dictPatchDates[dateUpdateDate] = 1

        dictOrderedPatchDates = OrderedDict(sorted(dictPatchDates.items(), key=lambda t: t[0],reverse=True))
        dateLatestPatch = getFirstElement(dictOrderedPatchDates)
        print(f"{strHost}\t{dateLatestPatch}\t{dictOrderedPatchDates[dateLatestPatch]}")

Building Gerbera on Fedora

There is a great deal of documentation available for building Gerbera from source on a variety of Linux flavors. Unfortunately, Fedora isn’t one of those (and the package names don’t exactly match up to let you replace “apt-get” with “yum” and be done). So I am quickly documenting the process we followed to build Gerbera from source.

The Fedora build of Gerbera has the binaries in /usr/bin and the manual build places the gerbera binary in /usr/local/bin — the build updates the unit file to reflect this change, but this means you want to back up any customizations you’ve made to the unit file before running “make install”.

You need the build system — cmake, g++, etc and the devel packages from the following table as required by your build options

Additional packages that we needed to install: automake, autoconf, libtool

Library Fedora Package Required? Note Compile-time option Default
libpupnp libupnp-devel XOR libnpupnp pupnp
libnpupnp Build from source (if needed) XOR libupnp I was only able to locate this as a source, not available from Fedora repos WITH_NPUPNP Disabled
libuuid libuuid-devel Required Not required on *BSD
pugixml pugixml-devel Required XML file and data support
libiconv glibc-headers Required Charset conversion
sqlite3 sqlite-devel Required Database storage
zlib zlib-devel Required Data compression
fmtlib fmt-devel Required Fast string formatting
spdlog spdlog-devel Required Runtime logging
duktape duktape-devel Optional Scripting Support WITH_JS Enabled
mysql mariadb-devel Optional Alternate database storage WITH_MYSQL Disabled
curl libcurl-devel Optional Enables web services WITH_CURL Enabled
taglib taglib-devel Optional Audio tag support WITH_TAGLIB Enabled
libmagic file-devel Optional File type detection WITH_MAGIC Enabled
libmatroska libmatroska-devel Optional MKV metadata required for MKV WITH_MATROSKA Enabled
libebml libebml-devel Optional MKV metadata required for MKV WITH_MATROSKA Enabled
ffmpeg/libav ffmpeg-free-devel Optional File metadata WITH_AVCODEC Disabled
libexif libexif-devel Optional JPEG Exif metadata WITH_EXIF Enabled
libexiv2 exiv2-devel Optional Exif, IPTC, XMP metadata WITH_EXIV2 Disabled
lastfmlib liblastfm-devel Optional Enables scrobbling WITH_LASTFM Disabled
ffmpegthumbnailer ffmpegthumbnailer-devel Optional Generate video thumbnails WITH_FFMPEGTHUMBNAILER Disabled
inotify glibc-headers Optional Efficient file monitoring WITH_INOTIFY
libavformat libavformat-free-devel Required for 2.0
libavutil libavutil-free-devel Required for 2.0
libavcodec libavcodec-free-devel Required for 2.0

Then follow the generalized instructions — cd into the folder where you want to run the build and run (customizing the cmake line as you wish):

git clone https://github.com/gerbera/gerbera.git
mkdir build
cd build
cmake ../gerbera -DWITH_MAGIC=1 -DWITH_MYSQL=1 -DWITH_CURL=1 -DWITH_INOTIFY=1 -DWITH_JS=1 -DWITH_TAGLIB=1 -DWITH_AVCODEC=1 -DWITH_FFMPEGTHUMBNAILER=0 -DWITH_EXIF=1 -DWITH_EXIV2=1 -DWITH_SYSTEMD=1 -DWITH_LASTFM=0 -DWITH_DEBUG=1
make -j4
sudo make install

As with the Gerbera binary, the Fedora build places the web content in /usr/share/gerbera and the manual build places the web content into /usr/local/share/gerbera — yes, you can change the paths in the build, and I’m sure you can clue Gerbera into the new web file location. I opted for the quick/easy/lazy solution of running

mv /usr/share/gerbera /usr/share/gerbera/old
ln -s /usr/local/share/gerbera /usr/share/

To symlink the location my config thinks the web components should be located to the new files.

On the first start of Gerbera, SQL scripts may be run to update the database — don’t stop or kill the service during this process there’s no checkpoint restart of the upgrade process. We backed up /etc/gerbera/gerbera.db prior to starting our Gerbera installation. We’ve also wiped the database files to start from scratch and test changes that impacted how items are ingested into the database.

Fin.

Crocus Time

Crocus flowers are starting to sprout up around the yard — years ago, Anya and I planted crocus bulbs randomly throughout the front yard. Some have split into three or four plants already, and some are still single little flowers. We’ve even got some sprouting up elsewhere — maybe they’re spreading seeds too?

Reverse Osmosis Maple Sap Stats

We collected nine gallons of sap with SG of 1.009 = 2.3 Brix

We ran all of the sap through the reverse osmosis system at 60psi and had sap with SG 1.011 = 2.8 Brix

We ran the concentrated sap through the reverse osmosis system a second time, this time at 80psi and had sap with SG 1.022 = 5.6 Brix.

The “pure water” output SG was about 1.003 — we re-ran this through the RO as well.

At the end of the day, we have about 4 gallons of sap at 5.6% sugar, another gallon from the “pure water” run that’s a lower SG, and four gallons of water that’s removed.

Notes for the future:

  • We want to see what a single pass at higher pressure does — is it multiple passes that farther concentrated the sap or the higher pressure?
  • We took SG readings and converted to brix using an online converter. Next time, we should just take the readings in Brix 🙂
  • We might need a different refractometer to get accurate readings near 1 … not sure how accurate our tool is at the low end of the range.