Tag: bash

April 2, 2025

Linux Disk Utilization (du) Without Mount Points or /proc

I frequently use du -sh to determine what is using all the space on my Linux box … but mounting a data volume makes that simple command really suck (also, the mount point isn’t the problem for my local storage!). And all the /proc errors are just annoying — so pass in exclusions!

du -sh --exclude='/mnt/*' --exclude='/proc/*' /*

November 26, 2024

Quick sed For Sanitizing Config Files

When sending configuration files to other people for reference, I like to redact any credential-type information … endpoints that allow you to post data without creds, auth configs, etc. Sometimes I replace the string with REDACTED and sometimes I just drop the line completely.

Make a copy of the config files elsewhere, then run sed


# Retain parameter but replace value with REDACTED
sed -i 's|http_post_url: "https://.*"|post_url: "REDACTED"|' *.yaml

# Remove line from config
sed -i '/authorization: Basic/d' *.yaml

August 1, 2024

Docker Registry: Listing Images and Timestamps

I wanted a quick way to verify that Docker images have actually been pushed to the registry … I’m using Distribution, and only wanted to report on images that start with sample (because the repository is shared & I don’t want to read through the very long list of other people’s images)

#!/bin/bash

registry="registryhost.example.net:5443"
authHeader="Authorization: Basic AUTHSTRINGHERE"

# List all repositories
repositories=$(curl -s -H "$authHeader" https://$registry/v2/_catalog | jq -r '.repositories[]')

for repo in $repositories; do
  # Check if the repository name starts with "npm"
  if [[ $repo == sample* ]]; then

    # List all tags for the repository
    tags=$(curl -s -H "$authHeader" https://$registry/v2/$repo/tags/list | jq -r '.tags[]')

    for tag in $tags; do

      # Get the manifest for the tag
      manifest=$(curl -s -H "$authHeader" -H "Accept: application/vnd.docker.distribution.manifest.v2+json" https://$registry/v2/$repo/manifests/$tag)

      # Extract the digest for the config
      configDigest=$(echo $manifest | jq -r '.config.digest')

      # Get the config blob
      configBlob=$(curl -s -H "$authHeader" https://$registry/v2/$repo/blobs/$configDigest)

      # Extract the last modified date from the config history
      lastModifiedDate=$(echo $configBlob | jq -r '[.history[].created] | max')

      echo -e "$repo\t$tag\t$lastModifiedDate"
    done
  fi
done

January 16, 2024

RSync to Mirror Local Files

The rsync utility was meant to be used to sync files across the network — to or from an rsync server. For some time, I had a group of friends who shared documents off of my rsync server. Anyone with access could run an rsync command and sync their computer up with the group’s documents. With the advent of online file storage and collaborative editing, this was no longer needed. But I still use rsync to make sure my laptop has a local copy of a folder on the server. Mount /path/to/folder/contents/to/copy to the SMB or NFS share, and the following rsync command ensures the laptop’s /path/to/where/contents/should/be/placed has an exact mirror of the contents of the server folder

rsync –archive –verbose –update –delete “/path/to/folder/contents/to/copy/” “/path/to/where/contents/should/be/placed/”

–archive is a grouping of:
-r recursive
-l copy symlinks
-p preserve permissions
-t preserve modification timestamps
-g preserve group
-o preserve owner
–devices preserve device files (su only)
–specials preserve special files

July 17, 2023

Counting Messages in All Kafka Topics

For some reason, I am given a lot of Kafka instances that no one knows what they are or what they do. The first step, generally, is figuring out if it does anything. Because a server that no one has sent a message to in a year or two … well, there’s not much point in bringing it up to standard, monitoring it, and such. My first glance analysis has been just counting all of the messages in all of the topics to see which topics are actually used — quick bash script to accomplish this (presuming a Kafka broker is on port 9092 of the host running the script)

strTopics=$(./kafka-topics.sh --list --bootstrap-server $(hostname):9092)

SAVEIFS=$IFS   
IFS=$'\n'      
arrayTopics=($strTopics)
IFS=$SAVEIFS   

for i in "${arrayTopics[@]}"; do iMessages=`./kafka-console-consumer.sh --bootstrap-server $(hostname):9092 --topic $i --property print.timestamp=true --from-beginning --timeout-ms=10000 2>&1 | grep "Processed a total of"`;         echo "$i     $iMessages"; done

October 24, 2022

Bash – Fully qualified path/name from a directory

I needed to copy all of the files under a directory from a docker container — and there is a quick command line that will list the fully qualified path/filename for everything under the current directory:

find $PWD

July 6, 2022

Logstash, JRuby, and Private Temp

There’s a long-standing bug in logstash where the private temp folder created for jruby isn’t cleaned up when the logstash process exits. To avoid filling up the temp disk space, I put together a quick script to check the PID associated with each jruby temp folder, see if it’s an active process, and remove the temp folder if the associated process doesn’t exist.

When the PID has been re-used, this means we’ve got an extra /tmp/jruby-### folder hanging about … but each folder is only 10 meg. The impacting issue is when we’ve restarted logstash a thousand times and a thousand ten meg folders are hanging about.

This script can be cron’d to run periodically or it can be run when the logstash service launches.

import subprocess
import re

from shutil import rmtree

strResult = subprocess.check_output(f"ls /tmp", shell=True)

for strLine in strResult.decode("utf-8").split('\n'):
        if len(strLine) > 0 and strLine.startswith("jruby-"):
                listSplitFileNames = re.split("jruby-([0-9]*)", strLine)
                if listSplitFileNames[1] is not None:
                        try:
                                strCheckPID = subprocess.check_output(f"ps -efww |  grep {listSplitFileNames[1]} | grep -v grep", shell=True)
                                #print(f"PID check result is {strCheckPID}")
                        except:
                                print(f"I am deleting |{strLine}|")
                                rmtree(f"/tmp/{strLine}")

May 16, 2022

Using urandom to Generate Password

Frequently, I’ll use password generator websites to create some pseudo-random string of characters for system accounts, database replication,etc. But sometimes the Internet isn’t readily available … and you can create a decent password right from the Linux command line using urandom.

If you want pretty much any “normal” character, use tr to pull out all of the other characters:

'\11\12\40-\176'

Or remove anything outside of upper case, lower case, and number characters using

a-zA-Z0-9

Pass the output to head to grab however many characters you actually want. Voila — a quick password.

March 30, 2022

Analyzing Postgresql Tmp Files

Postgresql stores temporary files for in-flight queries — these don’t normally hang around for long, but sorting a large amount of data or building a large hash can create a lot of temp files. A dead query that was sorting a large amount of data or …. well, we’ve gotten terabytes of temp files associated with multiple backend process IDs. The file names are algorithmic — a string “pgsql_tmp followed by the backend PID, a period, and then some other number. Thus, I can extract the PID from each file name and provide a summary of the processes associated with temp files.

To view a summary of the temp files within the pgsql_tmp folder, run the following command to print a count then a PID number:
ls /path/to/pgdata/base/pgsql_tmp | sed -nr 's/pgsql_tmp([0-9]*)\.[0-9]*/\1/p' | sort | uniq -c

A slightly longer command can be used to reverse the columns – producing a list of process IDs followed by the count of files for that PID – too:
ls /path/to/pgdata/base/pgsql_tmp | sed -nr 's/pgsql_tmp([0-9]*)\.[0-9]*/\1/p' | sort | uniq -c | sort -k2nr | awk '{printf("%s\t%s\n",$2,$1)}END{print}'

March 25, 2022

2>/dev/null

A few times now, I’ve encountered individuals with cron jobs or bash scripts where a command execution ends in 2>/dev/null … and the individual is stymied by the fact it’s not working but there’s no clue as to why. The error output is being sent into a big black hole never to escape!

The trick here is to understand file descriptors — 1 is basically a shortcut name for STDOUT and 2 is basically a shortcut name for STDERR (0 is STDIN, although that’s not particularly relevant here). So 2>/dev/null says “take all of the STDERR stuff and redirect it to /dev/null”.

Sometimes you’ll see both STDERR and STDOUT being redirected either to a file or to /dev/null — in that case you will see 2>&1 where the ampersand prior to the “1” indicates the stream is being redirected to a file descriptor (2>1 would direct STDOUT to a file named “1”) — so >/dev/null 2>&1 is the normal way you’d see it written. Functionally, >/dev/null 1>&2 would be the same thing … but redirecting all output into error is, conceptually, a little odd.

To visualize all of this, use a command that will output something to both STDERR and STDOUT — for clarify, I’ve used “1>/dev/null” (redirect STDOUT to /devnull) in conjunction with 2>&1 (redirect STDERR to STDOUT). As written in the text above, the number 1 is generally omitted and just >/dev/null is written.