Category: Coding

Python: Listing XML tags

I was having a lot of trouble using find/findall when parsing an XML document — turns out the namespace prefixed the tag name … so I needed to find {http://maven.apache.org/POM/4.0.0}groupId instead of just groupId

How do you figure that out? Quickest way, for me, was just to print out all of the tag names.

from lxml import etree
# Load POM XML into tree
tree = etree.parse( strXMLFile )

# # List all element names in XML document
for element in tree.iter():
     print(element.tag)

Python: Generate Transcript of Video File

There’s a speech_recognition module in Python that transcribes an audio file — since ffmpeg can convert a video file to mp3, that means you can also use Python to transcribe a video file.

# requires pocketsphinx from CMU if using sphinx for speech to text recognition
import os
import speech_recognition as sr
import ffmpeg

strFFMPEGBinaryLocation = 'c:/tmp/ffmpeg/bin/ffmpeg.exe'
strCurrentDirectory = os.getcwd()

strInputVideo = "\"Z:/Path To/My Video/file.MP4\""
strOutputFileName = "converted.wav"
# Convert mp4 to wav file
strffmpeg_convert_mp4_to_wav = f'{strFFMPEGBinaryLocation} -i {strInputVideo} {strCurrentDirectory}/{strOutputFileName}'
os.system(strffmpeg_convert_mp4_to_wav)

# Run converted wav file through speech recognizer
r = sr.Recognizer()
audio = sr.AudioFile(f'{strCurrentDirectory}/{strOutputFileName}')

with audio as source:
	#audio = r.record(source, 90)				# Would need API key to process longer audio?
	#text = r.recognize_google(audio)
	audio = r.record(source)
	text = r.recognize_sphinx(audio)
print(text)

Python Code — Creating Title Images

Instead of allowing YouTube to randomly pick a frame to use as the preview image, I have always made a title image for the Township meetings I post to YouTube. At first, this was a manual (and thus time consuming for a lot of videos). In the interim, I have created a script that generates the color gradient background and overlays text including the meeting type and date.

# Valid meeting types: "TrusteeRegular",  "TrusteeSpecial", "TrusteeEmer", "TrusteeHearing", "BZAReg", "BZAHearing", "ZCReg", "ZCHearing"
strMeetingListSpreadsheet = 'MeetingList.xlsx'

from PIL import Image, ImageDraw, ImageFont
import pandas as pd

BLACK= (0,0,0)
WHITE = (255,255,255)

TRUSTEE_COLOR_PALETTE = [(156,12,12), (92,7,7), (0,0,0)]
BZA_COLOR_PALETTE = [(253,139,1), (91,51,0), (0,0,0)]
ZC_COLOR_PALETTE = [(24,113,56), (8,41,20), (0,0,0)]
MISC_COLOR_PALETTE = [(175,28,195), (55,9,61), (0,0,0)]

objFontMeetingTitle = ImageFont.truetype("/usr/share/fonts/liberation-sans/LiberationSans-Regular.ttf",115)
objFontMeetingTopic = ImageFont.truetype("/usr/share/fonts/liberation-sans/LiberationSans-Regular.ttf",115)
objFontMeetingDate = ImageFont.truetype("/usr/share/fonts/liberation-sans/LiberationSans-Italic.ttf",95)

class Point(object):
    def __init__(self, x, y):
        self.x, self.y = x, y

class Rect(object):
    def __init__(self, x1, y1, x2, y2):
        minx, maxx = (x1,x2) if x1 < x2 else (x2,x1)
        miny, maxy = (y1,y2) if y1 < y2 else (y2,y1)
        self.min = Point(minx, miny)
        self.max = Point(maxx, maxy)

    width  = property(lambda self: self.max.x - self.min.x)
    height = property(lambda self: self.max.y - self.min.y)

def gradient_color(minval, maxval, val, color_palette):
    """ Computes intermediate RGB color of a value in the range of minval
        to maxval (inclusive) based on a color_palette representing the range.
    """
    max_index = len(color_palette)-1
    delta = maxval - minval
    if delta == 0:
        delta = 1
    v = float(val-minval) / delta * max_index
    i1, i2 = int(v), min(int(v)+1, max_index)
    (r1, g1, b1), (r2, g2, b2) = color_palette[i1], color_palette[i2]
    f = v - i1
    return int(r1 + f*(r2-r1)), int(g1 + f*(g2-g1)), int(b1 + f*(b2-b1))

def horz_gradient(draw, rect, color_func, color_palette):
    minval, maxval = 1, len(color_palette)
    delta = maxval - minval
    width = float(rect.width)  # Cache.
    for x in range(rect.min.x, rect.max.x+1):
        f = (x - rect.min.x) / width
        val = minval + f * delta
        color = color_func(minval, maxval, val, color_palette)
        draw.line([(x, rect.min.y), (x, rect.max.y)], fill=color)

def vert_gradient(draw, rect, color_func, color_palette):
    minval, maxval = 1, len(color_palette)
    delta = maxval - minval
    height = float(rect.height)  # Cache.
    for y in range(rect.min.y, rect.max.y+1):
        f = (y - rect.min.y) / height
        val = minval + f * delta
        color = color_func(minval, maxval, val, color_palette)
        draw.line([(rect.min.x, y), (rect.max.x, y)], fill=color)


if __name__ == '__main__':
    df = pd.read_excel(strMeetingListSpreadsheet, sheet_name="Sheet1")

    df = df.reset_index()  # make sure indexes pair with number of rows

    for index, row in df.iterrows():
        strGraphicName = f"{row['Date'].strftime('%Y%d%m')}-{row['Type']}.png"
        strMeetingType = row['Type']

        # Draw a three color horizontal gradient.
        region = Rect(0, 0, 1920, 1080)
        width, height = region.max.x+1, region.max.y+1
        image = Image.new("RGB", (width, height), BLACK)
        draw = ImageDraw.Draw(image)

        # Add meeting title
        if strMeetingType == "TrusteeRegular":
            horz_gradient(draw, region, gradient_color, TRUSTEE_COLOR_PALETTE)
            draw.text((1670, 525),"Trustee Regular Meeting",WHITE,font=objFontMeetingTopic, anchor="rm")
        elif strMeetingType == "TrusteeSpecial":
            horz_gradient(draw, region, gradient_color, TRUSTEE_COLOR_PALETTE)
            draw.text((1670, 525),"Trustee Special Meeting",WHITE,font=objFontMeetingTopic, anchor="rm")
        elif strMeetingType == "TrusteeEmer":
            horz_gradient(draw, region, gradient_color, TRUSTEE_COLOR_PALETTE)
            draw.text((1670, 525),"Trustee Emergency Meeting",WHITE,font=objFontMeetingTopic, anchor="rm")
        elif strMeetingType == "TrusteeHearing":
            horz_gradient(draw, region, gradient_color, TRUSTEE_COLOR_PALETTE)
            draw.text((1670, 525),"Trustee Public Hearing",WHITE,font=objFontMeetingTopic, anchor="rm")
        elif strMeetingType == "BZAReg":
            horz_gradient(draw, region, gradient_color, BZA_COLOR_PALETTE)
            draw.text((1670, 525),"BZA Regular Meeting",WHITE,font=objFontMeetingTopic, anchor="rm")
        elif strMeetingType == "BZAHearing":
            horz_gradient(draw, region, gradient_color, BZA_COLOR_PALETTE)
            draw.text((1670, 525),"BZA Public Hearing",WHITE,font=objFontMeetingTopic, anchor="rm")
        elif strMeetingType == "ZCReg":
            horz_gradient(draw, region, gradient_color, ZC_COLOR_PALETTE)
            draw.text((1670, 525),"Zoning Commission Meeting",WHITE,font=objFontMeetingTopic, anchor="rm")
        elif strMeetingType == "ZCHearing":
            horz_gradient(draw, region, gradient_color, ZC_COLOR_PALETTE)
            draw.text((1670, 525),"Zoning Commission Hearing",WHITE,font=objFontMeetingTopic, anchor="rm")
        else:
            horz_gradient(draw, region, gradient_color, MISC_COLOR_PALETTE)
            draw.text((1670, 525),"Township Meeting",WHITE,font=objFontMeetingTopic, anchor="rm")

        # Add township and date
        draw.text((1070, 225),"Hinckley Township",WHITE,font=objFontMeetingTitle, anchor="rm")
        draw.text((1770, 825),row['Date'].strftime('%B %d, %Y'),WHITE,font=objFontMeetingDate, anchor="rm")

        image.save(strGraphicName, "PNG")
        print(f"image saved as {strGraphicName}")


I have an Excel file which contains the meeting type code, a long meeting title that is used as the second line of the image, a date (and a MeetingDate that I use in my concat formulae that create the title and description for YouTube). To use an Excel date in concat, you need to use a TEXT formula with the text formatting string.

This allows me to have a consistent preview image for all of our postings without actually making dozens of files by hand.

DIFF’ing JSON

While a locally processed web tool like https://github.com/zgrossbart/jdd can be used to identify differences between two JSON files, regular diff can be used from the command line for simple comparisons. Using jq to sort JSON keys, diff will highlight (pipe bars between the two columns, in this example) where differences appear between two JSON files. Since they keys are sorted, content order doesn’t matter much — it’s possible you’d have a list element 1,2,3 in one and 2,1,3 in another, which wouldn’t be sorted.

[lisa@fedorahost ~]# diff -y <(jq --sort-keys . 1.json) <(jq --sort-keys . 2.json )
{                                                               {
  "glossary": {                                                   "glossary": {
    "GlossDiv": {                                                   "GlossDiv": {
      "GlossList": {                                                  "GlossList": {
        "GlossEntry": {                                                 "GlossEntry": {
          "Abbrev": "ISO 8879:1986",                                      "Abbrev": "ISO 8879:1986",
          "Acronym": "SGML",                                  |           "Acronym": "XGML",
          "GlossDef": {                                                   "GlossDef": {
            "GlossSeeAlso": [                                               "GlossSeeAlso": [
              "GML",                                                          "GML",
              "XML"                                                           "XML"
            ],                                                              ],
            "para": "A meta-markup language, used to create m               "para": "A meta-markup language, used to create m
          },                                                              },
          "GlossSee": "markup",                                           "GlossSee": "markup",
          "GlossTerm": "Standard Generalized Markup Language"             "GlossTerm": "Standard Generalized Markup Language"
          "ID": "SGML",                                                   "ID": "SGML",
          "SortAs": "SGML"                                    |           "SortAs": "XGML"
        }                                                               }
      },                                                              },
      "title": "S"                                                    "title": "S"
    },                                                              },
    "title": "example glossary"                                     "title": "example glossary"
  }                                                               }
}                                                               }

Bulk Download of YouTube Videos from Channel

Several years ago, I started recording our Township meetings and posting them to YouTube. This was very helpful — even our government officials used the recordings to refresh their memory about what happened in a meeting. But it also led people to ask “why, exactly, are we relying on some random citizen to provide this service? What if they are busy? Or move?!” … and the Township created their own channel and posted their meeting recordings. This was a great way to promote transparency however they’ve got retention policies. Since we have absolutely been at meetings where it would be very helpful to know what happened five, ten, forty!! years ago … my expectation is that these videos will be useful far beyond the allotted document retention period.

We decided to keep our channel around with the historic archive of government meeting recordings. There’s no longer time criticality — anyone who wants to see a current meeting can just use the township’s channel. We have a script that lists all of the videos from the township’s channel and downloads them — once I complete back-filling our archive, I will modify the script to stop once it reaches a video series we already have. But this quick script will list all videos published to a channel and download the highest quality MP4 file associated with that video.

# API key for my Google Developer project
strAPIKey = '<CHANGEIT>'

# Youtube account channel ID
strChannelID = '<CHANGEIT>'

import os
from time import sleep
import urllib
from urllib.request import urlopen
import json
from pytube import YouTube
import datetime

from config import dateLastDownloaded

os.chdir(os.path.dirname(os.path.abspath(__file__)))
print(os.getcwd())

strBaseVideoURL = 'https://www.youtube.com/watch?v='
strSearchAPIv3URL= 'https://www.googleapis.com/youtube/v3/search?'

iStart = 0		# Not used -- included to allow skipping first N files when batch fails midway
iProcessed = 0		# Just a counter

strStartURL = f"{strSearchAPIv3URL}key={strAPIKey}&channelId={strChannelID}&part=snippet,id&order=date&maxResults=50"
strYoutubeURL = strStartURL

while True:
    inp = urllib.request.urlopen(strYoutubeURL)
    resp = json.load(inp)

    for i in resp['items']:
        if i['id']['kind'] == "youtube#video":
            iDaysSinceLastDownload = datetime.datetime.strptime(i['snippet']['publishTime'], "%Y-%m-%dT%H:%M:%SZ") - dateLastDownloaded
            # If video was posted since last run time, download the video
            if iDaysSinceLastDownload.days >= 0:
                strFileName = (i['snippet']['title']).replace('/','-').replace(' ','_')
                print(f"{iProcessed}\tDownloading file {strFileName} from {strBaseVideoURL}{i['id']['videoId']}")
                # Need to retrieve a youtube object and filter for the *highest* resolution otherwise we get blurry videos
                if iProcessed >= iStart:
                    yt = YouTube(f"{strBaseVideoURL}{i['id']['videoId']}")
                    yt.streams.filter(progressive=True, file_extension='mp4').order_by('resolution').desc().first().download(filename=f"{strFileName}.mp4")
                    sleep(90)
                iProcessed = iProcessed + 1
    try:
        next_page_token = resp['nextPageToken']
        strYoutubeURL = strStartURL + '&pageToken={}'.format(next_page_token)
        print(f"Now getting next page from {strYoutubeURL}")
    except:
        break

# Update config.py with last run date
f = open("config.py","w")
f.write("import datetime\n")
f.write(f"dateLastDownloaded = datetime.datetime({datetime.datetime.now().year},{datetime.datetime.now().month},{datetime.datetime.now().day},0,0,0)")
f.close

Maintaining an /etc/hosts record

I encountered an oddity at work — there’s a server on an internally located public IP space. Because it’s public space, it is not allowed to communicate with the internal interface of some of our security group’s servers. It has to use their public interface (not technically, just a policy on which they will not budge). I cannot just use a DNS server that resolves the public copy of our zone because then we’d lose access to everything else, so we are stuck making an /etc/hosts entry. Except this thing changes IPs fairly regularly (hey, we’re moving from AWS to Azure; hey, let’s try CloudFlare; nope, that is expensive so change it back) and the service it provides is application authentication so not something you want randomly falling over every couple of months.

So I’ve come up with a quick script to maintain the /etc/hosts record for the endpoint.

# requires: dnspython, subprocess

import dns.resolver
import subprocess

strHostToCheck = 'hostname.example.com' # PingID endpoint for authentication
strDNSServer = "8.8.8.8"         # Google's public DNS server
listStrIPs = []

# Get current assignement from hosts file
listCurrentAssignment = [ line for line in open('/etc/hosts') if strHostToCheck in line]

if len(listCurrentAssignment) >= 1:
        strCurrentAssignment = listCurrentAssignment[0].split("\t")[0]

        # Get actual assignment from DNS
        objResolver = dns.resolver.Resolver()
        objResolver.nameservers = [strDNSServer]
        objHostResolution = objResolver.query(strHostToCheck)

        for objARecord in objHostResolution:
                listStrIPs.append(objARecord.to_text())

        if len(listStrIPs) >= 1:
                # Fix /etc/hosts if the assignment there doesn't match DNS
                if strCurrentAssignment in listStrIPs:
                        print(f"Nothing to do -- hosts file record {strCurrentAssignment} is in {listStrIPs}")
                else:
                        print(f"I do not find {strCurrentAssignment} here, so now fix it!")
                        subprocess.call([f"sed -i -e 's/{strCurrentAssignment}\t{strHostToCheck}/{listStrIPs[0]}\t{strHostToCheck}/g' /etc/hosts"], shell=True)
        else:
                print("No resolution from DNS ... that's not great")
else:
        print("No assignment found in /etc/hosts ... that's not great either")

Web Redirection Based on Typed URL

I have no idea why I am so pleased with this simple HTML code, but I am! My current project is to move all of our Tableau servers to different servers running a newer version of Windows. When I first got involved with the project, it seemed rather difficult (there was talk of manually recreating all of the permissions on each item!!) … but, some review of the vendors documentation let me to believe one could build a same-version server elsewhere (newer Windows, out in magic cloudy land, but the same Tableau version), back up the data from the old server, restore it to the new one, and be done. It’s not quite that simple — I had to clear out the SAML config & manually reconfigure it so the right elements get added into the Java keystore, access to the local Postgresql database needed to be manually configured, a whole bunch of database drivers needed to be installed, and the Windows registry of ODBC connections needed to be exported/imported. But the whole process was a lot easier than what I was first presented.

Upgrading the first production server was mostly seamless — except users appear to have had the server’s actual name. Instead of accessing https://tableau.example.com, they were typing abcwxy129.example.com. And passing that link around as “the link” to their dashboard. And, upon stopping the Tableau services on the old server … those links started to fail. Now, I could have just CNAMED abcwxy129 over to tableau and left it at that. But letting users continue to do the wrong thing always seems to come back and haunt you (if nothing else, the OS folks own the namespace of servers & are allowed to re-use or delete those hostnames at will). So I wanted something that would take whatever https://tableau.example.com/#/site/DepartMent/workbooks/3851/Views kind of URL a user provided and give them the right address. And, since this was Windows, to do so with IIS without the hassle of integrating PHP or building a C# project. Basically, I wanted to do it within basic HTML. Which meant JavaScript.

And I did it — using such a basic IIS installation that the file is named something like iisstart.htm so I didn’t have to change the default page name. I also redirected 404 to / so any path under the old server’s name will return the redirection landing page.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
	<head>
		<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
		<title>This Tableau server has moved</title>
		
		<style type="text/css">
			<!--
			body {
				color:#000000;
				background-color:#eeeeee;
				margin:0;
			}
			-->
		</style>
	</head>
	<body>
		<P><ul>
		<h2>The Tableau server has moved. </h2>
		<P>The location you accessed, <span style="white-space: nowrap" id="oldurl"></span>, is no longer available.<br><br> Please update your link to use <span style="white-space: nowrap"  id="newurl"></span></p>
		</ul></P>
	
		<script>
			let strOldURL = window.location.href;

			let strNewURL = strOldURL.replace(/hostname.example.com/i,"tableau.example.com");
			strNewURL = strNewURL.replace(/otherhostname.example.com/i,"tableau.example.com");

			document.getElementById("oldurl").innerHTML = window.location.href;
			document.getElementById("newurl").innerHTML = "<a href=" + strNewURL + ">" + strNewURL + "</a>";
		</script>
	
	</body>
</html>

Quick Python Bingo Caller

I keep re-writing the same quick script to implement a bingo “caller” so Anya and I can play a game … figured I’d save it somewhere and save a few minutes next time! We use more words than squares so not every word is on both boards, but you can shorten the list to 24 and just put the words in different squares on each board.

import random
  
# initializing the word list -- 24 words for the 24 squares but we play with more words than squares!
wordList = ["Hypothesis", "Observation", "Theory", "Variable", "Cat"
, "Fun", "Science", "Happy", "Dog", "Thyme"
, "Rosemary", "Sage", "Time", "Run", "Pot"
, "TV", "Rogue", "Smile", "Black", "Rock"
, "Ash", "Kitten", "Love", "Bingo (but not BINGO like somebody won!)",
"Mom", "Dad", "Anya", "Wood", "Trail", "Tail", "Star"]

# shuffling word list
random.shuffle(wordList)

i = 0  
while i < len(wordList):
    print(wordList[i])
    i += 1
    x= input()

2>/dev/null

A few times now, I’ve encountered individuals with cron jobs or bash scripts where a command execution ends in 2>/dev/null … and the individual is stymied by the fact it’s not working but there’s no clue as to why. The error output is being sent into a big black hole never to escape!

The trick here is to understand file descriptors — 1 is basically a shortcut name for STDOUT and 2 is basically a shortcut name for STDERR (0 is STDIN, although that’s not particularly relevant here).  So 2>/dev/null says “take all of the STDERR stuff and redirect it to /dev/null”.

Sometimes you’ll see both STDERR and STDOUT being redirected either to a file or to /dev/null — in that case you will see 2>&1 where the ampersand prior to the “1” indicates the stream is being redirected to a file descriptor (2>1 would direct STDOUT to a file named “1”) — so >/dev/null 2>&1 is the normal way you’d see it written. Functionally, >/dev/null 1>&2 would be the same thing … but redirecting all output into error is, conceptually, a little odd.

To visualize all of this, use a command that will output something to both STDERR and STDOUT — for clarify, I’ve used “1>/dev/null” (redirect STDOUT to /devnull) in conjunction with 2>&1 (redirect STDERR to STDOUT). As written in the text above, the number 1 is generally omitted and just >/dev/null is written.

 

 

SSL Connection Failure from Docker Image

We have a script that’s used to securely retrieve passwords … a script which failed when run from a Docker container.

* could not load PEM client certificate, OpenSSL error error:140AB18E:SSL routines:SSL_CTX_use_certificate:ca md too weak, (no key found, wrong pass phrase, or wrong file format?)

Appears root of issue is tied to Debian OS that’s used in the python:3.7-slim container that’s being used. Newer iterations of some Linux OS’s have a default setting in the openssl config that provide a setting for SSL_CTX_set_security_level that precludes communication with password server.

Remediating this at the server end is not a reasonable approach, so client config needs to be changed to allow connection to be established. Setting security level to 1 allows connection to proceed, so proposed including additional instruction in Dockerfile that uses sed to update the configuration parameter.

sed -i 's/DEFAULT@SECLEVEL=2/DEFAULT@SECLEVEL=1/' /etc/ssl/openssl.cnf

Once that setting was updated, the script worked perfectly as it does on our physical and VM servers.