Category: System Administration

It’s Not A DNS Problem

I used to work at a company where everything was called an Exchange problem — not that Exchange 2000 didn’t have it’s share of problems (store.exe silent exit failures? Yes, that’s absolutely an Exchange problem) … but the majority of the time, the site had lost their connectivity back to the corporate data center. Or, when I’d see the network guys sprinting down the hallway as the first calls started to come in … the corporate data center had some sort of meltdown.

I’m reminded of this as I see people calling the Facebook outage a “DNS problem”. Facebook’s networks dropped out of BGP routing. That means there’s no route to their DNS server, so you get a resolution failure. It doesn’t mean there’s a DNS problem. Any more than it means there’s an IP or routing problem — I’m sure it’s all working as designed and either someone screwed up a config change or someone didn’t screw up and was trying to drop them off the Internet.

Saw much the same thing back when Egypt dropped off of the Internet back in 2011 — their routes were withdrawn from the routing tables. That’s an initiated process — maybe accidental, but it’s not the same as a bunch of devices losing power or a huge fiber cut.

And, when there’s no route you can use to get there … if DNS, web servers, databases, etc are working or not becomes moot.

Diffing two strings

Yes, I know md5sum has a “-c” option for checking the checksum in a file … but, if I was going to screw with a file, I’d have the good sense to edit the checksum file in the archive!

 

#!/bin/bash
#STRFILE=dead.letter
#STRCHECKMD5=5f3748d9c653b78c9ee7559acd423652
#STRMD5=`md5sum $STRFILE`

STRFILE=$1
STRCHECKMD5=$2

STRMD5=`md5sum $STRFILE`

diff -s <( printf '%s\n' "$STRCHECKMD5 $STRFILE" ) <( printf '%s\n' "$STRMD5" )

It’ll either output the file hash and the hash to match (a problem) or indicate the files are identical (a good thing)

DNF — What Provides This File?

“Dependency hell” used to be a big problem — you’d download one package, attempt to install it, and find out you needed three other packages. Download one of them, attempt to install it, and learn about five other packages. Fifty seven packages later, you forgot what you were trying to install in the first place and went home. Or, I suppose, you managed to install that first package and actually use it. The advent of repo-based deployments — where dependencies can be resolved and automatically downloaded — has mostly eliminated dependency hell. But, occasionally, you’ll have a manual install that says “oh, I cannot complete. I need libgdkglext-x11-1.0.so.0 or libminizip.so.1 … and, if there’s a package that’s named libgdkglext-x11 or libminizip … you’re good. There’s not. Fortunately, you can use “dnf provides” to search for a package that provides a specific file — thus learning that you need the gtkglext-libs and minizip-compat packages to resolve your dependencies.

Microsoft’s 1601 Time Base

Microsoft uses the number of 100-nanosecond intervals since 01 January 1601. Why? No idea. But I’ve had to deal with their funky large integer for a DateTime value as long as I’ve been working with AD. I’ve written functions to turn it int something useful, but that’s a lot of effort when I see a lockoutTime and need to know how recent that is. Enter w32tm which has an “ntte” switch — this allows me to readily tell that the lockout was at 3:01 today and something I need to be investigating.

Blocking Device Internet Access

We block Internet access for a lot of our smart devices. All of our control is done through the local server; and, short of updating firmware, the devices have no need to be chatting with the Internet. Unfortunately, our DSL modem/router does not have any sort of parental control, blocking, or filtering features. Fortunately, ISC DHCPD allows you to define per-host options. Setting the router to the device’s IP (0.0.0.0 may work as well) allows us to have devices that can communicate with anything on their subnet without allowing access out to other subnets or the Internet.

Viewing and recording packets using tshark

This time, I’m writing this down so I don’t have to keep looking it up. To display some packet info to the screen while writing a network capture to a file, include the -P option (older versions of tshark used -S)

2021-04-18 13:58:58 [lisa@server ~]# tshark -f "udp port 123" -w /tmp/ntpd.cap -P
Running as user "root" and group "root". This could be dangerous.
Capturing on 'enp0s25'
1 0.000000000 10.x.x.x → x.x.x.18 NTP 90 NTP Version 4, client
2 3.898916081 10.x.x.x → x.x.x.199 NTP 90 NTP Version 4, client
3 7.898948128 10.x.x.x → x.x.x.20 NTP 90 NTP Version 4, client
4 7.928749596 x.x.x.20 → 10.x.x.x NTP 90 NTP Version 4, server
5 9.898958577 10.x.x.x → x.x.x.76 NTP 90 NTP Version 4, client
6 9.949450324 x.x.x.76 → 10.x.x.x NTP 90 NTP Version 4, server
7 10.898981132 10.x.x.x → x.x.x.185 NTP 90 NTP Version 4, client
8 11.009163093 x.x.x.185 → 10.x.x.x NTP 90 NTP Version 4, server

Console access from virsh

I had a whole host of problems that were eventually resolved by rebooting the physical server … but, in the process of trying to figure out exactly what was wrong, I wanted to console into the virtual machines from the physical server. Using “virsh console vmname” should have worked … but it didn’t. Turns out you’ve got to enable a service on each guest before you’re able to console in from the physical server. To do so, run:

systemctl enable serial-getty@ttyS0.service

And, if you want to connect in *right now*, also start the service:

systemctl start serial-getty@ttyS0.service

Now, running “virsh console vmname” doesn’t appear to do much … but, if you hit the enter key, you’ll get a logon prompt for the VM.

Linux: Identifying Large Packages

The disk filled up on our primary server, and there wasn’t anything obvious like a decade worth of log files to clean up. I had to resort to uninstalling ‘stuff’ (it was, after all, installing ‘stuff’ that created the problem … tons of X11-related stuff for troubleshooting purposes). There is a way to list installed packages by size:

 

rpm -qia|awk '$1=="Name" { n=$3} $1=="Size" {s=$3} $1=="Description" {print s " " n }' |sort -n

MTU Probing

We’ve had a number of very strange network problems lately — Zoneminder cannot talk to cameras, clients veg out talking to Myth, Twonky is non-functional (even the web page — you get enough of the header to have a title, but the page just hangs, Scott cannot get to our Discourse site. And, more frustratingly, he cannot SSH to some of our hosts. Using “ssh -v” and throwing on a bunch of flags to not attempt key auth (-o PasswordAuthentication=yes -o PreferredAuthentications=keyboard-interactive,password -o PubkeyAuthentication=no) and his connection still hung. But, at least, I could see something. The last thing the SSH connection reported is:

debug1: expecting SSH2_MSG_KEX_ECDH_REPLY

Which I’ve seen before … fortunately when I had a great Unix support guy working in the same office building that I did. Who let me stop over and bounce really oddball problems off of him. He told me to enable mtu probing.

echo 1 >/proc/sys/net/ipv4/tcp_mtu_probing

And, if that doesn’t work, use “echo 2”. Which …. yeah, wouldn’t have been any of my first thirty guesses. Cloudflare published a good article on what exactly MTU path discovery is, and I can RTFM enough to figure out what I’ve set here. But no idea what’s got a smaller MTU than our computers.

 

tcp_mtu_probing - INTEGER
	Controls TCP Packetization-Layer Path MTU Discovery.  
	  0 - Disabled
	  1 - Disabled by default, enabled when an ICMP black hole detected
	  2 - Always enabled, use initial MSS of tcp_base_mss.