Tag: Linux

Samba – Address family not supported by protocol

After upgrading to Fedora 39, we started having problems with Samba falling over on startup. The server has IPv6 disabled, and (evidently) something is not happy about that. I guess we could enable IPv6, but we don’t really need it.

Adding the following to lines to the GLOBAL section of the smb.conf file and restarting samba sorted it:

bind interfaces only = yes
interfaces = lo eth0

 

Feb 11 06:26:01 systemd[1]: Started smb.service – Samba SMB Daemon.
Feb 11 06:26:01 smbd[1109]: [2024/02/11 06:26:01.285076, 0] ../../source3/smbd/server.c:1091(smbd_open_one_socket)
Feb 11 06:26:01 smbd[1109]: smbd_open_one_socket: open_socket_in failed: Address family not supported by protocol
Feb 11 06:26:01 smbd[1109]: [2024/02/11 06:26:01.290022, 0] ../../source3/smbd/server.c:1091(smbd_open_one_socket)
Feb 11 06:26:01 smbd[1109]: smbd_open_one_socket: open_socket_in failed: Address family not supported by protocol
Feb 11 08:01:43 systemd[1]: Stopping smb.service – Samba SMB Daemon…
Feb 11 08:01:43 systemd[1]: smb.service: Deactivated successfully.
Feb 11 08:01:43 systemd[1]: Stopped smb.service – Samba SMB Daemon.

Updating Fedora — System Boots to Grub Error After Update

If you film the boot sequence and look frame by frame, you’ll see that it very briefly flashes a TPM error

error: ../../grub-core/commands/efi/tpm.c:150:unknown TPM error.

 

From what I’ve been able to glean, this secure boot stuff works off of signatures. Microsoft has signatures in BIOS. Everyone else kind of inserts their keys on the fly … so you can run out of space to save these keys and be unable to boot. To work around this, every time an update gets us over the limit, we go into the secure boot DBX management menu and reset the “Forbidden Signatures” from factory default. This is 13 keys instead of 373, and the OS is able to do it’s “thing” and boot.

 

And I’m actually writing this down this time because I had spent a lot of time researching this last time Scott’s laptop failed to boot and dumped out to a grub menu. This time, I kinda know what we did and why but lost a lot of the details.

RSync to Mirror Local Files

The rsync utility was meant to be used to sync files across the network — to or from an rsync server. For some time, I had a group of friends who shared documents off of my rsync server. Anyone with access could run an rsync command and sync their computer up with the group’s documents. With the advent of online file storage and collaborative editing, this was no longer needed. But I still use rsync to make sure my laptop has a local copy of a folder on the server. Mount /path/to/folder/contents/to/copy to the SMB or NFS share, and the following rsync command ensures the laptop’s /path/to/where/contents/should/be/placed has an exact mirror of the contents of the server folder

rsync –archive –verbose –update –delete “/path/to/folder/contents/to/copy/” “/path/to/where/contents/should/be/placed/”

–archive is a grouping of:
-r recursive
-l copy symlinks
-p preserve permissions
-t preserve modification timestamps
-g preserve group
-o preserve owner
–devices preserve device files (su only)
–specials preserve special files

ZFS Compression Ratio

We’ve got several PostgreSQL servers using ZFS file system for the database, and I needed to know how compressed the data is. Fortunately, there appears to be a zfs command that does exactly that: report the compression ratio for a zfs file system. Use zfs get compressratio /path/to/mount

Linux – High Load with CIFS Mounts using Kernel 6.5.5

We recently updated our Fedora servers from 36 and 37 to 38. Since the upgrade, we have observed servers with very high load averages – 8+ on a 4-cpu server – but the server didn’t seem unreasonably slow. On the Unix servers I first used, Irix and Solaris, load average counts threads in a Runnable state. Linux, however, includes both Runnable and Uninterruptible states in the load average. This means processes waiting – on network calls using mkdir to a mounted remote server, local disk I/O – are included in the load average. As such, a high load average on Linux may indicate CPU resource contention but it may also indicate I/O contention elsewhere.

But there’s a third possibility – code that opts for the simplicity of the uninterrupted sleep without needing to be uninterruptible for a process. In our upgrade, CIFS mounts have a laundromat that I assume cleans up cache – I see four cifsd-cfid-laundromat threads in an uninterruptible sleep state – which means my load average, when the system is doing absolutely nothing, would be 4.

2023-10-03 11:11:12 [lisa@server01 ~/]# ps aux | grep " [RD]"
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1150 0.0 0.0 0 0 ? D Sep28 0:01 [cifsd-cfid-laundromat]
root 1151 0.0 0.0 0 0 ? D Sep28 0:01 [cifsd-cfid-laundromat]
root 1152 0.0 0.0 0 0 ? D Sep28 0:01 [cifsd-cfid-laundromat]
root 1153 0.0 0.0 0 0 ? D Sep28 0:01 [cifsd-cfid-laundromat]
root 556598 0.0 0.0 224668 3072 pts/11 R+ 11:11 0:00 ps aux

Looking around the Internet, I see quite a few bug reports regarding this situation … so it seems like a “ignore it and wait” problem – although the load average value is increased by these sleeping threads, it’s cosmetic. Which explains why the server didn’t seem to be running slowly even through the load average was so high.

https://lkml.org/lkml/2023/9/26/1144

Date: Tue, 26 Sep 2023 17:54:10 -0700
From: Paul Aurich 
Subject: Re: Possible bug report: kernel 6.5.0/6.5.1 high load when CIFS share is mounted (cifsd-cfid-laundromat in"D" state)

On 2023-09-19 13:23:44 -0500, Steve French wrote:
>On Tue, Sep 19, 2023 at 1:07 PM Tom Talpey <tom@talpey.com> wrote:
>> These changes are good, but I'm skeptical they will reduce the load
>> when the laundromat thread is actually running. All these do is avoid
>> creating it when not necessary, right?
>
>It does create half as many laundromat threads (we don't need
>laundromat on connection to IPC$) even for the Windows server target
>example, but helps more for cases where server doesn't support
>directory leases.

Perhaps the laundromat thread should be using msleep_interruptible()?

Using an interruptible sleep appears to prevent the thread from contributing
to the load average, and has the happy side-effect of removing the up-to-1s delay
when tearing down the tcon (since a7c01fa93ae, kthread_stop() will return
early triggered by kthread_stop).

~Paul

 

Maintaining an /etc/hosts record

I encountered an oddity at work — there’s a server on an internally located public IP space. Because it’s public space, it is not allowed to communicate with the internal interface of some of our security group’s servers. It has to use their public interface (not technically, just a policy on which they will not budge). I cannot just use a DNS server that resolves the public copy of our zone because then we’d lose access to everything else, so we are stuck making an /etc/hosts entry. Except this thing changes IPs fairly regularly (hey, we’re moving from AWS to Azure; hey, let’s try CloudFlare; nope, that is expensive so change it back) and the service it provides is application authentication so not something you want randomly falling over every couple of months.

So I’ve come up with a quick script to maintain the /etc/hosts record for the endpoint.

# requires: dnspython, subprocess

import dns.resolver
import subprocess

strHostToCheck = 'hostname.example.com' # PingID endpoint for authentication
strDNSServer = "8.8.8.8"         # Google's public DNS server
listStrIPs = []

# Get current assignement from hosts file
listCurrentAssignment = [ line for line in open('/etc/hosts') if strHostToCheck in line]

if len(listCurrentAssignment) >= 1:
        strCurrentAssignment = listCurrentAssignment[0].split("\t")[0]

        # Get actual assignment from DNS
        objResolver = dns.resolver.Resolver()
        objResolver.nameservers = [strDNSServer]
        objHostResolution = objResolver.query(strHostToCheck)

        for objARecord in objHostResolution:
                listStrIPs.append(objARecord.to_text())

        if len(listStrIPs) >= 1:
                # Fix /etc/hosts if the assignment there doesn't match DNS
                if strCurrentAssignment in listStrIPs:
                        print(f"Nothing to do -- hosts file record {strCurrentAssignment} is in {listStrIPs}")
                else:
                        print(f"I do not find {strCurrentAssignment} here, so now fix it!")
                        subprocess.call([f"sed -i -e 's/{strCurrentAssignment}\t{strHostToCheck}/{listStrIPs[0]}\t{strHostToCheck}/g' /etc/hosts"], shell=True)
        else:
                print("No resolution from DNS ... that's not great")
else:
        print("No assignment found in /etc/hosts ... that's not great either")

Mounting DD Raw Image File

And a final note from my disaster recovery adventure — I had to use ddrescue to copy as much data from a corrupted drive as possible (ddrescue /dev/sdb /mnt/vms/rescue/backup.raw –try-again –force –verbose) — once I had the image, what do you do with it? Fortunately, you can mount a dd file and copy data from it.

# Mounting DD image
2023-04-17 23:54:01 [root@fedora /]# kpartx -l backup.raw
loop0p1 : 0 716800 /dev/loop0 2048
loop0p2 : 0 438835200 /dev/loop0 718848

2023-04-17 23:55:08 [root@fedora /]# mount /dev/mapper/loop0p2 /mnt/recovery/ -o loop,ro
mount: /mnt/recovery: cannot mount /dev/loop1 read-only.
       dmesg(1) may have more information after failed mount system call.

2023-04-17 23:55:10 [root@fedora /]# mount /dev/mapper/loop0p2 /mnt/recovery/ -o loop,ro,norecovery

2023-04-18 00:01:03 [root@fedora /]# ll /mnt/recovery/
total 205G
drwxr-xr-x  2 root root  213 Jul 14  2021 .
drwxr-xr-x. 8 root root  123 Apr 17 22:38 ..
-rw-r--r--. 1 root root 127G Apr 17 20:35 ExchangeServer.qcow2
-rw-r--r--. 1 qemu qemu  10G Apr 17 21:42 Fedora.qcow2
-rw-r--r--. 1 qemu qemu  15G Apr 17 14:05 FedoraVarMountPoint.qcow2