Category: System Administration

Updating Fedora — System Boots to Grub Error After Update

If you film the boot sequence and look frame by frame, you’ll see that it very briefly flashes a TPM error

error: ../../grub-core/commands/efi/tpm.c:150:unknown TPM error.

 

From what I’ve been able to glean, this secure boot stuff works off of signatures. Microsoft has signatures in BIOS. Everyone else kind of inserts their keys on the fly … so you can run out of space to save these keys and be unable to boot. To work around this, every time an update gets us over the limit, we go into the secure boot DBX management menu and reset the “Forbidden Signatures” from factory default. This is 13 keys instead of 373, and the OS is able to do it’s “thing” and boot.

 

And I’m actually writing this down this time because I had spent a lot of time researching this last time Scott’s laptop failed to boot and dumped out to a grub menu. This time, I kinda know what we did and why but lost a lot of the details.

RSync to Mirror Local Files

The rsync utility was meant to be used to sync files across the network — to or from an rsync server. For some time, I had a group of friends who shared documents off of my rsync server. Anyone with access could run an rsync command and sync their computer up with the group’s documents. With the advent of online file storage and collaborative editing, this was no longer needed. But I still use rsync to make sure my laptop has a local copy of a folder on the server. Mount /path/to/folder/contents/to/copy to the SMB or NFS share, and the following rsync command ensures the laptop’s /path/to/where/contents/should/be/placed has an exact mirror of the contents of the server folder

rsync –archive –verbose –update –delete “/path/to/folder/contents/to/copy/” “/path/to/where/contents/should/be/placed/”

–archive is a grouping of:
-r recursive
-l copy symlinks
-p preserve permissions
-t preserve modification timestamps
-g preserve group
-o preserve owner
–devices preserve device files (su only)
–specials preserve special files

PowerShell: Mass Active Directory Password Changes

We have a bunch of accounts that function as extra mailboxes — all conveniently housed in on OU. The following PowerShell command sets the password for all of the accounts in one go. Not terribly useful for “real world” use … but useful for testing (and probably something I’ll end up using again)

$OUpath = ‘ou=Mail Aliases,dc=example,dc=com’
$strNewPassword = “What3v3rYu0W@nt1tT0B3”

Get-ADUser -Filter * -SearchBase $OUpath | Set-ADAccountPassword -Reset -NewPassword (ConvertTo-SecureString -AsPlainText $strNewPassword -Force)

ZFS Compression Ratio

We’ve got several PostgreSQL servers using ZFS file system for the database, and I needed to know how compressed the data is. Fortunately, there appears to be a zfs command that does exactly that: report the compression ratio for a zfs file system. Use zfs get compressratio /path/to/mount

Web Proxy Auto Discovery (WPAD) DNS Failure

I wanted to set up automatic proxy discovery on our home network — but it just didn’t work. The website is there, it looks fine … but it doesn’t work. Turns out Microsoft introduced some security idea in Windows 2008 that prevents Windows DNS servers from serving specific names. They “banned” Web Proxy Auto Discovery (WPAD) and Intra-site Automatic Tunnel Addressing Protocol (ISATAP). Even if you’ve got a valid wpad.example.com host recorded in your domain, Windows DNS server says “Nope, no such thing!”. I guess I can appreciate the logic — some malicious actor can hijack all of your connections by tunnelling or proxying your traffic. But … doesn’t the fact I bothered to manually create a hostname kind of clue you into the fact I am trying to do this?!?

I gave up and added the proxy config to my group policy — a few computers, then, needed to be manually configured. It worked. Looking in the event log for a completely different problem, I saw the following entry:

Event ID 6268

The global query block list is a feature that prevents attacks on your  network by blocking DNS queries for specific host names. This feature has caused the DNS server to fail a query with error code NAME ERROR for wpad.example.com. even though data for this DNS name exists in the DNS database. Other queries in all locally authoritative zones for other names
that begin with labels in the block list will also fail, but no event will be logged when further queries are blocked until the DNS server service on this computer is restarted. See product documentation for information about this feature and instructions on how to configure it.

The oddest bit is that this appears to be a substring ‘starts with’ query — like wpadlet or wpadding would also fail? A quick search produced documentation on this Global Query Blocklist … and two quick ways to resolve the issue.

(1) Change the block list to contain only the services you don’t want to use. I don’t use ISATAP, so blocking isatap* hostnames isn’t problematic:

dnscmd /config /globalqueryblocklist isatap

View the current blocklist with:

dnscmd /info /globalqueryblocklist

– Or –

(2) Disable the block list — more risk, but it avoids having to figure this all out again in a few years when a hostname starting with isatap doesn’t work for no reason!

dnscmd /config /enableglobalqueryblocklist 0

 

DIFF’ing JSON

While a locally processed web tool like https://github.com/zgrossbart/jdd can be used to identify differences between two JSON files, regular diff can be used from the command line for simple comparisons. Using jq to sort JSON keys, diff will highlight (pipe bars between the two columns, in this example) where differences appear between two JSON files. Since they keys are sorted, content order doesn’t matter much — it’s possible you’d have a list element 1,2,3 in one and 2,1,3 in another, which wouldn’t be sorted.

[lisa@fedorahost ~]# diff -y <(jq --sort-keys . 1.json) <(jq --sort-keys . 2.json )
{                                                               {
  "glossary": {                                                   "glossary": {
    "GlossDiv": {                                                   "GlossDiv": {
      "GlossList": {                                                  "GlossList": {
        "GlossEntry": {                                                 "GlossEntry": {
          "Abbrev": "ISO 8879:1986",                                      "Abbrev": "ISO 8879:1986",
          "Acronym": "SGML",                                  |           "Acronym": "XGML",
          "GlossDef": {                                                   "GlossDef": {
            "GlossSeeAlso": [                                               "GlossSeeAlso": [
              "GML",                                                          "GML",
              "XML"                                                           "XML"
            ],                                                              ],
            "para": "A meta-markup language, used to create m               "para": "A meta-markup language, used to create m
          },                                                              },
          "GlossSee": "markup",                                           "GlossSee": "markup",
          "GlossTerm": "Standard Generalized Markup Language"             "GlossTerm": "Standard Generalized Markup Language"
          "ID": "SGML",                                                   "ID": "SGML",
          "SortAs": "SGML"                                    |           "SortAs": "XGML"
        }                                                               }
      },                                                              },
      "title": "S"                                                    "title": "S"
    },                                                              },
    "title": "example glossary"                                     "title": "example glossary"
  }                                                               }
}                                                               }

Tableau: Upgrading from 2022.3.x to 2023.3.0

A.K.A. I upgraded and now my site has no content?!? Attempting to test the upgrade to 2023.3.0 in our development environment, the site was absolutely empty after the upgrade completed. No errors, nothing indicating something went wrong. Just nothing in the web page where I would expect to see data sources, workbooks, etc. The database still had a lot of ‘stuff’, the disk still had hundreds of gigs of ‘stuff’. But nothing showed up. I have experienced this problem starting with 2022.3.5 or 2022.3.11 and upgrading to 2023.3.0. I could upgrade to 2023.1.x and still have site content.

I wasn’t doing anything peculiar during the upgrade:

  • Run TableauServerTabcmd-64bit-2023-3-0.exe to upgrade the CLI
  • Run TableauServer-64bit-2023-3-0.exe to upgrade the Tableau binaries
  • Once installation completes, run open a ​new​ command prompt with ​Run as Administrator and launch “.\Tableau\Tableau Server\packages\scripts.20233.23.1017.0948\upgrade-tsm.cmd” –username username

The upgrade-tsm batch upgrades all of the components and database content. At this point, the server will be stopped. Start it. Verify everything looks OK – site is online, SSL is right, I can log in. Check out the site data … it’s not there!

Reportedly this is a known bug that only impacts systems that have been restored from backup. Since all of my servers were moved from Windows 2012 to Windows 2019 by backing up and restoring the Tableau sites … that’d be all of ’em! Fortunately, it is easy enough to make the data visible again. Run tsm maintenance reindex-search to recreate the search index. Refresh the user site, and there will be workbooks, data, jobs, and all sorts of things.

If reindexing does not sort the problem, tsm maintenance reset-searchserver should do it. The search reindex sorted me, though.

Linux – High Load with CIFS Mounts using Kernel 6.5.5

We recently updated our Fedora servers from 36 and 37 to 38. Since the upgrade, we have observed servers with very high load averages – 8+ on a 4-cpu server – but the server didn’t seem unreasonably slow. On the Unix servers I first used, Irix and Solaris, load average counts threads in a Runnable state. Linux, however, includes both Runnable and Uninterruptible states in the load average. This means processes waiting – on network calls using mkdir to a mounted remote server, local disk I/O – are included in the load average. As such, a high load average on Linux may indicate CPU resource contention but it may also indicate I/O contention elsewhere.

But there’s a third possibility – code that opts for the simplicity of the uninterrupted sleep without needing to be uninterruptible for a process. In our upgrade, CIFS mounts have a laundromat that I assume cleans up cache – I see four cifsd-cfid-laundromat threads in an uninterruptible sleep state – which means my load average, when the system is doing absolutely nothing, would be 4.

2023-10-03 11:11:12 [lisa@server01 ~/]# ps aux | grep " [RD]"
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1150 0.0 0.0 0 0 ? D Sep28 0:01 [cifsd-cfid-laundromat]
root 1151 0.0 0.0 0 0 ? D Sep28 0:01 [cifsd-cfid-laundromat]
root 1152 0.0 0.0 0 0 ? D Sep28 0:01 [cifsd-cfid-laundromat]
root 1153 0.0 0.0 0 0 ? D Sep28 0:01 [cifsd-cfid-laundromat]
root 556598 0.0 0.0 224668 3072 pts/11 R+ 11:11 0:00 ps aux

Looking around the Internet, I see quite a few bug reports regarding this situation … so it seems like a “ignore it and wait” problem – although the load average value is increased by these sleeping threads, it’s cosmetic. Which explains why the server didn’t seem to be running slowly even through the load average was so high.

https://lkml.org/lkml/2023/9/26/1144

Date: Tue, 26 Sep 2023 17:54:10 -0700
From: Paul Aurich 
Subject: Re: Possible bug report: kernel 6.5.0/6.5.1 high load when CIFS share is mounted (cifsd-cfid-laundromat in"D" state)

On 2023-09-19 13:23:44 -0500, Steve French wrote:
>On Tue, Sep 19, 2023 at 1:07 PM Tom Talpey <tom@talpey.com> wrote:
>> These changes are good, but I'm skeptical they will reduce the load
>> when the laundromat thread is actually running. All these do is avoid
>> creating it when not necessary, right?
>
>It does create half as many laundromat threads (we don't need
>laundromat on connection to IPC$) even for the Windows server target
>example, but helps more for cases where server doesn't support
>directory leases.

Perhaps the laundromat thread should be using msleep_interruptible()?

Using an interruptible sleep appears to prevent the thread from contributing
to the load average, and has the happy side-effect of removing the up-to-1s delay
when tearing down the tcon (since a7c01fa93ae, kthread_stop() will return
early triggered by kthread_stop).

~Paul

 

Redis Continually Receiving SIGTERM

I brought up a redis cluster — three servers which all logged basically nothing apart from the fact they were about to shut down. The service status showed as “Activating” — never started — and the server wasn’t doing anything useful.

The redis log reads:

2920940:signal-handler (1694019281) Received SIGTERM scheduling shutdown...
2921151:signal-handler (1694019374) Received SIGTERM scheduling shutdown...
2921518:signal-handler (1694019468) Received SIGTERM scheduling shutdown...
2921726:signal-handler (1694019561) Received SIGTERM scheduling shutdown...
2922133:signal-handler (1694019655) Received SIGTERM scheduling shutdown...
2922410:signal-handler (1694019748) Received SIGTERM scheduling shutdown...
2923173:signal-handler (1694019842) Received SIGTERM scheduling shutdown...
2923537:signal-handler (1694019935) Received SIGTERM scheduling shutdown...
2923747:signal-handler (1694020029) Received SIGTERM scheduling shutdown...
2924110:signal-handler (1694020122) Received SIGTERM scheduling shutdown...
2924319:signal-handler (1694020216) Received SIGTERM scheduling shutdown...
2924687:signal-handler (1694020309) Received SIGTERM scheduling shutdown...
2924900:signal-handler (1694020403) Received SIGTERM scheduling shutdown...
2925266:signal-handler (1694020496) Received SIGTERM scheduling shutdown...
2925467:signal-handler (1694020590) Received SIGTERM scheduling shutdown...

Turns out this is a hazard of copy/pasting a unit file from an older server — evidently redis cannot use a service type of “Forking” with systemd. To resolve the issue, edit /etc/systemd/system/redis.service and updating the type to “simple”. Use systemctl daemon-reload and then systemctl restart redis to launch redis with the new config … voila, I’ve got a cluster of three servers that are started and communicating.

Tableau: Workbooks and Views Created or Modified By a Specific Individual

I had a manager looking to locate a ‘something in Tableau’ that was created by a specific individual — in this case, it was a terminated employee so “just ask the person” was not a viable solution. I put together a query to list all workbooks owned by or modified by an individual:

SELECT w.id, w.name, w.description, w.owner_id, w.modified_by_user_id, owner_system_users.email AS owner_email, modified_system_users.email AS modifier_email
     FROM  public.workbooks AS w
      LEFT OUTER JOIN public.users AS owner_users on w.owner_id = owner_users.id
      LEFT OUTER JOIN public.users AS modified_users ON w.owner_id = modified_users.id
      LEFT OUTER JOIN public.system_users AS owner_system_users ON owner_system_users.id = owner_users.system_user_id
		LEFT OUTER JOIN public.system_users AS modified_system_users ON modified_system_users.id = modified_users.system_user_id
      WHERE owner_system_users.name = 'UserLogonID';
--      WHERE owner_system_users.email LIKE '%Smith%' OR modified_system_users.email = '%Smith%'
		;

As well as a query to identify all views owned by an individual:

SELECT views.*, owner_system_users.email AS owner_email
     FROM  public.views 
      LEFT OUTER JOIN public.users AS owner_users on views.owner_id = owner_users.id
      LEFT OUTER JOIN public.system_users AS owner_system_users ON owner_system_users.id = owner_users.system_user_id

      WHERE owner_system_users.name = 'UserLogonID';
--      WHERE owner_system_users.email LIKE '%Smith%' OR modified_system_users.email = '%Smith%'
		;

The email address based search is most reasonable — our email addresses are algorithmically based on our names, so we always know what the address would have been. Many contractors, however, don’t have Office 365 licenses or mailboxes … so I have to fall back to finding their logon ID in those cases.