Category: System Administration

Discourse in Docker on Fedora 32

I had to make a few tweaks in order to run the Discourse base Docker image. First, I got the following very clear error:

discourse docker this version of runc doesn't work on cgroups v2: unknown

I had to switch from cgroupv2 to cgroup

grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"

At which point I was at least able to run through the configuration. This yielded an access denied error attempting to create /shared/postgres:

Configuration file at updated successfully!

Updates successful. Rebuilding in 5 seconds.
Building app
Ensuring launcher is up to date
Fetching origin
Launcher is up-to-date
cd /pups && git pull && /pups/bin/pups --stdin
Already up to date.
I, [2020-08-11T18:15:03.664400 #1] INFO -- : Loading --stdin
I, [2020-08-11T18:15:03.672609 #1] INFO -- : > locale-gen $LANG && update-locale
I, [2020-08-11T18:15:03.746912 #1] INFO -- : Generating locales (this might take a while)...
Generation complete.

I, [2020-08-11T18:15:03.747838 #1] INFO -- : > mkdir -p /shared/postgres_run
mkdir: cannot create directory ‘/shared/postgres_run’: Permission denied
I, [2020-08-11T18:15:03.754890 #1] INFO -- :

FAILED
--------------------
Pups::ExecError: mkdir -p /shared/postgres_run failed with return #<Process::Status: pid 21 exit 1>
Location of failure: /pups/lib/pups/exec_command.rb:112:in `spawn'
exec failed with the params "mkdir -p /shared/postgres_run"
d98ee8471413ad77ab27ed3506f12c5c94a2b6902622faf4d88d5dbb51a10f63
** FAILED TO BOOTSTRAP ** please scroll up and look for earlier error messages, there may be more than one.
./discourse-doctor may help diagnose the problem.

Gut was that I encountered an SELinux problem. Turns out I was right. There’s a lot of reading you can do about SELinux and Docker — this, for one — but the quick and simple solution is to run the docker container in privileged mode (note: this may not be a good idea in your specific scenario. understand what privileged mode is and the risks it entails). To do so, edit the launcher script (/var/discourse/launcher in my case) and add  “–privileged” to user_args:

And finally (and this may well be a RTFM thing) — you’ve got to have your public DNS set up & whatever firewall rules to get traffic to the http:// website you are trying to build in order to use the LetsEncrypt SSL cert and configure HTTPS. It uses the file-based verification (i.e. create a file named xyz in /path/to/xyz.whatever on your web server, lets encrypt grabs the file and verifies it exists) which fails quite spectacularly when the Internet at large cannot access your about-to-be-a-discourse-server.

Building LIB_MYSQLUDF_SYS On Fedora 31

I moved my MariaDB server to a new host and could not follow my previously working instructions to build lib_mysqludf_sys. The error indicated that my_atomic.h was not found.

[lisa@server03 lib_mysqludf_sys]# make
gcc -fPIC -Wall -I/usr/include/mysql/server -I. -shared lib_mysqludf_sys.c -o /usr/lib64/mariadb/plugin//lib_mysqludf_sys.so
In file included from /usr/include/mysql/server/my_sys.h:34,
from lib_mysqludf_sys.c:41:
/usr/include/mysql/server/my_pthread.h:26:10: fatal error: my_atomic.h: No such file or directory
26 | #include <my_atomic.h>
| ^~~~~~~~~~~~~
compilation terminated.
make: *** [Makefile:4: install] Error 1

The missing file is located in /usr/include/mysql/server/private … so I had to include that file in the gcc command as well. My new Makefile reads as follows:

[lisa@server03 lib_mysqludf_sys]# cat Makefile
LIBDIR=/usr/lib64/mariadb/plugin/

install:
gcc -fPIC -Wall -I/usr/include/mysql/server -I/usr/include/mysql/server/private -I. -shared lib_mysqludf_sys.c -o $(LIBDIR)/lib_mysqludf_sys.so

I was then able to make and use install.sh to load it into MariaDB.

What Can I sudo?

Some 90% of my Linux experience is on servers where I have root or root-equivalent access (i.e. I can sudo anything). In those cases, ‘what can I run under sudo’ was never a question. And I’d use something like “sudo less /etc/sudoers” to inspect what someone else was able to run when they questioned their access. In my new position, we have a lot of servers that we own too — the Engineering IT support group lets us spin up our own VMs, do whatever we want (within reason). But we have a few IT-managed servers with very restricted rights. And the commands I would use to perform functions (think systemctl restart httpd) aren’t in my sudoers access list. Luckily you can list out what you can run under sudo:

$ sudo -l
[sudo] password for useraccount:
Matching Defaults entries for useraccount on this host:
syslog=auth, loglinelen=0, syslog_goodpri=info, syslog_badpri=err,
logfile=/var/log/sudo.log

User useraccount may run the following commands on this host:
(ALL) /opt/lampp/lampp start, (ALL) /opt/lampp/lampp stop, (ALL)
/opt/lampp/lampp restart, (ALL) /usr/sbin/apachectl

And that is how I know to use apachectl instead of systemctl.

NVIDIA Driver Installation Issue – Fedora 30

NVIDIA finally released an updated driver for Scott’s laptop — one that should be compatible with the 5.x kernel. Ran through the normal process and got the following error:

     Unable to load the nvidia-drm kernel module

Which … was at least new. Tried running through the installation again but not registering the driver with the kernel. Installation completed successfully, and he’s able to boot the 5.8.100 kernel.

SCCM Shows “No items found”

The Windows 10 1909 upgrade was rolled out at work, and I got the “if you don’t get this installed, I’m gonna tell your manager” e-mail. Which is odd since all of this ‘stuff’ is supposed to be doing its thing in the background. But whatever. So I opened the “Software Center” and was told there were no items found under applications. Which … possible, I guess. I don’t use IT-deployed software that isn’t part of the stock image. But clicking over to “Operating Systems” (where the update should be found) also yielded “No items found”.

I know enough about Microsoft applications & AD to know I’m on cached credentials when I initiate the VPN connection. No idea what the refresh period is like, so I lock and unlock my workstation to ensure I’ve got an active authentication token. But that didn’t help — still no items found. I had to go into the “Control Panel”, open “Configuration Manager” as an administrative user, and select the ‘Actions’ tab. There were two — “Machine Policy Retrieval & Evaluation Cycle” and “User Policy Retrieval & Evaluation Cycle”. I ran both of them. A few minutes later, I went back into the Configuration Manager utility & found a bunch of things on the actions tab.

I ran all of them — nothing changed. Then let the computer sit for a few hours (I’m certain less than a few hours would have sufficed, but I had other things to do). Ran all of the actions again, and a notice popped up that I have new software available. Sigh! Now I’m downloading the six gig update — a process that should be done in a few hours. But at least I’ll have the update installed before the deadline.

In the process, I also discovered that the CCM logs have been moved from SYSTEM32/SYSWOW64 and are now located at %WINDIR%\CCM\logs

Apache HTTPD: SSL Virtual Hosts

For quite some time, you couldn’t bind multiple SSL web sites to a single IP:Port combination — this had to do with the mechanics of negotiating an SSL session — the client and server negotiated encryption based on a specific certificate before the server really knew what the client was trying to retrieve. The quick/easy solution was to just add a virtual IP to the box and bind each individual web site to a unique IP address. While this was quite effective in a corporate environment or purely internal network, it was a terrible solution for a set of home-hosted personal web servers — I don’t want to buy four public IP addresses to host four differently named websites. My workaround was to off-port sites no one else would be using (the MQTT WebSockets reverse proxy) and use a reverse proxy to map paths within the family website to the remaining web servers. This page, for instance, is rushworth.us/lisa … which the reverse proxy re-maps to https://lisa.rushworth.us behind the scenes.

With Apache HTTPD 2.2.12 or later built against OpenSSL v0.9.8g or later, you can use Server Name Indication (SNI) to serve multiple SSL websites from a single IP:Port just like you have been able to do with non-SSL sites. Using SNI, the client includes “what they’re looking for” in first message of the SSN negotiation process so the server knows which cert to serve.

In your httpd.conf, indicate that you want to use SNI on an IP:Port combo

# Listen for virtual host requests on all IP addresses
NameVirtualHost *:443

And, optionally, configure one of the named virtual hosts as the default for non-SNI browsers:

SSLStrictSNIVHostCheck off

Now the configuration for your SSL sites can include a ServerName directive. Restart Apache HTTPD, and you’ll be able to access the proper SSL-enabled website without adding virtual IP addresses.

Re-IP’ing Nightmare

We had a strange problem with our DSL modem a year or so ago — immediately after a firmware update got pushed to us. If something was plugged into a switch port, the whole thing became inaccessible. Only by unplugging all of the network cables and rebooting the thing would it stay online. Strange thing, though, is we were able to put it in bridge mode, plug something in, and have a network. Unfortunately, speed tests came back between 10 and 20 meg … it worked, we could still watch movies, VPN into work, listen to music … but something was clearly not right. In researching the issue, I’d come across a lot of other people who experienced dramatic reduction in speed when they switched their ISP’s device to bridge mode. Coupled with the fact our emergency spare access point, which got promoted to “access point for half of the house” was flaky and IPL’d every couple of days.

Since I’m able to download the DSL modem firmware from work, we’ve wanted to flash the DSL modem … well, basically ever since we came online after the problem occurred. Someone’s always doing something that precludes dropping the Internet for an hour … but, yesterday, it was time. Scott was working on the local network, Anya’s got plenty of books, and I had an hour to spare. Hard reset the DSL modem (you cannot access the admin page in transparent bridge mode), flashed it with the most recent firmware we’ve approved for use, and voila … it’s all working again. I even brought up ISC DHCPD on the internal server so we can add as many static addresses as we want without concern for nvram usage. Scott hard reset the other access point, updated its firmware, and returned it to its position on the other side of the house. Perfect! We’ve now got two access points that stay online. Except — the Actiontec T3200 has no way to define a static route!? I’m sure 99% of their customers don’t care, but when we bought our new server, I set the libvirt VMs up on their own network. Not for any good reason, that was just the configuration I found in all of the online documentation I reviewed.

While I could shell into the Asus and add a route (even include that command in the j-whatever script that executes on boot), that didn’t let traffic from the Internet in to our reverse proxy or mail server. I needed to move the VMs onto a bridge that used the routable internal subnet. And thus began my re-IP’ing nightmare.

Step #1 — Add a bridge network that exists on the internal subnet. Sounds straight-forward enough, but several iterations just tanked the server’s network. At one point, we could reboot the server and have connectivity, reboot again and get nowhere. I cleared out everything I had added, rebooted, and at least had the main server online. With the X display redirected to my laptop, I used nm-connection-editor to create a bridge and the slave device. Disabled stp manually (nmcli connection modify vmbridge bridge.stp no), but I used the GUI for everything else. I’m certain it’s possible to do this all through nmcli … but it was an exercise in frustration (and I’m a big fan of CLI stuff to start with). I used the magic-me-a-bridge wizard, clicked ‘add’ for its device, ran through the magic-me-a-slave-device wizard, added a new temporary IP address to the bridge, dropped the previously-used Ethernet (wired) interface, and brought the bridge online. Voila, network. I added the IP address from the Ethernet interface to the bridge. I’m certain this isn’t The Right Thing To Do(TM), and it’s quite possible that I could safely drop the temporary IP I’d put on the bridge to maintain access to the server. But after eleven hours of problems getting to this state, I’m loathe to rock this boat.

Step #2 — Add a bridge to the libvirt config. Create an XML file with the bridge definition

 2020-06-10 12:21:32 [user@host ~/]# cat bridge.xml
<network>
    <name>vm-bridge</name>
    <forward mode="bridge"/>
    <bridge name="vmbridge"/>
</network

Then use net-define to create a bridge based on your config (virsh net-define bridge.xml). Start your bridge (virsh net-start vm-bridge) and, assuming everything works well, set it to autostart (virsh net-autostart vm-bridge).

Step #3 — The next step *should* be to move your VMs to the new bridge — this involves updating the IP addresses … since I won’t be able to access the servers until that is done, I’ll need the GUI to connect into each server. Unfortunately, my next step actually was to get virt-manager working again when it freaks out. At some point, I had a virt-manager session shut down improperly or something. Attempts to launch virt-manager resulted in the error “Error starting Virtual Machine Manager: g-io-error-quark: Timeout was reached ” … which isn’t particularly helpful. Killing the virt-manager PIDs and restarting libvirtd (systemctl restart libvirtd) restored the management interface.

Step #4 — Move the VMs to your new bridge and change the IPs for the new subnet. I used virt-manager to switch the network interface to my new bridge, booted each server, and updated the IP in the OS. This is somewhere that statically assigning IP’s through DHCP would have made things a little simpler … but updating the IP on Fedora and Windows is straight-forward enough.

Step #5 — Getting Exchange back online. Most documentation tells you Exchange doesn’t care about its IP — which is not exactly true. I knew I would have to edit a few configurations to reflect the new subnet — my mailertable records for internal routing, bindings within Apache HTTPD config files, the Exchange send connector smarthost (set-sendconnector -identity “Connector Name” -SmartHosts “[w.x.y.z]”) since my sendmail server’s IP changed too

And update the binding for the Exchange receive connector. Wasn’t sure why I bound the receive connector to the specific IP interface, so I bound it to all interfaces on port 25. Now that everything’s set up and ports all show up as open … I’m ready to clear through the queued mail. A quick “sendmail -q -v” command and …. uhh, that’s no good. I’m getting “451 4.7.0 Temporary server error. Please try again later. PRX3”

I realized that my previous config had the receive connector bound to a specific IP address for a reason. While changing it to 0.0.0.0 saves a config step if I have to re-IP again, Exchange doesn’t work well when the SMTP server is bound to all interfaces. I had to bind it to the specific IP (set-receiveconnector -identity “Server\Connector Name” -Bindings “w.x.y.z:25″,”[::]:25”) … the IPv6 binding may need to be specific, too, if you actually use IPv6. I don’t … so it’s not.

One final thing to remember — Exchange likes to have a hosts entry for itself. No idea — some Linux-based apps have the same quirk, so I never bothered to investigate farther. Update the hosts file, flush the dns cache (ipconfig /flushdns), and finally I’ve got mail dropping into my mailbox.

I’ve finally returned to the state I was in yesterday afternoon. Well, I went from 10-20 meg speed tests to 50-80 meg speed tests. Upload went from 1-3 to 8-10, too. My Internet speed is very fast 🙂

NodeJS Unit File

For future reference, this is an example unit file for running a NodeJS server with systemd. The NodeJS code we use reads from a local MariaDB, so I’ve added a service dependency for the database server.

Create /etc/systemd/system/nodeserver.service

[Unit]
Description=SiteName Node.js Server
Requires=After=mariadb.service

[Service]
ExecStart=/path/to/binary/for/node /path/to/nodeJS/html/server.js
WorkingDirectory=/path/to/nodeJS/html
Restart=always
RestartSec=30
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=nodejs-sitename
User=web-user
Group=www-group

[Install]
WantedBy=multi-user.target

Use systemctl daemon-reload to register the new unit file, then “systemctl start nodeserver.service” to start the service. Assuming everything works properly, use “systemctl enable nodeserver.service” to have the service start on boot.