Category: System Administration

Cisco Aironet — Unable to Access Wireless Device That Moved to Different Access Point

I think this is a fairly esoteric issue — something that happens frequently enough but doesn’t actually impact functionality so anyone notices. We got a new WiFi doorbell that we set up inside (and could access it), took outside (and could access it) … but, when we went back inside? We could not access the doorbell. No HTTPS, no RTSP, no ICMP. Nothing.

Cisco access points maintain a list of associated wireless clients. These may also be kept in an arp table, although arp caching appears to be disabled by default. So device was on AP1, moved to AP2. Clients on AP2 (or AP3, or AP4) were able to access it since the switch has it registered on the port for AP2. Anything on AP1, however, cannot access the device. The MAC address still appeared in the associations table for AP1. You can set a lower activity timeout — the default was one day — to clear devices out more promptly. But … if the device communicates outside of its new WAP, how frequently is it going to be talking to a device on its old WAP? Generally, we’re talking to our servers (wired) or the Internet (also wired). So technically … Scott’s cell phone couldn’t reach my cell phone when I go from the bedroom to the office. But we never notice because we have minimal peer to peer communication. It’s not like doorbells are going to go walking about normally … but it was good to know a quick AP reboot would allow our cell phones to pull up the doorbell’s video feed.

SSH’ing to Older Cisco Access Points

Trying to ssh into our Cisco access points, we get an error saying “no matching key exchange method found. Their offer: diffie-hellman-group1-sha1” … to one-off enable older, deprecated algorithms, we added a cisco.conf to /etc/ssh/ssh_config.d (/etc/ssh/ssh_config includes /etc/ssh/ssh_config.d/*.conf)

Host <IP>
     Ciphers 3des-cbc,blowfish-cbc,aes128-cbc,aes128-ctr,aes256-ctr
     KeyAlgorithms diffie-hellman-group1-sha1

And restart sshd — voila*, you can SSH into the router / access point / etc.

* — you may get an invalid key length error. In this case, you need to regenerate the key on the Cisco device using a 2048-bit key:

config term
crypto key zeroize rsa
crypto key generate rsa modulus 2048
end

Cisco Catalyst 2960-S: Capturing All Traffic Sent Through a Port

We had an issue where an IOT device was not able to establish the connection it wanted — it would report it couldn’t connect to the Internet. I knew it could connect to the Internet in general; but, without knowing what tiny part of the Internet it used to determine ‘connected’ or ‘not connected’, we were stuck. Except! We recently upgraded the switch in our house to a Cisco Catalyst 2960S — which allows me to do one of the cool things I’d seen the network guys at work do but had never been able to reproduce at home: using SPAN (Switched Port ANalyzer). When we’d encounter strange behavior with a network device where we couldn’t just install Wireshark and get a network capture, the network group would basically clone all of the traffic sent to the device’s port to another switch port where we could capture traffic. They would send me a capture file, and it was just like having a Wireshark capture.

You can set up SPAN from the command line configuration, but I don’t have a username/password pair to log into SSH (and can only establish this from the command line configuration). Before breaking out the Cisco console cable, I tried running Cisco Network Assistant (unfortunately, a discontinued product line). One of the options under “Configure” => “Switching” is SPAN:

Since there was no existing SPAN session, I had to select a session number.

Then find the two ports — in the Ingress/Egress/Destination column, the port that is getting the traffic you want needs to either have Ingress (only incoming traffic), Egress (only outgoing traffic), or Both (all traffic). The port to which you want to clone the traffic is set to Destination. And the destination encapsulation is Replicate. Click apply.

In the example above, the laptop plugged in to GE1/0/24 gets all of the traffic traversing GE1/0/5 — running tshark -w /tmp/TheProblem.cap writes the packet capture to a file for later analysis. Caveat — the destination port is no longer “online” — it receives traffic but isn’t sending or receiving its own traffic … so make sure you aren’t using remote access to control the device!

To remove the SPAN, change the Ingress/Egress/Destination values back to “none”, change the destination encapsulation back to select one, and apply.

Since the source port is connected to one of our wireless access points, the network capture encompasses all wireless traffic through that access point.

And we were easily able to identify that this particular device uses the rule “I can ping 8.8.8.8” to determine if it is connected to the Internet. We were able to identify a firewall rule that prevented ICMP replies; allowing this traffic immediately allowed the devices to connect as expected.

*Un*Registering SysInternals ProcMon as Task Manager Replacement

I like the sysinternals tools — I use them frequently at work. But, generally? When trying to look at the running Windows processes or how much memory is being used … I need a really small, simple tool that doesn’t add to the bogging that’s already happening. Which is why I hate when people replace taskmgr.exe with the SysInternals task manager on steroids. It’s too much information. The worst part is that the menu option to replace task manager doesn’t un-replace it.

To accomplish that — to revert to the real Windows task manager — you need to edit the registry. Navigate to HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\taskmgr.exe and remove the key named “Debugger” which points to the SysInternals binary.

Tableau Upgrade Failure

Attempting to upgrade Tableau from 2021.something to 2022.3.5 repeatedly failed with the following error in the upgrade log:

Failed with the error "Detected the old version of Tableau Server takes precedence on the system PATH".

 

And there were all sorts of things online to try — most of which involved changing the environment variables Tableau sets. But they weren’t wrong … or, rather, it was impossible to tell which was “right”: the new version I was trying to install or the old version that was still running. On a whim, I thought I’d try an admin-level command prompt. Opening the command prompt with the “Run as administrator” option allowed me to upgrade the server.

Except … I upgraded another server the next night. I knew to launch the command prompt in administrator mode, but I was incredibly dismayed to encounter the exact same error. Even using an admin level command prompt. Then it occurred to me — Windows evaluates the environment variables when you launch the shell (in fairness, every Unix/Linux variant I’ve encountered does the same thing). The installer must be changing the Tableau environment variables. If you use a command prompt that was open prior to running the installer … you don’t have the new values. You have the old ones — so the environment variable, as seen by the upgrade script, is pointing to an older version of Tableau. Launch a new command prompt, re-evaluate the environment variables, and the script now sees the proper version. Hopefully my next upgrade won’t include a panic inducing “yeah, that’s not gonna happen” error!

Tableau PostgreSQL Query: Finding All Datasources of Name or Type

I frequently need to find details on a data source based on its name and find all data sources of a particular type. Particularly, the Microsoft Graph permissions required to use Sharepoint and OneDrive data within Tableau changed — I needed to reach out to individuals who use those data types to build a business case for the Security organization to approve the new permissions be added to our tenant.

-- Query to find all data sources of a specific type or name 
select system_users.email, datasources.id, datasources.name, datasources.created_at, datasources.updated_at, datasources.db_class, datasources.db_name
, datasources.site_id, sites.name as SiteName, projects.name as ProjectName, workbooks.name as WorkbookName
from datasources
left outer join users on users.id = datasources.owner_id
left outer join system_users on users.system_user_id = system_users.id
left outer join sites on datasources.site_id = sites.id
left outer join projects on datasources.project_id = projects.id
left outer join workbooks on datasources.parent_workbook_id = workbooks.id
-- where datasources.name like '%Sheet1 (LJR Sample%'
where datasources.db_class = 'onedrive'
order by datasources.name
;

Tableau PostgreSQL Query: Stats About Data Source Types

I wanted to report on the different types of data sources used in our Tableau instance — as well as show how many of each type are in use.

-- Query to found how many of each data source type
select datasources.db_class, count(datasources.db_class) as count
from datasources
left outer join users on users.id = datasources.owner_id
left outer join system_users on users.system_user_id = system_users.id
left outer join sites on datasources.site_id = sites.id
left outer join projects on datasources.project_id = projects.id
left outer join workbooks on datasources.parent_workbook_id = workbooks.id
group by datasources.db_class 
order by count desc ;

Answer: About half of them are Oracle!

Unable to use JStat with Cassandra

We have been having some problems with a Cassandra cluster, so I wanted to look at the java heap space. Unfortunately, jstat cannot find the pid. And, yes, it is the right PID!

Looking in /tmp/hsperfdata_cassandra/, there’s no file! Reading through the whole line where Cassandra is running, I noticed +PerfDisableSharedMem … that’d do it!

It looks like they intentionally set +PerfDisableSharedMem in the Cassandra startup script. I assume their rational is still reasonable … so wouldn’t remove the parameter for day-to-day operation. But, when there’s a problem … restarting Cassandra without this parameter allows us to check how garbage collection is going.

 

Java Heap Stats with JStat

While there are plenty of third-party utilities for looking at the java heap space, I just use jstat (in OpenJDK, this means installing java-<Version>-openjdk-devel

JStat will display the following columns:

--------------------------------------------------------------------------------
S0C: Survivor space 0 size in K
S1C: Survivor space 1 size in K

S0U: Survivor space 0 usage in K
S1U: Survivor space 1 usage in K

--------------------------------------------------------------------------------

EC: Eden space size in K
EU: Eden space usage in K

--------------------------------------------------------------------------------

OC: Old space size in K
OU: Old space usage in K

--------------------------------------------------------------------------------

MC: Meta space size in K
MU: Meta space usage in K

--------------------------------------------------------------------------------

CCSC: CodeCache size in K
CCSU: CodeCache usage in K

--------------------------------------------------------------------------------

YGC: Young generation garbage collection count
YGCT: Young generation garbage collection total time in seconds

FGC: Full garbage collection count
FGCT: Full garbage collection total time in seconds

CGC: Concurrent garbage collection count
CGCT: Concurrent garbage collection time in seconds
GCT: Total garbage collection time in seconds

--------------------------------------------------------------------------------

https://stackoverflow.com/questions/13660871/jvm-garbage-collection-in-young-generation/13661014#13661014 does a good job of explaining the nomenclature & how stuff gets moved around in the heap space

Sample output — this command is for java PID 19356 and will list 100 lines 2 seconds apart (2000 ms)

server01:bin # jstat -gc 19356 2000 100
S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT
68096.0 68096.0 0.0 64207.5 545344.0 319007.2 30775744.0 19221750.2 137452.0 124322.4 18860.0 15380.6 324697 14589.985 228 45.830 14635.815
68096.0 68096.0 0.0 64207.5 545344.0 386674.5 30775744.0 19221750.2 137452.0 124322.4 18860.0 15380.6 324697 14589.985 228 45.830 14635.815
68096.0 68096.0 0.0 64207.5 545344.0 457055.4 30775744.0 19221750.2 137452.0 124322.4 18860.0 15380.6 324697 14589.985 228 45.830 14635.815
68096.0 68096.0 0.0 64207.5 545344.0 485538.8 30775744.0 19221750.2 137452.0 124322.4 18860.0 15380.6 324697 14589.985 228 45.830 14635.815
68096.0 68096.0 0.0 64207.5 545344.0 505893.4 30775744.0 19221750.2 137452.0 124322.4 18860.0 15380.6 324697 14589.985 228 45.830 14635.815

And this is a time where a third-party tool would be helpful but I never really ‘get’ what is and what is not OK to install on servers, so try not to install things — because the *useful* bit of information for any of this is really the usage / size percent utilization value.

That last grouping of stuff — I look at those v/s how long the pid has been running. If you’ve gotten a billion GC’s and the PID has only been running for eight seconds, that is a crazy amount of I/O. If I’ve only had 3 GCs and the pid has been running for seven years, it hasn’t been doing anything. In between? I don’t really find the numbers useful unless I’ve got a baseline from normal operation.

Maintaining an /etc/hosts record

I encountered an oddity at work — there’s a server on an internally located public IP space. Because it’s public space, it is not allowed to communicate with the internal interface of some of our security group’s servers. It has to use their public interface (not technically, just a policy on which they will not budge). I cannot just use a DNS server that resolves the public copy of our zone because then we’d lose access to everything else, so we are stuck making an /etc/hosts entry. Except this thing changes IPs fairly regularly (hey, we’re moving from AWS to Azure; hey, let’s try CloudFlare; nope, that is expensive so change it back) and the service it provides is application authentication so not something you want randomly falling over every couple of months.

So I’ve come up with a quick script to maintain the /etc/hosts record for the endpoint.

# requires: dnspython, subprocess

import dns.resolver
import subprocess

strHostToCheck = 'hostname.example.com' # PingID endpoint for authentication
strDNSServer = "8.8.8.8"         # Google's public DNS server
listStrIPs = []

# Get current assignement from hosts file
listCurrentAssignment = [ line for line in open('/etc/hosts') if strHostToCheck in line]

if len(listCurrentAssignment) >= 1:
        strCurrentAssignment = listCurrentAssignment[0].split("\t")[0]

        # Get actual assignment from DNS
        objResolver = dns.resolver.Resolver()
        objResolver.nameservers = [strDNSServer]
        objHostResolution = objResolver.query(strHostToCheck)

        for objARecord in objHostResolution:
                listStrIPs.append(objARecord.to_text())

        if len(listStrIPs) >= 1:
                # Fix /etc/hosts if the assignment there doesn't match DNS
                if strCurrentAssignment in listStrIPs:
                        print(f"Nothing to do -- hosts file record {strCurrentAssignment} is in {listStrIPs}")
                else:
                        print(f"I do not find {strCurrentAssignment} here, so now fix it!")
                        subprocess.call([f"sed -i -e 's/{strCurrentAssignment}\t{strHostToCheck}/{listStrIPs[0]}\t{strHostToCheck}/g' /etc/hosts"], shell=True)
        else:
                print("No resolution from DNS ... that's not great")
else:
        print("No assignment found in /etc/hosts ... that's not great either")