Category: Technology

Exchange Disaster Recovery

I guess with everyone moving to magic cloudy pay-per-month Exchange, this isn’t such a concern anymore … but for those still running on-premise Exchange:

(1) Before you can restore your AD system state, you’ve got to build a server & bring up a temporary domain. There’s a “System Configuration” program that lets you select to restart in safe mode / directory services restore mode without having to time the F8 key or anything.

(2) The system state backup of a domain controller backs up a lot of stuff — including the registry which tells the server what software is installed and services. This means it is not possible to just run the Exchange setup.exe with the disaster recovery option. Fortunately, I was able to copy the Exchange folder from program files off of a backup. Unfortunately, the Exchange services wouldn’t start because DLLs couldn’t register. Did a diff between old server backup & new one — copied any missing stuff from c:\windows\system32 and c:\windows\syswow64 and, voila, Exchange is starting. Couldn’t mount the ebd file, though …

(3) Which brings me to eseutil an attempt to replay the transaction logs (eseutil /r) and then repair the database as much as possible (eseutil /p) got me an EDB file that the Exchange server could mount.

Grafana — SSO With PingID (OAuth)

I enabled SSO in our development Grafana system today. There’s not a great user experience with SSO enabled because there is a local ‘admin’ user that has extra special rights that aren’t given to users put into the admin role. If you just enable SSO, there is a new button added under the logon dialogue that users can use to initiate an SSO authentication. That’s not great, though, since most users really should be using the SSO workflow. And people are absolutely going to be putting their login information into that really obvious set of text input fields.

Grafana has a configuration to bypass the logon form and just always go down the OAUTH authentication:

# Set to true to attempt login with OAuth automatically, skipping the login screen.
# This setting is ignored if multiple OAuth providers are configured.
oauth_auto_login = true

Except, now, the rare occasion we need to use the local admin account requires us to set this to false, restart the service, do our thing, change the setting back, and restart the service again. Which is what we’ll do … but it’s not a great solution either.

 

Config to authenticate Grafana to PingID using OAUTH

#################################### Generic OAuth ##########################
[auth.generic_oauth]
name = PingID
enabled = true
allow_sign_up = true
client_id = 12345678-1234-4567-abcd-123456789abc
client_secret = abcdeFgHijKLMnopqRstuvWxyZabcdeFgHijKLMnopqRstuvWxyZ
scopes = openid profile email
email_attribute_name = email:primary
email_attribute_path =
login_attribute_path = user
role_attribute_path =
id_token_attribute_name =
auth_url = https://login.example.com/as/authorization.oauth2
token_url = https://login.example.com/as/token.oauth2
api_url = https://login.example.com/idp/userinfo.openid
allowed_domains =
team_ids =
allowed_organizations =
tls_skip_verify_insecure = true
tls_client_cert =
tls_client_key =
tls_client_ca =

Mounting DD Raw Image File

And a final note from my disaster recovery adventure — I had to use ddrescue to copy as much data from a corrupted drive as possible (ddrescue /dev/sdb /mnt/vms/rescue/backup.raw –try-again –force –verbose) — once I had the image, what do you do with it? Fortunately, you can mount a dd file and copy data from it.

# Mounting DD image
2023-04-17 23:54:01 [root@fedora /]# kpartx -l backup.raw
loop0p1 : 0 716800 /dev/loop0 2048
loop0p2 : 0 438835200 /dev/loop0 718848

2023-04-17 23:55:08 [root@fedora /]# mount /dev/mapper/loop0p2 /mnt/recovery/ -o loop,ro
mount: /mnt/recovery: cannot mount /dev/loop1 read-only.
       dmesg(1) may have more information after failed mount system call.

2023-04-17 23:55:10 [root@fedora /]# mount /dev/mapper/loop0p2 /mnt/recovery/ -o loop,ro,norecovery

2023-04-18 00:01:03 [root@fedora /]# ll /mnt/recovery/
total 205G
drwxr-xr-x  2 root root  213 Jul 14  2021 .
drwxr-xr-x. 8 root root  123 Apr 17 22:38 ..
-rw-r--r--. 1 root root 127G Apr 17 20:35 ExchangeServer.qcow2
-rw-r--r--. 1 qemu qemu  10G Apr 17 21:42 Fedora.qcow2
-rw-r--r--. 1 qemu qemu  15G Apr 17 14:05 FedoraVarMountPoint.qcow2


Mounting a QCOW File

We had a power outage on Monday that took out the drive that holds our VMs. There are backups, but the backup drive copies had superblock errors and all sorts of issues. To recover our data, I learned all sorts of new things — firstly that you can mount a QCOW file and copy data out. First, you have to connect a network block device to the file. Once it is connected, you can use fdisk to list the partitions on the drive and mount those partitions. In this example, I had a partition called nbd0p1 that I mounted to /mnt/data_recovery

modprobe nbd max_part=2
qemu-nbd --connect=/dev/nbd0 /path/to/server_file.qcow2
fdisk /dev/nbd0 -l
mount /dev/nbd0p1 /mnt/data_recovery

Once you are done, unmount it and disconnect from the network block device.

umount /mnt/data_recovery
qemu-nbd --disconnect /dev/nbd0
rmmod nbd

ISC Bind – Converting Secondary Zone to Primary

Our power went out on Monday and, unfortunately, the SSD on the server with all of our VMs got corrupted. The main server has ISC Bind configured to host all of our internal DNS zones as secondaries … but, a day after the primary DNS server went down, those copies fell over. Luckily, you can convert a secondary zone to primary. The problem is that the cached copy of the zone was … funky binary stuff.

Luckily there’s an executable to convert this into a text zone file — named-compilezone

-f raw -F text -o output_file_name zone_name input_file_name

So, to covert my rushworth.us zone:

named-compilezone -f raw -F text -o rushworth.us.db rushworth.us rushworth.us.db.bin

Then, in the named.conf file, change the zone type to “master” and remark out the line indicating which the masters are. Change the “files” line to the newly created file. If you haven’t already done so, add “allow-query {any; };” so clients can actually query the zone.

Zookeeper: Finding the Leader

When restarting our ensemble of zookeepers, I restart the leader last (to avoid repeatedly reallocating the role). Which means I’ve got to find the leader. Luckily the zookeepers are happy to report if they are the leader or a follower if you send ‘srvr’ to the zookeeper port.

jumpserver:~ # echo srvr | nc zcserver38.example.net 2181
Zookeeper version: 3.5.8-f439ca583e70862c3068a1f2a7d4d068eec33315, built on 05/04/2020 15:53 GMT
Latency min/avg/max: 0/0/1383
Received: 3783871
Sent: 3784761
Connections: 7
Outstanding: 0
Zxid: 0x800003d25
Mode: follower
Node count: 3715

Looking at the “Mode” line above, I can see that’s the follower. So I’ll check the next Zookeeper …

jumpserver:~ # echo srvr | nc zcserver39.example.net 2181
Zookeeper version: 3.5.8-f439ca583e70862c3068a1f2a7d4d068eec33315, built on 05/04/2020 15:53 GMT
Latency min/avg/max: 0/0/1167
Received: 836866
Sent: 848235
Connections: 1
Outstanding: 0
Zxid: 0x800003d25
Mode: leader
Node count: 3715
Proposal sizes last/min/max: 36/32/19782

And that’s the leader — so 39 will be the last one rebooted.

Logstash – Key Value Parsing

The KV filter plug-in is a quick way to split key/value pairs from message data. An example syslog message where there is some prefix information followed by key/value pairs. In this case, each pair is separated by a semicolon and they keys and values are separated by a colon.

<140>1 2023-04-13T17:43:00+01:00 DEVICENAME5@10.1.2.3 EVENT 2693 [meta sequenceId="33"]"time-stamp":2023-04-13T17:43:00+01:00;"session-id":;"user-name":;"id":0;"type":CREATE;"entity":not-alarmed-event-notification

The first thing you need to do is to parse the message so the key/value pair data is in a single field.

"message" => "<%{POSINT:syslog_pri}>%{NUMBER:stuff} %{DATA:syslog_timestamp}+%{DATA:syslog_timestamp_offset} %{SYSLOGHOST:logsource}@%{DATA:sourceip} %{DATA:log_type} %{NUMBER:event_id} \[meta sequenceId=\"%{DATA:meta_sequence_id}\"\] %{GREEDYDATA:kvfields}"

Now that the data is available in kvfields, the kv filter can be used to parse the data. Indicate which character splits fields, which character splits the key and value, and what field is the source of the key/value pair data. Additionally, if you need to trim data from keys (trim_key) or values (trim_value), you can do so. In this case, each of the keys is quoted. I do not wish to carry the quotes through on the field name, so I am trimming the double-quote character from keys.

kv {
     field_split => ";"
     value_split  => ":"
     trim_key  => '"'
     source  => "kvfields"
}

You can recursively parse data, if needed, and the key/values parsed from a value will be sub-elements of the parent key.

Ruby

Sometimes more advanced logic is required to parse message content. There is a ruby filter plugin that allows you to run ruby code. As an example, the “attributes” key contains key/value pairs but the same delimiter is used for both key/value and the list of pairs.

<140>1 2023-04-13T17:57:00+01:00 DEVICENAME5@10.1.2.3 EVENT 2693 [meta sequenceId="12"] "time-stamp":2023-04-13T17:57:00+01:00;"session-id":;"user-name":;"id":0;"type":CREATE;"entity":not-alarmed-event-notification;"attributes":"condition-type;T-BE-FEC;condition-description;Bit Error Forward Error Correction HT = 325651545656;location;near-end;direction;ingress;time-period;1min;service-affect;NSA;severity-level;cleared;fm-entity;och-os-1/2/2;fm-entity-type;OCH-OS;occurrence-date-time;2023-04-13T17:55:55+01:00;alarm-condition-type;standing;extension-description;;last-severity-level;not-applicable;alarm-id;85332F351D9EA5FC7BB52C1C75F85B5527251155;"

If you break the string into an array on the delimiter, even elements are the key and the +1 odd element is the corresponding value.

ruby {
     code => "
          strattributes = event.get('[attributes]')
          arrayattributebreakout = strattributes.split(';')
          if arrayattributebreakout.count > 0
               arrayattributebreakout.each_with_index do |element,index|
               if index.even?
                    event.set(arrayattributebreakout[index], arrayattributebreakout[index+1])
               end
          end
       end
       "
}

 

 

Predicting the Future

That didn’t take longhttps://www.engadget.com/three-samsung-employees-reportedly-leaked-sensitive-data-to-chatgpt-190221114.html

Leaking data is obviously a big problem if the user base is “anyone with an internet connection”, but potentially not great even for an internal implementation of an AI chatbot.

Content management platforms, in the early days, had a big problem with search because the indexing engine had super-user rights – so searching for “acquisition” would give you links that you couldn’t read. Even if the titles didn’t tell you anything (does “Project OPUS” or “Project Golden Falcon” have any meaning to you?), the dates & authors told you something (hey, there’s a bunch of new docs the C-levels are creating about acquisitions this past few weeks … sure that doesn’t mean anything!). Eventually any halfway decent content management platform understood permissions and at least attempts to filter results based on what you have permission to view.

AI is different, unfortunately in a way that makes implementing that type of security more difficult. Other than individualizing the trained AIs for each user (so info you feed in is only going to be reflected in your future results) or not training based on user input (only use stuff that’s openly readable already) … it would be rather challenging to filter an implementation so it knows stuff it’s been told but doesn’t convey that information to unauthorized individuals.

Increasing Kibana CSV Report Max Size

The default size limit for CSV reports in Kibana is 10 meg. Since that’s not enough for some of our users, I’ve been testing increases to the xpack.reporting.csv.maxSizeBytes value.

We’re still limited by the ES http.max_content_length value — which the documentation seems pretty confident shouldn’t be increased because the system can become unstable. Increasing the max Kibana report size to 100mb just yields a different error because ES doesn’t like it. 75 exhausted the JavaScript heap (?) – which I could get around by setting  NODE_OPTIONS=–max_old_space_size=4096 … but that just led to the server abending whenever a report was run (in fact, I had to remove the reports I tried to run from the server to get everything back into a working state). Increasing the limit to 50 meg, though, didn’t do anything unreasonable in dev. So somewhere between 50 and 75 meg is our upper limit, and 50 seemed like a nice round number to me.

Notes on resource usage – Data is held in memory as a report is created. We’d see an increase in memory/CPU usage while the report is being generated (or, I guess more accurately, a longer time during which the memory/CPU usage is increased because if a 10 meg report takes 30 seconds to run then a 50 meg report is going to take 2.5 minutes to run … and the memory/CPU usage is pretty much the same during the “a report is running” period).

Then, though, the report is stashed in ElasticSearch for user(s) to retrieve within .reporting* indicies. And that’s where things get a little silly — architecturally, this is just another index; it ages off with a lifecycle policy if one exists. But it looks like they never created a lifecycle management policy. So you can still retrieve reports run a little over two years ago! We will certainly want to set up a policy to clean up old reports … just have to decide how long is reasonable.