Category: System Administration

Active Directory – Prevent Renaming and Moving OU

On my home domain, I’ve always added an access control entry to prevent this from happening … it’s really easy to double-click and end up in rename mode, or to drag and drop an OU into a new location. I’ve always considered this to be a bit of paranoia on my part — not like anyone’s routinely screwing up entire OUs.

Until they are. We’ve had two significant outages at work caused by unintentional changes to Active Directory organizational unit names. Partially to avoid wide-spread outages due to something that’s fundamentally silly, and partially because any widespread outage requires a root cause analysis that includes some action you’ve taken to prevent whatever from happening again, we’re going to implement the same permission tweak at work now.

Since it’s not just me who has wanted/needed this permission, I figured I would publish how I’ve got the permissions set:

No idea why the GUI shows name and Name instead of rdn and CN respectively. But I’ve denied write for adminDisplayName, rdn, and cn.

Just like the “prevent accidental deletion” checkbox is a bit of a pain when you want to delete an OU, this is inconvenient if you want to rename or move OUs. The first step is to remove the permission, then you can make your change, and then you’ve got to re-apply the permission. Slight inconvenience, but having the entire company failing LDAP authentications (where the base DN no longer finds the users) is a massive inconvenience too.
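For reference, the same deny entries can be scripted rather than clicked through the ACL editor. A rough sketch with dsacls follows; the OU distinguished name is illustrative, and attribute naming in dsacls can be finicky, so verify the resulting entries in the GUI afterwards:

dsacls "OU=Departments,DC=example,DC=com" /D "Everyone:WP;name"
dsacls "OU=Departments,DC=example,DC=com" /D "Everyone:WP;cn"
dsacls "OU=Departments,DC=example,DC=com" /D "Everyone:WP;adminDisplayName"

Just like the GUI-applied entries, these have to be removed before an intentional rename or move and re-applied afterwards.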

OUD 11g Failure

Ran out of disk space on the OUD partition (don’t let people turn on debug logging!!). Easily fixed — stop the service, remove the big log file, start it again. Unfortunately, that “start again” wasn’t as easy as it could have been. Server startup fails, saying the cryptomanager failed to publish the instance-key-pair in ADS … which, Oracle says, means you get to recreate the single instance. Honestly, this isn’t a horrible process since I’ve got a backup of the directory. But I’d rather not spend the next hour messing around with the server.

The underlying problem is that the admin partition didn’t dump out to its storage file (full disk), so admin-backend.ldif is a 0-byte file. Now, had the directory been running when I noticed this … I could have dumped the cn=admin data partition to another volume and copied the file in once I cleared up disk space. But I have a file-level backup … and it was easy enough to pull the file from last night back. Voila, one server online in a couple of minutes. And a good reminder for the future that a shutdown with a full disk … no good. I probably want to find something to free up a couple megs of space prior to shutting down the server. The extra-cautious option would be the MS Exchange approach of creating a few 5-meg files to ensure there’s space that can be freed up.
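If I go that route, it is nothing fancy; something like the following (paths are illustrative) pre-allocates a few 5-meg placeholder files that can be deleted to free space before a shutdown:

dd if=/dev/zero of=/path/to/oud/placeholder1.bin bs=1M count=5
dd if=/dev/zero of=/path/to/oud/placeholder2.bin bs=1M count=5
dd if=/dev/zero of=/path/to/oud/placeholder3.bin bs=1M count=5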

 

Added 06 Feb 2020: One of my colleagues encountered this problem again today, and it turns out you can copy the admin-backend.ldif from another server too. It may not be 100% — he cannot log into the server through the admin console. But he’s got a directory being served & no users complaining about an outage. It can get sorted properly later.
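A rough sketch of that copy, done with the directory service stopped and using an illustrative instance path (your OUD instance layout may differ):

scp working-server:/path/to/oud-instance/OUD/config/admin-backend.ldif /path/to/oud-instance/OUD/config/admin-backend.ldif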

Cloudy ROI

I often have trouble seeing the value behind cloud offerings — but most cloud migrations I’ve seen have done 1:1 replacement of locally hosted servers with cloud hosted servers. The first two years, the cloud hosted servers are cheaper (although that’s some dodgy accounting, as we’re assuming no workforce changes as a result of outsourcing servers, and depreciation of the owned asset is not considered). The third year, though, is a break-even point. The General Depreciation System considers computers a five-year property, but there are accounting practices to handle fully depreciated assets. The asset remains on the balance sheet at cost, and its accumulated depreciation is listed as a contra asset item. When you *do* stop using the asset, the accumulated depreciation account is debited for the full depreciated amount and the fixed asset account is credited with its full cost. The point being, I can continue using a computer asset after five years. Cloud hosted servers make financial sense for a company that tends towards “bleeding edge” implementations (buying the new whatever next year), but for a company that buys a server or application and then uses it for a decade … you’re simply turning capital expense into a greater ongoing operating expense. Which … good this year, but bad in the long term.

Now for a smaller company that doesn’t have a dedicated IT department, and that doesn’t actually need the capacity provided by a single modern server … externally hosting resources is financially beneficial. A web site, e-mail, chat-based customer service? All make sense to host externally. You don’t have to own half a dozen servers, make sure they’re backed up, etc. But I don’t see the cost benefit at enterprise levels unless (1) you want to build data centers close to customers without the expense of actually building a data center. For instance, opening your services to customers in the EU … getting a data center set up in, say, Germany isn’t a quick proposition. As your business grows, it may become “worth it” to invest money into a European data center. But cloud-hosted computers from some major provider who already has a presence there provides quick time-to-market and minimizes up-front cost. Some countries may have a laborious process for prospective businesses too — a process the cloud hosting provider has already navigated. Or you (2) plan a substantial workforce reduction. If someone else is backing up, patching, and monitoring systems … you don’t need people performing those duties. Since a cloud-hosting provider is able to leverage those employees across far more servers than you’d need — there’s a place where scale produces a cost benefit. But, strangely, I don’t see companies reducing IT operations staff after moving to the cloud. This may be a long-term goal to ensure the enthusiasm of staff for the move — it’s not particularly enticing to put six months of work into a project that ensures my job goes away. Or this may just be a thing — move to the cloud and still have twenty ops employees.

Open Password Filter (OPF) Detailed Overview

When we began allowing users to initiate password changes in Active Directory and feed those passwords into the identity management system (IDM), it was imperative that the passwords set in AD comply with the IDM password policy. Otherwise, passwords were set in AD that were not set in the IDM system or the other downstream managed directories. Microsoft does not have a password policy that allows the same level of control as the Oracle IDM (OIDM) policy; however, password changes can be passed to DLL programs for further evaluation (or, as in the case of the hook that forwards passwords to OIDM, the DLL can just return TRUE to accept the password but do something completely different with it, like sending it along to an external system). Search for secmgmt “password filters” (https://msdn.microsoft.com/en-us/library/windows/desktop/ms721882(v=vs.85).aspx) for details from Microsoft.

LSA makes three different API calls to all of the DLLs listed in the “Notification Packages” registry value (a multi-string value under HKLM\SYSTEM\CurrentControlSet\Control\Lsa). First, InitializeChangeNotify(void) is called when LSA loads. The only reasonable answer to this call is “true”, as it advises LSA that your filter is online and functional.

When a user attempts to change their password, LSA calls PasswordFilter(PUNICODE_STRING AccountName, PUNICODE_STRING FullName, PUNICODE_STRING Password, BOOLEAN SetOperation) — this is the mechanism we use to enforce a custom password policy. The response to a PasswordFilter call determines if the password is deemed acceptable.

Finally, when a password change is committed to the directory, LSA calls PasswordChangeNotify(PUNICODE_STRING UserName, ULONG RelativeId, PUNICODE_STRING NewPassword) — this is the call that should be used to synchronize passwords into remote systems (as an example, the Oracle DLL that is used to send AD-initiated password changes into OIDM). In our password filter, the function just returns ‘0’ because we don’t need to do anything with the password once it has been committed.
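As an aside, you can see which filter DLLs LSA will load on a domain controller (and verify yours made it into the list) by checking the registry value mentioned above:

reg query "HKLM\SYSTEM\CurrentControlSet\Control\Lsa" /v "Notification Packages"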

Our password filter is based on the Open Password Filter project at https://github.com/jephthai/OpenPasswordFilter. The communication between the DLL and the service is changed to use localhost (127.0.0.1). The DLL accepts the password on failure (this is a point of discussion for each implementation to ensure you get the behaviour you want). In the event of a service failure, non-compliant passwords are accepted by Active Directory. It is thus possible for workstation-initiated password changes to be rejected by the IDM system. The user would then have one password in Active Directory while their old password would remain in all of the other connected systems (additionally, their IDM password expiry date would not advance, so they’d continue to receive notification of their pending password expiry).

While the DLL has access to the user ID and password, only the password is passed to the service. This means a potential compromise of the service (obtaining a memory dump, for example) will yield only passwords. If the password change occurred at an off time and there’s only one password changed in that timeframe, it may be possible to correlate the password to a user ID (although if someone is able to stack trace or grab memory dumps from our domain controller … we’ve got bigger problems!).

The service which performs the filtering has been modified to search the proposed password for any word contained in a text file, matched as a substring. If a banned string appears anywhere within the proposed password (compared case-insensitively), the password is rejected and the user gets an error indicating that the password does not meet the password complexity requirements.

Other password requirements (character length, character composition, cannot contain UID, cannot contain given name or surname) are implemented through the normal Microsoft password complexity requirements. This service purely analyzes the proposed password for case-insensitive matches of any string within the dictionary file.
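Conceptually, the dictionary check is equivalent to this shell one-liner; it is purely an illustration (the real logic lives in the service code, and banned_words.txt stands in for whatever dictionary file you configure). A zero exit status means a banned word appears, case-insensitively, somewhere in the candidate password:

grep -iqF -f banned_words.txt <<< "CandidatePassword123" && echo "reject" || echo "accept"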

Did you know … that you can recover a deleted Teams channel?

Oh no, I didn’t mean to delete THAT!!! Sure, it asked me five times if I was sure that I was sure … and maybe that’s part of the problem – I see so many “are you sure” messages that I click OK a little too easily. Well, they say to err is human. And I must be exceptionally human. Sometimes recovering my data requires a sheepish call to the Help Desk. But did you know you can recover deleted Teams channels?

I used the hamburger menu next to a channel to delete it. Oops!

I even read the first few words of the “are you sure” dialogue before clicking the “Delete” button. Except … oops! I didn’t want to delete that channel!

You can recover the channel immediately, all by yourself. Even if you’re not a team owner. From the hamburger menu next to the team, select “Manage team”.

On the Team management page, select “Channels”. You can expand “Deleted” and see the channel you just removed. Click “Restore”

Yet another prompt … click “Restore” again.

Voila, the channel is back. Along with all its content. Whew!

Just because channel recovery is self-service doesn’t mean no one will know that you’ve mis-clicked. The channel deletion event which appears in the “General” channel … well, it’s still there. You can up-vote a request for enhancement on Microsoft’s site … but it’s not like no one will ever know about your mistake.

Did you know … You can control what members of a Microsoft Team group can do within the team?

When you create a new Team, members can create new channels, delete channels, add apps … they can do a lot of things. Did you know much of that is configurable? You can create a Team where individuals receive but cannot respond to posts. You can restrict your Team so only owners can remove channels.

From the hamburger menu next to your Team, select “Manage team”

On the Team management page, select the “Settings” tab.

Expand the “Member permissions” section. Now uncheck any permission you want to restrict to Team owners. There’s even a radio button near the bottom of this section so only Team owners can post to the “General” channel (if that’s the only channel, and members are prohibited from creating their own channels, you’ve got a broadcast-only Team space)

Scroll down and expand “Fun stuff” … you can prevent Giphy content from being used in the Team (or change the filter used to determine which Giphy content is appropriate), disable stickers, and disable memes.

Kernel Updates In GNOME

Since I usually do not install X11 ‘stuff’ on my Linux hosts — using the console interface — I do not have any experience installing kernel updates on “desktop” type systems. Evidently, the best practice is to drop out of the GUI into what I’d call init 3, then install the kernel updates. You can get random hangs and malfunctions when you attempt to update the kernel whilst in the graphical console.
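On a systemd-based distro like Fedora, that sequence is roughly the following (a sketch; run it from a text console or SSH session, not from a terminal inside the GUI):

systemctl isolate multi-user.target
dnf update kernel
reboot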

Recovering A Seriously Screwed Up Fedora System

The graphical interface on a Fedora 28 laptop was unavailable — buggered up video device/driver. We changed to what used to be called run level 3 and could not log in! We knew the root password, but it would not take it. Single-user mode is password protected too — and we were unable to log in there.

Normal recovery process:

Get to the grub menu, highlight the kernel you want to boot, and hit ‘e’ to edit it. Scroll down. On the line that starts with linux16, change “rhgb quiet” to say “rd.break enforcing=0”
ctrl-x to boot

Once you get a shell:
mount -o remount,rw /sysroot
chroot /sysroot

Voila, you’ve got access to your files. Use vi to edit whatever has the box seriously screwed up (passwd if your problem is that you don’t know the root password) and you’re set. We reset the root password just in case. Aaaand … we still couldn’t log in on init 1 or init 3! And at this point I was feeling stubborn about getting logged into the box.

Now, you can tweak the system so it is not using sulogin when booting into single-user mode, but that isn’t a good way to install network-sourced packages. For some reason, we had to disable selinux before we could log into anything other than the graphical target. I’m sure there is a policy we could have tweaked, but it was far easier to disable the thing, boot into the multi-user target, sort the video driver, and then boot into the graphical target.
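For reference, the quick-and-dirty version of “disable the thing”, run from the chroot shell described above (permissive is enough to get logins working; you can sort the policy out later), plus setting the default boot target to multi-user:

sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config
systemctl set-default multi-user.target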

Debugging An Active Directory Custom Password Filter

A few years ago, I implemented a custom password filter in Active Directory. At some point, it began accepting passwords that should be rejected. The updated code is available at https://github.com/ljr55555/OpenPasswordFilter and the following is the approach I used to isolate the cause of the failure.

 

Technique #1 — Netcap on the loopback. There are utilities that allow you to capture network traffic across the loopback interface. This is helpful in isolating problems in the service binary or inter-process communication. I used RawCap because it’s free for commercial use. There are other approaches too – or consult the search engine of your choice.
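The invocation is roughly this, where the first argument selects the loopback address and the second is the output capture file (check RawCap’s help output for the exact options in your version):

RawCap.exe 127.0.0.1 loopback_capture.pcap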

The capture file can be opened in Wireshark. The communication is done in clear text (which is why I bound the service to localhost), so you’ll see the password:

And response

To ensure process integrity, the full communication is for the client to send “test\n” then “PasswordToTest\n”, after which the server sends back either true or false.

Technique #2 — Debuggers. Attaching a debugger to lsass.exe is not fun. Use a remote debugger — until you tell the debugger to proceed, the OS is pretty much useless. And if the OS is waiting on you to click something running locally, you are quite out of luck. A remote debugger allows you to use a functional operating system to tell the debugger to proceed, at which time the system being debugged returns to service.

Install the SDK debugging utilities on your domain controller and another box. Which SDK debugging tool? That’s going to depend on your OS. For Windows 10 and Windows Server 2012 R2, the Windows 10 SDK (Debugging Tools For Windows 10) works. https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/debugger-download-tools or Google it.

On the domain controller, find the PID of LSASS and write it down (472 in my example). Check the IP address of the domain controller (10.104.164.110 in my example).
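Both are easy to grab from an elevated command prompt on the DC:

tasklist /FI "IMAGENAME eq lsass.exe"
ipconfig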

From the domain controller, run:

dbgsrv.exe -t tcp:port=11235,password=s0m3passw0rd

Where port=11235 can be any un-used port and password=s0m3passw0rd can be whatever string you want … you’ve just got to use the same values when you connect from the client. Hit enter and you’ve got a debugging server. It won’t look like it did anything, but you’ll see the port bound in netstat.
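For example, substituting whatever port you picked:

netstat -ano | findstr 11235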

And the binary running in taskman

From the other box, run the following command (substituting the correct server IP, port, password, and process ID):

windbg.exe -y "srv*c:\symbols_pub*http://msdl.microsoft.com/download/symbols" -premote tcp:server=10.104.164.110,port=11235,password=s0m3passw0rd -p 472

This attaches your WinDBG to the debugging server & includes an internet-hosted symbol path. Don’t worry when it says “Debuggee not connected” at the bottom – that just means the connection has not completed. If it didn’t connect at all (firewall, bad port number, bad password), you’d get a pop-up error indicating that the initial connection failed.

Wait for it … this may take a long time to load up, during which time your DC is vegged. But eventually, you’ll be connected. Don’t try to use the DC yet – it will just seem hung, and trying to get things working just makes it worse. Once the debugger is connected, send ‘g’ to the debugger to commence – and now the DC is working again.

Down at the bottom of the command window, there’s a status (0:035> below) followed by a field where you enter commands. Type the letter g in there & hit enter.

The status will then say “Debuggee is running …” and your server is again responsive to user requests.

When you reach a failing test, pause the debugger with a break command (Debug=>Break, or Ctrl-Break) which will veg out the DC again. You can view the call stack, memory, etc.

To search the address space for an ASCII string use:

!for_each_module s -[1]a ${@#Base} L?${@#Size}  "bobbob"

Where “bobbob” is the password I had tested.

Alternately, run the “psychodebug” build where LARGEADDRESSAWARE is set to NO and you can search just the low 2-gig memory space (32-bit process memory space):

s -a 0 L?80000000 "bobbob"

* The true/false server response is an ASCII string, not a Boolean. *

Once you have found what you are looking for, “go” the debugger (F5, Debug=>Go, or  ‘g’) to restore the server to an operational state. Break again when you want to look at something.

To disconnect, break and send “qd” to the debugger (quit and detach). If you do not detach with qd, the process being debugged terminates. Having lsass.exe terminate really freaks out the server, and it will go into an auto-recovery “I’m going to reboot in one minute” mode. It’ll come back, but detaching without terminating the process is a lot nicer.

Technique #3 – Compile a verbose version. I added a number of event log writes within the DLL (obviously, it’s not a good idea in production to log out candidate passwords in clear text!). While using the debugger will get you there eventually, half an hour worth of searching for each event (the timing is tricky to ensure the failed event is still in memory when you break the debugger) … having each iteration write what it was doing to the event log was FAAAAAR simpler.

And since I’m running this on a dev DC where the passwords coming across are all generated from a load sim script … not exactly super-secret stuff hitting the event log.

Right now, I’ve got an incredibly verbose DLL on APP556 under d:\tempcsg\ljr\2\debugbuild\psychodebug\ … all of the commented out event log writes from https://github.com/ljr55555/OpenPasswordFilter aren’t commented out.

Stop the OpenPasswordFilter service, put the verbose DLL and executables in place, and reboot. Change some passwords, then look in the event viewer.
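Roughly, from an elevated prompt (assuming the service is registered under the name OpenPasswordFilter; check services.msc if it is named differently, and the file copy depends on where your service binaries live):

net stop OpenPasswordFilter
rem copy the verbose DLL and service executable over the existing files here
shutdown /r /t 0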

ERROR events are actual problems that would show up either way. INFORMATION events are extras. I haven’t bothered to learn how to properly register event sources in Windows yet 🙂 You can find the error content at the bottom of the “this isn’t registered” complaint:

You will see events for the following steps:

DLL starting CreateSocket

About to test password 123paetec123-Dictionary-1-2

Finished sendall function to test password123paetec123-Dictionary-1-2

Got t on test of paetec123-Dictionary-1-2

The final line will either say “Got t” for true or “Got f” for false.

Technique #4 – Running the code through the debugger. Whilst there’s no good way to get the “Notification Packages” hook to run the DLL through the debugger, you can install Visual Studio on a dev domain controller and execute the service binary through the debugger. This allows you to set breakpoints and watch variable values as the program executes – which makes it a whole lot easier than using WinDBG to debug the production code.

Grab a copy of the source code – we’re going to be making some changes that should not be promoted to production, so I work on a temporary copy of the project and delete the copy once testing has completed.

Open the project in Visual Studio. Right-click OPFService in the “Solution Explorer” and select “Properties”

Change the build configuration to “Debug”

Un-check “Optimize code” – code optimization is good for a production run, but it will wipe out variable values when you want to see them.

Set a breakpoint on execution – on the OPFDictionary.cs file, the loop checking to see if the proposed word is contained in the banned word list is a good breakpoint. The return statements are another good breakpoint as it pauses program execution right before a password test iteration has completed.

Build the solution (Build=>Build Solution). Stop the Windows OpenPasswordFilter service.

Launch the service binary through the debugger (Debug=>Start Debugging).

Because the program is being run interactively instead of through a service, you’ll get a command window that says “Press any key to stop the program”. Minimize this.

From a new command prompt, telnet to localhost on port 5995 (the telnet client is not installed by default, so you may need to use “Turn Windows features on or off” and enable the telnet client first).

Once the connection is established, use CTRL and ] to get into the telnet command prompt. Type set localecho … now you’ll be able to see what you are typing.

Hit enter again and you’ll return to the blank window that is your telnet client. Type test and hit enter. Then type a candidate password and hit enter.
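So the session, typed into the telnet window, looks something like this; the password is just an example, and the final line is the true/false verdict the service sends back:

test
CandidatePassword123
true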

Program execution will pause at the breakpoint you’ve set. Return to Visual Studio. Select Debug=>Windows=>Locals to open a view of the variable values.

View the locals at the breakpoint, then hit F5 if you want to continue.

If you’ve set breakpoints on either of the return statements, program execution will also pause before the return … which gives you an opportunity to see which return is being used & compare the variable values again.

In this case, I submitted a password that was in the banned word list, so the program rightly evaluated line 56 to true and returned true.