Tag: regex

November 24, 2020

VSCode Search/Replace Using Regex Capture Groups

Regex adds a lot of flexibility to search/replace operations. Capture groups allow you to manipulate the found data in the replacement text. As an example, I have simple mathematical equations that are not spaced out reasonably. I could replace “+” with ” + “, “-” with ” – “, “*” with ” * “, “/” with ” / “, and “=” with ” = “, but using a capture group to identify non-whitespace characters and the range of operators allows a single search/replace operation to add spaces to my equations.

Selecting the regex option (in blue below), I can use the regular expression (\S+)([\+,\-,\*,\/])(\S+)=(\S+) as my search string. This means the first capture group is one or more non-whitespace characters, the second capture group is one of the characters +,-,*,/, the third capture group is one or more non-whitespace characters, there’s an equal sign (which I could make into a fourth capture group), and the fourth capture group is one or more non-whitespace characters.

An alternate regex finds zero or more whitespace characters — (\S*)([\+,\-,\*,\/])(\S*)=(\S*)

The replacement text then uses each capture group — $1 $2 $3 = $4 — to add spaces around the operators and around the equal sign.

January 27, 2020

OpenHAB CalDav Personal Binding – Item Name Filtering

We use the CalDav Personal binding to select items from our Exchange calendar to populate date/time OpenHAB Items — when does Anya have her next gymnastics class, when is the next Trustee meeting, etc. When we had first set this up, we were using manually created appointments. The appointments were assigned unique categories so the binding could determine which appointment should be used to update the Item. I’ve since started creating calendar items based on published calendars (so far a Google calendar and a SchoolPointe calendar), but the Python module for interacting with Exchange cannot assign a category to the appointments it creates.

The binding allows you to filter on the appointment subject (‘name’) using a regex. The binding documentation says to use name-filter:’\<Some Filter\>’ which … well, doesn’t work. We tried omitting the back-slashes in case they were meant to be escape characters. We tried omitting the greater and less-than symbols in case those were meant in the way I often use them, to designate <the part you replace>. Still doesn’t work. We tried using forward-slashes instead of backslashes because that’s the normal regular expression syntax. Nope. We tried adding a ‘starts with’ ^, trailing .* to ensure it would match anything that started with what we wanted. Nope. We’d alternately match all of the appointments or none.

Consulting the source, there is a very restrictive character set available in your name filter regex. The binding uses Java’s Matcher with a regex to extract your regex from the item configuration. You need to have filter-name: then you may have a single quote (‘? means 0 or 1 of ‘). This is Followed by one or more characters from the class which is all upper and lower case letters A-Z, a full stop, an asterisk, a plus sign, a minus sign, a space, and a pipe bar. Then you may have another single quote. The bit with one or more characters from the restricted class is extracted — this is how the binding gets your regex from the item config.

private static final String REGEX_FILTER_NAME ="filter-name:'?([A-Za-z\\.\\*\\+\\- \\|]+)'?";

Using unsupported characters in your filter-name regex alternately match all appointments or none. Using filter-name:’\<test>\’ (as the documentation literally instructs me to do) doesn’t return a match with anything as + requires one or more matches from the character set. I have zero such characters after the opening single quote. Similarly filter-name:’^Beginning of string.*’ doesn’t return a match. It appears that, in cases where the name filter is null … all items are matched. Explains why we were getting the same appointment’s details posted into each item.

On the other extreme — a filter like filter-name:’Pick up dry-cleaning at 1 Main Street’ will truncate your regular expression at the number character. The extracted matched group is Pick up dry-cleaning at … which won’t match anything unless you actually have an appointment titled “Pick up dry-cleaning at ” with a trailing space. I’ve seen posts on the OpenHAB forum where individuals have non-English words in their match … filter-name:’Trip to Askøy’ which, again, match nothing since the actual regex used by the binding is Trip to Ask The same thing happens when looking for character classes (i.e. I don’t know if this will be capitalized, so I want to match [Tt]est).

The solution, since a question mark isn’t an option, is to use a plus or splat to replace any character that isn’t supported by the binding. Using a plus ensures there’s something where you expect the character to occur, although the * is a broader match (we use “Township: Event Name”, but I don’t need the colon to successfully match my item. “Township Event Name” would match. I could even use a different delimiter as “Township, Event Name” would also match). Where you are unsure of the case, you need to use a pipebar (e.g. filter-name:’Test|test’)

The Items that are populated with the start time and event name for the next Township meeting look like this:

DateTime Calendar_Upcoming_Township "Upcoming Township meeting (start) [%1$tA, %1$tB %1$te, %1$tY at %1$tl:%1$tM%1$tp]" <calendar> (gCalendar) {caldavPersonal="calendar:ourcalendar type:UPCOMING eventNr:1 value:START filter-name:'Township. .*'"}
String Calendar_Upcoming_Township_Title "Upcoming Township meeting [%s]" <calendar> (gCalendar) {caldavPersonal="calendar:ourcalendar type:UPCOMING eventNr:1 value:NAME filter-name:'Township. .*'"}

And the calendar events titled “Township: Trustee Regular Meeting” or “Township: Craft Fair” are all identified by the filter.

Note: Scott submitted a PR to change the regex used to extract your filter-name regex. Once this change gets merged, you’ll be able to use character sets (e.g. [T|t]est), numbers, and ‘special’ characters excluding the single quote. Including the single quotes around the filter will be required.

January 1, 2020

Extracting Waste Stream Collection Dates for the Netherlands

Yeah … mostly saving this for the regex search with a start and end flag that spans newlines because I don’t really need to know the date they collect each waste stream in the Netherlands. Although it’s cool that they’ve got five different waste streams to collect.

import requests
import re

strBaseURL = 'https://afvalkalender.waalre.nl/adres/<some component of your address in the Netherlands>' 
iTimeout = 600 
strHeader = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}

# Start and end flags for waste stream collection schedule content
START = '<ul id="ophaaldata" class="line">' 
END = '</ul>' 

page = requests.get(strBaseURL, timeout=iTimeout, headers=strHeader)
strContent = page.content
strContent = strContent.decode("utf-8")

result = re.search('{}(.*?){}'.format(START, END), strContent, re.DOTALL)
strCollectionDateSource = result.group(1)

resultWasteStreamData = re.findall('<li>(.*?)</li>', strCollectionDateSource, re.DOTALL)
for strWasteStreamRecord in resultWasteStreamData:
    listWasteStreamRecord = strWasteStreamRecord.split("\n")
    strDate = listWasteStreamRecord[3]
    strWasteType = listWasteStreamRecord[4]
    print("On {}, they collect {}".format(strDate.strip().replace('<i class="date">','').replace('</i>',''), strWasteType.strip().replace('<i>','').replace('</i>','')))