Category: Coding

PHP oci_bind_by_name Caveat (a.k.a. a reminder to read the documentation)

This is readily apparent when you actually read the documentation for oci_bind_by_name … but, if you quickly skim through the doc and get mostly what it’s telling you, you can lose data when implementing oci_bind_by_name in your code. That’s because the variable that gets bound into the name is a reference. It’s right there in the description

And again in the more verbose part of the description

Well? They really mean It!

I usually iterate through arrays with for $x=0; $x<count($arrayOfData);$x++ — which is fine because $arrayOfData[$x] is a valid location in memory that contains the data I want. I had a program, though, that accepted user input. Input that frequently contains a bunch of extraneous newline characters in the middle of the data. To avoid making the users clean up their input, I just threw an array_filter on the input. But I don’t need the reorganized data, so I didn’t “waste” the cycles to re-index the filtered array. Instead, I used

$iIterator=0;
for( $itemInput in array_filter( $arrayOfData ) ) {
     oci_bind_by_name($stmt, ‘:placeholder’.$iIterator++, $itemInput);
}

Which is fine in most circumstances. But, with oci_bind_by_name … the memory address where $inputItem happened to be stashed is bound to the names :placeholder0, :placeholder1, …, :placeholdern – that memory address happened to still contain the last element of arrayOfData I happened to stash there because the server isn’t so busy that memory is being reallocated a fraction of a second later. But this loop does not bind each element of arrayOfData to its corresponding placeholder — instead of saying “select * from MyTable where ID IN (1,2,3,4,5)” … I had “select * from MyTable where ID IN (5,5,5,5,5)”.

In this case you need to use the array index

for( $x = 0; $x < count( $arrayOfData ); $x++ ) {
     $strPlaceholderName = ":inputclli".$x;
     oci_bind_by_name($stmt, $strPlaceholderName, $arrayOfData[$x]);
}

So each unique memory address is bound to a placeholder.

PHP Curl and Mangled Headers

I have PHP code that calls out to a SOAP endpoint, gets the response XML, and then does stuff. Well … that was the plan anyway. What actually happened was nothing. No error set, but the response string was null. Even adding curl debugging produced nothing. Fortunately, the endpoint is a sandbox one and thus available on http without encryption. I was going to do a network trace, so I needed to run the script from my computer. Aaaand STDOUT is where the curl debugging was going, not (as I assumed) STDERR. And look, there actually was an error. 400 bad request (how did that not come through to curl_error?!?) — and it became immediately obvious what my problem was — the content length is a quoted string instead of an integer.

Sigh! Copy/paste error where I built out my header. Cleared the escaped quotes from around the strlen call and, voila, my code works perfectly.

But a note to my future self — when PHP’s curl calls aren’t producing any output … try running it from the CLI. Or remember to add the instruction to have verbose output in STDERR!

curl_setopt($ch, CURLOPT_STDERR, $verbose);

 

ADO – Migrating a Repository to Azure Repos (and keeping your commit history)

The most direct way to migrate a repo into Azure Repos is to create a new, blank repository. This may mean making a new project. From the organization’s main page, click “New project”

Or it may mean making a new repo in an existing project. From an existing repo, click the drown-down next to the repo name and select “New repository”

Name the repository but don’t add a README. We want a blank repo

Note the URL to the repository – in this case, it’s https://ado0255@dev.azure.com/ado0255/History%20Test/_git/Another%20History%20Test

Find the URL for your existing Git repo – if you cd into the project’s folder and run “git remote -v”, you will get a list of the repos. Make a new folder somewhere – this is a temporary staging area to move the data from your existing repo over to the new Azure Repo. Change directories into your new folder. Run git clone –mirror URLToOldRepo

You will see data being downloaded from your git server.

Change directories into the folder that just got downloaded. You won’t see your code like you normally do when you clone a git repo. You’re looking at the underlying git stuff that makes up the repo. You’re code is all in there, as are all of the branches and commit history.

Now add the new Azure Repo as a remote – in this case, I’m naming the remote “ado”. Then run “git push ado –all” to push everything up to the new Azure Repo.

Stuff will transfer – you may be prompted to log into your ADO repository first. Eventually, you’ll see new branches being created on the remote and the process will complete.

Refreshing the Azure Repo, you’ll now see the files.

Selecting “Commits” will display the commit history.

Anyone else using the repo will need to add the new remote. Use “git remote rm origin” to remove the existing origin, then use “git remote add origin url” to add the new Azure Repo as origin.

ADO – Cleaning up test repos and projects

I find the process to delete repositories and projects to be nonintuitive. Since I create a lot of projects and repos for testing and documentation, it’s nice to be able to clean them up when I’m done!

To delete an Azure Repo, navigate to a repo and select the drop-down next to the repo name. Select “Manage repositories”

With your mouse over a repository, there’s a hamburger menu at the right-hand side of the listing. Click it and select “Delete”

You’ll need to type the repository’s full name to activate the delete button.

To delete a project, go to the organization’s home page and select “Organization Settings” from the lower left-hand corner of the screen.

Select “Projects” from the left-hand navigation bar

With your mouse over the project listing, you’ll have a hamburger menu. Click it and select “Delete Project”

You’ll need to type the project name to activate the delete button.

 

Git – Removing Confidential Info From History

The first cut of code may contain … not best practice code. Sometimes this is just hard coding something you’ll want to compute / look up in the future. Hard coding user input isn’t a problem if my first cut is always searching for ABC123. Hard coding the system creds? Not good. You sort that before you actually deploy the code. But some old iteration of the file has MyP2s5w0rD sitting right there in plain text. That’s bad in a system that maintains file history! The quick/easy way to clean up passwords stashed within the history is to download the BFG JAR file.

For this test, I created a new repository in .\source then created three clones of the repo (.\clone1, .\clone2, and .\clone3). In each cloneX copy, I created a tX folder that has a file named ldapAuthTest.py — a file that contains a statically assigned password as

strSystemAccountPass = "MyP2s5w0rD"

The first thing I did was to redact the password in the files — this means anyone looking at HEAD won’t see the password. Source, clone1, and clone2 are all current. The clone3 copy has pulled all changes but has a local change committed but not merged.

To clean the password from the git history, first create a backup of your repo (just in case!). Then mirror the repo to work on it

mkdir mirror
cd mirror
git clone --mirror d:\git\testFilterBranch\source

 

Create file .\replacements.txt with the string to be redacted — in this case:

strSystemAccountPass = "MyP2s5w0rD"

Formatting notes for replacements.txt

MyP2s5w0rD # Replaces string with default ***REMOVED***
MyP2s4w0rD==>REDACTED # Replaces string using custom string REDACTED
MyP2s3w0rD==> # Replaces string with null -- i.e. removes the string
regex:strSystemAccountPass\s?=\s?"MyP2s2w0rD""==>REDACTED # Uses a regex match -- in this case we may or may not have a space around the equal sign

So, in my mirror folder, I have the replacement.txt file which defines which strings are replaced. I have a folder that contains the mirror of my repo.

lisa@FLEX3 /cygdrive/d/git/testFilterBranch/mirror
$ ls
replacements.txt source.git

To replace my “stuff”, run bfg using the –replace-text option. Because I only want to replace the text in files named ldapAuthTest.py, I also added the -fi option

java -jar ../bfg-1.14.0.jar --replace-text ..\replacements.txt -fi ldapAuthTest.py source.git

 

lisa@FLEX3 /cygdrive/d/git/testFilterBranch/mirror
$ java -jar ../bfg-1.14.0.jar --replace-text replacements.txt -fi ldapAuthTest.py source.git

Using repo : D:\git\testFilterBranch\mirror\source.git

Found 3 objects to protect
Found 2 commit-pointing refs : HEAD, refs/heads/master

Protected commits
-----------------
These are your protected commits, and so their contents will NOT be altered:
* commit 87f1b398 (protected by 'HEAD')

Cleaning
--------
Found 5 commits
Cleaning commits: 100% (5/5)
Cleaning commits completed in 613 ms.

Updating 1 Ref
--------------

Ref Before After
---------------------------------------
refs/heads/master | 87f1b398 | 919c8f0f

Updating references: 100% (1/1)
...Ref update completed in 151 ms.

Commit Tree-Dirt History
------------------------

Earliest Latest
| |
. D D D m

D = dirty commits (file tree fixed)
m = modified commits (commit message or parents changed)
. = clean commits (no changes to file tree)

Before After
-------------------------------------------
First modified commit | dc2cd935 | 8764f6f1
Last dirty commit | 9665c4e0 | ccdf0359

Changed files
-------------

Filename Before & After
-------------------------------------
ldapAuthTest.py | 25e79fa6 ? 4d12fdad

In total, 8 object ids were changed. Full details are logged here:
D:\git\testFilterBranch\mirror\source.git.bfg-report\2021-06-23\12-50-00

BFG run is complete! When ready, run: git reflog expire --expire=now --all && git gc --prune=now --aggressive

Check to make sure nothing looks abjectly wrong. Assuming the repo is sound, we’re ready to clean up and push these changes.

cd source.git

git reflog expire --expire=now --all && git gc --prune=now --aggressive
git push

 

Pulling the update from my source repo, I have merge conflicts

These are readily resolved and the source repo can be merged into my local copy.

And the change I had committed but not pushed is still there.

Pushing that change produces no errors

Now … pushing the bfg changes may not work. In my case, the real repo has a bunch of branchs and I am getting “non fast-forward merges”. To get the history changed, I need to do a force push. Not so good for the other developers! In that case, everyone should get their changes committed and pushed. The servers should be checked to ensure they are up to date. Then the force push can be done and everyone can pull the new “good” data (which, really, shouldn’t differ from the old data … it’s just the history that is being tweaked).

On UUIDs

RFC 4122 UUID Versions:

1 — Datetime and MAC based
48-bit MAC address, 60-bit timestamp, 13-14 bit uniquifying sequence

2 — Datetime and MAC based with DCE security
8 least significant clock sequence numbers and least significant 32 bits of timestamp. RFC doesn’t reallly provide details on DCE security

3 — Hashed Namespace
MD5 hash of namespace

4 — Random
6 pre-determined bits (4 bits for version, 2-3 bits for variant 1 or 2) and 122 bits for 2^122 possible v4 variant 1 UUIDs

5 — Hashed Namespace
SHA-1 hash of namespace

In my case, I hesitate to use a v1 or v2 UUID because I have scripts executing in cron on the same host. The probability of the function being called at the same microsecond time seems higher than the pseudo-random number generator popping the same value in the handful of hours for which the UUIDs will be persisted for deduplication.

v3 or v5 UUIDs are my fallback position if we’re seeing dups in v4 — the namespace would need to glom together the script name and microsecond time to make a unique string when multiple scripts are running the function concurrently.

Kafka Troubleshooting (for those who enjoy reading network traces)

I finally had a revelation that allowed me to definitively prove that I am not doing anything strange that is causing duplicated messages to appear in the Kafka stream — it’s a clear text protocol! That means you can use Wireshark, tcpdump, etc to capture everything that goes over the wire. This shows that the GUID I generated for the duplicated message only appears one time in the network trace. Whatever funky stuff is going on that makes the client see it twice? Not me 😊

I used tcpdump because the batch server doesn’t have tshark (and it’s not my server, so I’m not going to go requesting additional binaries if there’s something sufficient for my need already available). Ran tcpdump -w /srv/data/ljr.cap port 9092 to grab everything that transits port 9092 while my script executed. Once the batch completed, I stopped tcpdump and transferred the file over to my workstation to view the capture in Wireshark. Searched the packet bytes for my duplicated GUID … and there’s only one.

Confluent Kafka Queue Length

The documentation for the Python Confluent Kafka module includes a len function on the producer. I wanted to use the function because we’re getting a number of duplicated messages on the client, and I was trying to isolate what might be causing the problem. Unfortunately, calling producer.len() failed indicating there’s no len() method. I used dir(producer) to show that, no, there isn’t a len() method.

I realized today that the documentation is telling me that I can call the built-in len() function on a producer to get the queue length.

Code:

print(f"Before produce there are {len(producer)} messages awaiting delivery")
producer.produce(topic, key=bytes(str(int(cs.timestamp) ), 'utf8'), value=cs.SerializeToString() )
print(f"After produce there are {len(producer)} messages awaiting delivery")
producer.poll(0) # Per https://github.com/confluentinc/confluent-kafka-python/issues/16 for queue full error
print(f"After poll0 there are {len(producer)} messages awaiting delivery")

Output:

Before produce there are 160 messages awaiting delivery
After produce there are 161 messages awaiting delivery
After poll0 there are 155 messages awaiting delivery

Boolean Opts in Python

I have a few command line arguments on a Python script that are most readily used if they are boolean. I sometimes need a “verbose” option for script debugging — print a lot of extra stuff to show what’s going on, and I usually want a “dry run” option where the script reads data, performs calculations, and prints results to the screen without making any changes or sending data anywhere (database, email, etc). To use command line arguments as boolean values, I use a function that converts a variety of possible inputs to True/False.

def string2boolean(strInput):
    """
    :param strInput: String string to be converted to boolean
    :return: Boolean representation of input
    """
    if isinstance(strInput, bool):
        return strInput
    if strInput.lower() in ('yes', 'true', 't', 'y', '1'):
        return True
    elif strInput.lower() in ('no', 'false', 'f', 'n', '0'):
        return False
    else:
        raise argparse.ArgumentTypeError('Boolean value expected.')

Use “type” when adding the argument to run the input through your function.

    parser.add_argument('-r', '--dryrun', action='store', type=string2boolean, dest='boolDryRun', default=False, help="Preview data processing without sending data to DB or Kafka. Valid values: 'true' or 'false'.")

Oracle – Collections and IN or LIKE Queries

I’ve been retrofitting a lot of PHP/SQL queries to use oci_bind_by_name recently. When using “IN” clauses, you can iterate through your array twice, but it’s an inefficient approach.

// Build the query string with a bunch of placeholders
$strQuery = "select Col1, Col2, Col3 from TableName where ColName IN (";
for($i=0; $i < count($array); $i++){
    if($i > 0){
        $strQuery = $strQuery . ", ";
    }
    $strQuery = $strQuery . ":bindvar" . $i;
}
$strQuery = $strQuery . ")";
... 
// Then bind each placeholder to something
for($i=0; $i < count($array); $i++){
    oci_bind_by_name($stmt, ":bindvar".$i, $array[$i]);
}

Building a table from the array data and using an Oracle collection object creates cleaner code and avoids a second iteration of the array:

$strQuery = "SELECT indexID, objName FROM table WHERE objName in (SELECT column_value FROM table(:myIds))";
$stmt = oci_parse($conn, $strQuery);

$coll = oci_new_collection($kpiprd_conn, 'ODCIVARCHAR2LIST','SYS');
foreach ($arrayValues as $strValue) {
     $coll->append($strValue);
}
oci_bind_by_name($stmt, ':myIds', $coll, -1, OCI_B_NTY);
oci_set_prefetch($stmt, 300);
oci_execute($stmt);

A simple like clause is quite straight-forward

$strNameLikeString = "SomeName%";
$strQuery = "SELECT ds_dvrsty_set_nm from ds_dvrsty_set WHERE ds_dvrsty_set_nm LIKE :divsetnm ORDER BY ds_dvrsty_set_nm DESC fetch first 1 row only";

$stmt = oci_parse($connDB, $strQuery);
oci_bind_by_name($stmt, ":divsetnm", $strNameLikeString);
oci_set_prefetch($stmt, 300);
oci_execute($stmt);

But what about an array of inputs essentially reproducing the LIKE ANY predicate in PostgreSQL? There’s not a direct equivalent in Oracle, and iterating through the array twice to build out a query WHERE (Field1 LIKE ‘Thing1%’ OR Field1 LIKE ‘Thing2%’ OR Field1 LIKE ‘Thing3%’) is undesirable. The with EXISTS allows me to create a LIKE ANY type query and only iterate through my array once to bind variables to placeholders using the same collection approach as was used with the IN clause.

$arrayLocs = array('ERIEPAXE%', 'HNCKOHXA%', 'LTRKARXK%');
$strQuery = "SELECT location_id, clli_code FROM network_location WHERE EXISTS (select 1 FROM TABLE(:likelocs) WHERE clli_code LIKE column_value)";
$stmt = oci_parse($connDB, $strQuery);

$coll = oci_new_collection($connDB, 'ODCIVARCHAR2LIST','SYS');
foreach ($arrayLocs as $strLocation) {
    $coll->append($strLocation);
}
oci_bind_by_name($stmt, ':likelocs', $coll, -1, OCI_B_NTY);
oci_execute($stmt);
print "<table>\n";
print "<tr><th>Loc ID</th><th>CLLI</th></tr>\n";
while ($row = oci_fetch_array($stmt, OCI_ASSOC+OCI_RETURN_NULLS)) {
    print "<tr><td>" . $row['LOCATION_ID'] . "</td><td>" . $row['CLLI_CODE'] . "</td></tr>\n";
}
print "</table>\n";

There are many different collection types in Oracle which can be used with oci_new_collection. A full list of the system collection types can be queried from the database.

SELECT * FROM SYS.ALL_TYPES WHERE TYPECODE = 'COLLECTION' and OWNER = 'SYS';