Tag: git

Updating git on Windows

There’s a convenient command to update the Windows git binary

git update-git-for-windows

Which is useful since ADO likes to complain about old git clients —

remote: Storing packfile... done (51 ms)
remote: Storing index... done (46 ms)
remote: We noticed you're using an older version of Git. For the best experience, upgrade to a newer version.

Git – Removing Confidential Info From History

The first cut of code may contain … not best practice code. Sometimes this is just hard coding something you’ll want to compute / look up in the future. Hard coding user input isn’t a problem if my first cut is always searching for ABC123. Hard coding the system creds? Not good. You sort that before you actually deploy the code. But some old iteration of the file has MyP2s5w0rD sitting right there in plain text. That’s bad in a system that maintains file history! The quick/easy way to clean up passwords stashed within the history is to download the BFG JAR file.

For this test, I created a new repository in .\source then created three clones of the repo (.\clone1, .\clone2, and .\clone3). In each cloneX copy, I created a tX folder that has a file named ldapAuthTest.py — a file that contains a statically assigned password as

strSystemAccountPass = "MyP2s5w0rD"

The first thing I did was to redact the password in the files — this means anyone looking at HEAD won’t see the password. Source, clone1, and clone2 are all current. The clone3 copy has pulled all changes but has a local change committed but not merged.

To clean the password from the git history, first create a backup of your repo (just in case!). Then mirror the repo to work on it

mkdir mirror
cd mirror
git clone --mirror d:\git\testFilterBranch\source

 

Create file .\replacements.txt with the string to be redacted — in this case:

strSystemAccountPass = "MyP2s5w0rD"

Formatting notes for replacements.txt

MyP2s5w0rD # Replaces string with default ***REMOVED***
MyP2s4w0rD==>REDACTED # Replaces string using custom string REDACTED
MyP2s3w0rD==> # Replaces string with null -- i.e. removes the string
regex:strSystemAccountPass\s?=\s?"MyP2s2w0rD""==>REDACTED # Uses a regex match -- in this case we may or may not have a space around the equal sign

So, in my mirror folder, I have the replacement.txt file which defines which strings are replaced. I have a folder that contains the mirror of my repo.

lisa@FLEX3 /cygdrive/d/git/testFilterBranch/mirror
$ ls
replacements.txt source.git

To replace my “stuff”, run bfg using the –replace-text option. Because I only want to replace the text in files named ldapAuthTest.py, I also added the -fi option

java -jar ../bfg-1.14.0.jar --replace-text ..\replacements.txt -fi ldapAuthTest.py source.git

 

lisa@FLEX3 /cygdrive/d/git/testFilterBranch/mirror
$ java -jar ../bfg-1.14.0.jar --replace-text replacements.txt -fi ldapAuthTest.py source.git

Using repo : D:\git\testFilterBranch\mirror\source.git

Found 3 objects to protect
Found 2 commit-pointing refs : HEAD, refs/heads/master

Protected commits
-----------------
These are your protected commits, and so their contents will NOT be altered:
* commit 87f1b398 (protected by 'HEAD')

Cleaning
--------
Found 5 commits
Cleaning commits: 100% (5/5)
Cleaning commits completed in 613 ms.

Updating 1 Ref
--------------

Ref Before After
---------------------------------------
refs/heads/master | 87f1b398 | 919c8f0f

Updating references: 100% (1/1)
...Ref update completed in 151 ms.

Commit Tree-Dirt History
------------------------

Earliest Latest
| |
. D D D m

D = dirty commits (file tree fixed)
m = modified commits (commit message or parents changed)
. = clean commits (no changes to file tree)

Before After
-------------------------------------------
First modified commit | dc2cd935 | 8764f6f1
Last dirty commit | 9665c4e0 | ccdf0359

Changed files
-------------

Filename Before & After
-------------------------------------
ldapAuthTest.py | 25e79fa6 ? 4d12fdad

In total, 8 object ids were changed. Full details are logged here:
D:\git\testFilterBranch\mirror\source.git.bfg-report\2021-06-23\12-50-00

BFG run is complete! When ready, run: git reflog expire --expire=now --all && git gc --prune=now --aggressive

Check to make sure nothing looks abjectly wrong. Assuming the repo is sound, we’re ready to clean up and push these changes.

cd source.git

git reflog expire --expire=now --all && git gc --prune=now --aggressive
git push

 

Pulling the update from my source repo, I have merge conflicts

These are readily resolved and the source repo can be merged into my local copy.

And the change I had committed but not pushed is still there.

Pushing that change produces no errors

Now … pushing the bfg changes may not work. In my case, the real repo has a bunch of branchs and I am getting “non fast-forward merges”. To get the history changed, I need to do a force push. Not so good for the other developers! In that case, everyone should get their changes committed and pushed. The servers should be checked to ensure they are up to date. Then the force push can be done and everyone can pull the new “good” data (which, really, shouldn’t differ from the old data … it’s just the history that is being tweaked).

List Extensions Within Folder

It didn’t occur to me that Apache serves everything under a folder and the .git folder may well be under a folder (you can have your project up a level so there’s a single folder at the root of the project & that folder is DocumentRoot for the web site). Without knowing specific file names, you cannot get anything since directory browsing is disabled. But git has a well-known structure so browsing to /.git/index or really scary for someone who stuffs their password in the repo URL /.git/config is there and Apache happily serves it unless you’ve provided instructions otherwise.

A coworker brought up the intriguing idea of, instead of blocking the .git folder so things subordinate to .git are never served, having a specific list of known good extensions the web server was willing to serve. Which … ironically was one of the things I really didn’t like about IIS. Kind of like the extra frustration of driving behind someone who is going the speed limit. Frustrating because I want to go faster, extra frustrating because they aren’t actually wrong.

But configuring a list of good-to-serve extensions means you’ve got to get a handle on what extensions are on your server in the first place. This command provides a list of extensions and a count per extension (so you can easily identify one-offs that may not be needed):

find /path/to/search/ -type f | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort | uniq -c

 

Tar Excluding Git Folders

You can, of course, use –exclude and avoid adding the .git folders to your tar archive, but I discovered a really cool option that excludes the folders created by a whole host of version control systems:

--exclude-vcs

Which, as of version 1.32, means excluding CVS, RCS, SCCS, git, SVN, Arch, Bazaar, Mercurial, and Darcs as follows:

  • CVS/ — recursive
  • RCS/ — recursive
  • SCCS/ — recursive
  • .git/ — recursive
  • .gitignore
  • .gitmodules
  • .gitattributes
  • .cvsignore
  • .svn/ — recursive
  • .arch-ids/ — recursive
  • {arch}/ — recursive
  • =RELEASE-ID
  • =meta-update
  • =update
  • .bzr
  • .bzrignore
  • .bzrtags
  • .hg
  • .hgignore
  • .hgrags
  • _darcs

Git — Show Dates For Branches

We wanted to see all of the branches in a repository with the dates of the latest commit:

git for-each-ref --sort=committerdate refs/heads/ --format='%(committerdate:rfc-local) %(refname:short)'

This outputs the full date/time (you can use any of the git log date formats [relative|local|default|iso|rfc|short|raw] after committerdate; short for the date without time)

 

Git: Using Soft Reset To Clean Up Un-pushed Commits

I missed a file when I was cleaning up debugging lines. I made the change and included it in a second commit, but I’d rather not have two commits for the same purpose. I hadn’t pushed my changes yet, so these commits only exist on my workstation … which means I can reset and bundle the changes into a single commit.

Find commit number that is one before the duplicate debug logging cleanup — this is the point to which you want to reset. In my case, it is the commit start with b443348c

Reset there with “–soft” — this doesn’t change anything on the file system (i.e. I don’t have to clean up those debug lines again) but puts the changes back into the staging area.

Now those files are staged again, so I can make a single commit for removing debug logging from my code.

Voila! I can push these changes and not clutter our history with my error.

 

Preventing erronious use of the master branch on development servers

One of the web servers at work uses a refspec in the “git pull” command to map the remote development branch to the local remote-tracking master branch. This is fairly confusing (and it looks like the dev server is using the master branch unless you dig into how the pull is performed), but I can see how this prevents someone from accidentally typing something like “git checkout master” and really messing up the development environment. I can also see a dozen ways someone can issue what is a completely reasonable git command 99% of the time and really mess up the development environment.

While it is simple enough to just checkout the development branch, doing so does open us up to the possibility that someone will erroneously  deliver the production code to the development server and halt all testing. While you cannot create shell aliases for multi-word commands (or, more accurately, alias expansion is performed for the first word of a simple command is checked to see if it has an alias … so you’ll never get the multi-word command), you can define a function to intercept git commands and avoid running unwanted commands:

function git() { 
     case $* in 
         "checkout master" ) command echo "This is a dev server, do not checkout the master branch!" ;; 
         "pull origin master" ) command echo "This is a dev server, do not pull the master branch" ;; 
         * ) command git "$@" ;; 
     esac
}

Or define the desired commands and avoid running any others:

function git(){
     if echo "$@" | grep -Eq '^checkout uat$'; then
          command git $@
     elif echo "$@" | grep -Eq '^pull .+ uat$'; then
          command git $@
     else
          echo "The command $@ needs to be whitelisted before it can be run"
     fi
}

Either approach mitigates the risk of someone incorrectly using the master branch on the development server.

Reverting a Single File with Git

Git revert is great for resetting the entire project to a particular state – I went down a bad path, really don’t want to do this, and resetting to the state I was in this morning is exactly what I want to do. Sometimes, though … that’s not the case. I added a couple of debugging lines to a file that I don’t really need. Or I’ve gone down a bad path here but have good work in a few other files too. In those cases, you can revert a single file to the latest committed version. Run “git status” and “git diff” to confirm that it is an uncommited change.

To revert a single file to its latest committed state, use “git checkout – filename” – you can see the added line has disappeared.

 

Git Log

Git log can be used to get a quick summary of the differences between two branches. The three dots between the branch names indicates you want a “symmetric difference”. That is the set of commits that are in the branch on the left or the branch on the right but not in both branches.

The –left-right option prefixes each commit with an ‘<’ or ‘>’ indicating which “side” has the commit. The –oneline option prints the abbreviated commit ID and the first line of the commit message.

Showing the differences between your local uat branch and the remote uat branch:

D:\git\gittest>git log –left-right –oneline origin/uat…uat

> 961f53a (uat) Merge branch ‘ossa-123’ into uat

> 803096b (origin/ossa-123, ossa-123) Added additional files

> cf9c419 Added initial code to branch

The top line is the most recent commit, the bottom line is the oldest commit that does not exist in both branches. I can see that the uat branch in my local repo is not missing anything from the remote (there are no commits with “<” indicating changes in the remote that do not exist in my local copy) but I have local changes which have not yet been pushed: two code commits plus the merge commit which incorporated the code commits to my local repo’s uat branch. The head of the local and remote ossa-123 branch are at the commit just prior to the merge, so on my local repo that branch has been fully merged into UAT and I just need to push uat up to the remote.

Additional options to enhance output:

–cherry-pick will omit any changes that represent the same changes in both branches (or –cherry-mark to mark those commits with an “=” flag)

–graph uses an ASCII chart to depict branch relationships.

* The three dots mean something different in git diff than in git log. In git diff, mean “what are the differences between the right-hand branch and the common ancestor shared by both the right and left-hand branches”.

Two dots in git diff mean is the differences that are in the branch on the left or the branch on the right but not in both branches.

In git log, two dots displays only commits unique to the second branch. Since commits and differences are not exactly the same thing, two and three dots don’t exactly have the opposite meaning between diff and log. But the meaning is not logically consistent.