Sparse Checkout With Git

I’ve encountered a few repositories that are huge. Unwieldy huge, and stuffed with files that aren’t relevant to what I need. The straight-forward solution is to use multiple repositories — that’s what I do at work with my code samples. There’s a different repo for each language because the PHP developers really don’t care what the C# code looks like. The Java developers don’t need a copy of the Python code. But there are advantages to having a single repository that may preclude you from taking the simple solution. Git sub-modules are an interesting approach — combining multiple repositories into a single functional unit. But that’s a pretty big change to an existing repo. And, if you participate in open source projects, it may not be your decision anyway.

There’s another option for selectively cloning when you’re working with a large repo — an option that doesn’t require any changes to the repository. An end user can perform a sparse checkout — essentially use a filter like .gitignore to select or deselect certain files/folders from being pulled into the local working directory. The file is named sparse-checkout and is located in .git\info — unlike a .gitignore file which indicates what shouldn’t get included, sparse-checking controls what is included (if you want an entire repo except one folder, use !path/to/folder/**)

The sparse-checkout file used to get just the core components of Scott’s OpenHAB helper libraries plus the OpenWeatherMap community scripts is:

.github/**
Core/**
Community/OpenWeatherMap/**

To use sparse checkout, set the core.sparseCheckout config value to true. You can add sparse checkout to a repo you’ve already cloned and use

git read-tree -mu HEAD

to “clean up” unwanted files. Or you can set up sparse checkout before you clone the repo

D:\tmp>mkdir ljrtest

D:\tmp>cd ljrtest

D:\tmp\ljrtest>git init
Initialized empty Git repository in D:/tmp/ljrtest/.git/

D:\tmp\ljrtest>git remote add origin https://github.com/openhab-scripters/openhab-helper-libraries

D:\tmp\ljrtest>git config core.sparseCheckout true

D:\tmp\ljrtest>copy ..\sparse-checkout .git\info\
1 file(s) copied.

D:\tmp\ljrtest>git pull origin master
remote: Enumerating objects: 3591, done.
remote: Total 3591 (delta 0), reused 0 (delta 0), pack-reused 3591R ), 7.00 MiB | 6.95 MiB/s
Receiving objects: 100% (3591/3591), 9.26 MiB | 7.22 MiB/s, done.
Resolving deltas: 100% (1786/1786), done.
From https://github.com/openhab-scripters/openhab-helper-libraries
* branch master -> FETCH_HEAD
* [new branch] master -> origin/master

D:\tmp\ljrtest>dir
Volume in drive D is DATA
Volume Serial Number is D8E9-3B61

Directory of D:\tmp\ljrtest

07/03/2019 09:07 AM <DIR> .
07/03/2019 09:07 AM <DIR> ..
07/03/2019 09:07 AM <DIR> .github
07/03/2019 09:07 AM <DIR> Community
07/03/2019 09:07 AM <DIR> Core
0 File(s) 0 bytes
5 Dir(s) 386,515,042,304 bytes free

D:\tmp\ljrtest>dir .\Community
Volume in drive D is DATA
Volume Serial Number is D8E9-3B61

Directory of D:\tmp\ljrtest\Community

07/03/2019 09:07 AM <DIR> .
07/03/2019 09:07 AM <DIR> ..
07/03/2019 09:07 AM <DIR> OpenWeatherMap
0 File(s) 0 bytes
3 Dir(s) 386,515,042,304 bytes free

Using sparse checkout, no one else has to do anything. Configure your client to get the files you want, and you’re set.

 

Leave a Reply

Your email address will not be published. Required fields are marked *