git-lfs - Git For Big Files

eric | Feb. 25, 2020, 4:08 p.m.

GitHub puts the upper file size limit for a regular Git repository at 100 MB. So what can you do if you have files bigger than that? Git Large File Storage (Git LFS) is an open-source extension to Git that allows you to work with large files. On GitHub, it lets you store files up to 2 GB in size.

Introduction

If you have files in a repository that are bigger than 100 MB, you need to use the git-lfs extension (LFS stands for Large File Storage) on your client. The server must also support git-lfs. GitHub includes git-lfs support.

The upper file size limit for files stored directly in Git on GitHub is 100 MB. The upper file size limit with git-lfs is 2 GB. [1], [2]
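Before deciding what to track with git-lfs, it helps to know which files in a working tree are actually over the limit. The following is a minimal sketch, not part of git-lfs itself: the function name find_large_files is my own, and it assumes a find that understands the "M" size suffix (GNU and BSD find do, plain POSIX find does not).

```shell
#!/bin/sh
# find_large_files DIR [LIMIT_MB]
# Print regular files under DIR that are larger than LIMIT_MB mebibytes
# (default: 100), skipping Git's own object store in .git/.
# Assumption: find supports the non-POSIX "M" size suffix (GNU/BSD find).
find_large_files() {
    dir=${1:-.}
    limit_mb=${2:-100}
    # Prune any .git directory, then print files above the size threshold.
    find "$dir" -type d -name .git -prune -o \
        -type f -size +"${limit_mb}M" -print
}
```

Calling find_large_files . in the root of a repository lists the candidates for git-lfs. A second argument changes the threshold, for example find_large_files . 50.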

Client Installation

On Linux distributions that use the deb package system, you first add the package repository with the following command [3]:
$ curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
[sudo] password for ... : 
Detected operating system as Ubuntu/xenial.
Checking for curl...
Detected curl...
Checking for gpg...
Detected gpg...
Running apt-get update... done.
Installing apt-transport-https... done.
Installing /etc/apt/sources.list.d/github_git-lfs.list...done.
Importing packagecloud gpg key... done.
Running apt-get update... done.

The repository is setup! You can now install packages.

Install git-lfs with apt:

$ sudo apt-get install git-lfs
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following NEW packages will be installed
  git-lfs
0 to upgrade, 1 to newly install, 0 to remove and 0 not to upgrade.
Need to get 2.815 kB of archives.
After this operation, 11,5 MB of additional disk space will be used.
Get:1 https://packagecloud.io/github/git-lfs/ubuntu xenial/main amd64 git-lfs amd64 2.3.4 [2.815 kB]
Fetched 2.815 kB in 2s (978 kB/s)    
Selecting previously unselected package git-lfs.
(Reading database ... 228238 files and directories currently installed.)
Preparing to unpack .../git-lfs_2.3.4_amd64.deb ...
Unpacking git-lfs (2.3.4) ...
Processing triggers for man-db (2.7.5-1) ...
Setting up git-lfs (2.3.4) ...
Git LFS initialized.

Repository setup scripts for other package systems:

RPM:

$ curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.rpm.sh | sudo bash

Python:

$ curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.python.sh | bash

gem:

$ curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.gem.sh | bash

You can also get the packages from the git-lfs page on Package Cloud. [3]

Finally, initialize Git LFS, which also verifies the installation:

$ git lfs install
Git LFS initialized.
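If you set up several machines, the check can be scripted. This is a small sketch under my own assumptions: the helper names check_tool and report_git_lfs are illustrative, and the only git-lfs command used is git-lfs version, which prints the installed version.

```shell
#!/bin/sh
# check_tool NAME
# Succeed if NAME resolves to a command on PATH (illustrative helper).
check_tool() {
    command -v "$1" >/dev/null 2>&1
}

# report_git_lfs
# Print a one-line status and return non-zero when git-lfs is missing.
report_git_lfs() {
    if check_tool git-lfs; then
        echo "git-lfs found: $(git-lfs version)"
    else
        echo "git-lfs not found on PATH" >&2
        return 1
    fi
}
```

Running report_git_lfs after the installation should print the version line. If it reports that git-lfs is missing, repeat the package installation before running git lfs install.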

Using git-lfs

Start with a repository. Here, we create a new one:

$ mkdir large-file-repo
$ cd large-file-repo
$ git init

The next step is to specify the file patterns to store with Git LFS. The patterns are saved in the file .gitattributes. Add a filter that causes all CSV files to be handled through Git LFS:

$ git lfs track "*.csv"
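The track command records the pattern by writing a line such as "*.csv filter=lfs diff=lfs merge=lfs -text" to .gitattributes. To see which patterns a repository routes through Git LFS, you can read that file directly. The sketch below does this with awk. The function name lfs_patterns is my own, and it assumes the pattern is the first whitespace-separated field on each line.

```shell
#!/bin/sh
# lfs_patterns [FILE]
# Print every pattern in a .gitattributes file (default: ./.gitattributes)
# that has the attribute filter=lfs set.
lfs_patterns() {
    awk '
        /^[[:space:]]*#/ { next }    # skip comment lines
        {
            # Field 1 is the pattern; the remaining fields are attributes.
            for (i = 2; i <= NF; i++)
                if ($i == "filter=lfs") { print $1; break }
        }
    ' "${1:-.gitattributes}"
}
```

In the example repository, lfs_patterns should list *.csv once the track command has been run. The built-in alternative is to run git lfs track without arguments, which lists the tracked patterns.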

Now you can stage and commit files as usual, and push them once a remote is configured:

$ git add .gitattributes
$ git add test.csv
$ git commit -m "add csv files"

Check that Git LFS is managing your CSV file:

$ git lfs ls-files
test.csv

As you can see, git-lfs operations are seamless, which means that you can use git-lfs without changing your existing Git workflow. Note that git clone and git pull operations will be faster as you only download the versions of large files referenced by commits that you check out, not every version of the file.

Note: Because the repository itself only stores a reference (a small pointer file) to each large file, you will only see that reference until the file is checked out. Moreover, when you check out the file, the download may take some time due to the file size. If you want to have all files complete in the local repository, use git lfs pull.
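The reference is a short text file with three lines: a version line (version https://git-lfs.github.com/spec/v1), an oid line with the SHA-256 hash of the content, and a size line. That makes it easy to tell whether a file in your working tree is the real content or still only the pointer. The following is a minimal sketch. The function name is_lfs_pointer is my own, and it only checks the version line rather than validating the whole pointer.

```shell
#!/bin/sh
# is_lfs_pointer FILE
# Succeed if the first line of FILE is the Git LFS pointer version line,
# i.e. FILE is still a pointer and not the downloaded content.
is_lfs_pointer() {
    head -n 1 -- "$1" 2>/dev/null |
        grep -qxF 'version https://git-lfs.github.com/spec/v1'
}
```

For example, is_lfs_pointer test.csv && git lfs pull fetches the content only when test.csv is still a pointer.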

In Depth

There are several good in-depth guides on how to use git-lfs. The best I have come across is Git LFS by Atlassian. [4]

References

[1] - GitHub Help: Managing Large Files - https://help.github.com/categories/managing-large-files/

[2] - git-lfs ReadMe - https://github.com/git-lfs/git-lfs

[3] - git-lfs-page on Package Cloud - https://packagecloud.io/github/git-lfs

[4] - Git LFS, by Atlassian - https://www.atlassian.com/git/tutorials/git-lfs
