Deleting sensitive files from Git history with BFG Repo-Cleaner

Updated: January 27, 2024 By: Guest Contributor

Overview

There often comes a time in the life of a developer when sensitive data is accidentally pushed to a Git repository. Perhaps it’s a password, an API key, or chunks of confidential information placed within code or files. When this happens, simply deleting the file and pushing a new commit isn’t enough, since the sensitive data remains in the commit history. Hunting down this data over potentially hundreds of commits is a tedious and error-prone process.
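To see why a follow-up commit is not enough, the sketch below builds a throwaway repository, deletes a file in a second commit, and then reads the file back out of history. The file name ‘secret.txt’, its contents, and the helper name ‘commit’ are made up for illustration.

```shell
#!/bin/sh
# Demo: a file deleted in a later commit is still readable from history.
# 'secret.txt' and its contents are hypothetical.
set -e

demo_repo=$(mktemp -d)
cd "$demo_repo"
git init -q .

# Identity is passed inline so the demo does not depend on global Git config.
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

echo "API_KEY=not-a-real-key" > secret.txt
git add secret.txt
commit "Add config"

git rm -q secret.txt
commit "Remove config"

# The file is gone from the working tree...
test ! -e secret.txt

# ...but the root commit still carries it.
first_commit=$(git rev-list --max-parents=0 HEAD)
leaked=$(git show "$first_commit:secret.txt")
echo "Recovered from history: $leaked"
```

Anyone with a clone can run the same ‘git show’ against an old commit, which is why the history itself has to be rewritten.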

This is where the BFG Repo-Cleaner comes in. It’s a simpler, faster alternative to Git’s built-in ‘filter-branch’ command, designed specifically for removing unwanted data such as credentials and oversized files from history. This tutorial walks through using BFG Repo-Cleaner to remove sensitive files from your Git history safely and efficiently.

Prerequisites

  • Basic knowledge of Git
  • Java Runtime Environment (JRE) version 8 or above, as BFG is a Java program (1.12.16 was the last BFG release to support Java 7)
  • A backup of your repository, in case you need to revert changes
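The Java prerequisite can be checked from the shell. The following is a minimal sketch, not an official check: the function names ‘java_major’ and ‘installed_java_version’ are hypothetical helpers, and the parsing assumes the usual ‘java -version’ output with the version in double quotes on the first line.

```shell
#!/bin/sh
# Sketch: extract the major Java version from a version string.
# Handles the legacy "1.8.0_292" scheme and the modern "17.0.2" scheme.
java_major() {
    case "$1" in
        1.*) echo "$1" | cut -d. -f2 ;;                        # 1.8.0_292 -> 8
        *)   echo "$1" | cut -d. -f1 | sed 's/[^0-9].*$//' ;;  # 17.0.2 -> 17
    esac
}

# 'java -version' prints to stderr; the version is quoted on the first line.
installed_java_version() {
    java -version 2>&1 | sed -n '1s/.*"\(.*\)".*/\1/p'
}

if command -v java >/dev/null 2>&1; then
    version=$(installed_java_version)
    echo "Found Java $version (major $(java_major "$version"))"
else
    echo "Java not found on PATH"
fi
```

Compare the reported major version against the minimum listed in the prerequisites before continuing.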

Step 1: Installing BFG Repo-Cleaner

To get started, you must install BFG on your system. Official BFG releases are available from the tool’s website. If your package manager carries a BFG package, installation is a single command:

brew install bfg  # macOS with Homebrew
sudo apt-get install bfg  # Debian-based systems, if your release packages it

Alternatively, you can download the jar file directly with:

wget https://repo1.maven.org/maven2/com/madgag/bfg/1.13.0/bfg-1.13.0.jar

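Once downloaded, you can run BFG using the command below, which prints a usage summary when given no arguments. If you installed through a package manager, use the ‘bfg’ command in place of ‘java -jar bfg-1.13.0.jar’ throughout: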

java -jar bfg-1.13.0.jar

Step 2: Preparing Your Repository

Before using BFG, ensure your repository’s latest changes are pushed to a remote server and that you’re working on a fresh clone for safety. To clone your repo:

git clone --mirror git://example.com/your-repo.git

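Because of the ‘--mirror’ flag, this creates a bare repository in a directory named ‘your-repo.git’. It holds the full Git database (all branches, tags, and other refs) but has no working tree, so you will not see your project files in it. BFG will operate on this directory.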
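A mirror clone is a bare repository, which is easy to confirm before running anything destructive. The sketch below uses a local throwaway repository as a stand-in for the real remote so that it runs without network access; the directory names under ‘$work’ are illustrative.

```shell
#!/bin/sh
# Sketch: confirm that a --mirror clone is a bare repository.
# A local throwaway repository stands in for the real remote.
set -e

work=$(mktemp -d)
git init -q "$work/origin"
git -C "$work/origin" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "Initial commit"

git clone -q --mirror "$work/origin" "$work/your-repo.git"

# Prints "true" for a bare repository (no working tree, no nested .git).
is_bare=$(git -C "$work/your-repo.git" rev-parse --is-bare-repository)
echo "bare: $is_bare"
```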

Step 3: Identifying the Sensitive Data

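Identify the sensitive files that you want to remove from your Git history. BFG’s ‘--delete-files’ option matches on file name rather than on the path within the repository, so note the file names (or a glob such as ‘*.pem’) rather than full paths.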
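Files that were deleted long ago will not show up in a directory listing, so it helps to search the history itself. The function below is one possible sketch using plain Git commands; the name ‘paths_in_history’ and the example patterns are hypothetical.

```shell
#!/bin/sh
# Sketch: list every path that was added, changed, or deleted in any commit
# on any ref, filtered by a grep pattern. Run from inside the repository.
# Usage: paths_in_history <grep-pattern>
paths_in_history() {
    git log --all --name-only --pretty=format: | sort -u | grep -e "$1"
}

# Example: look for private keys and env files (patterns are illustrative).
# paths_in_history 'id_rsa'
# paths_in_history '\.env$'
```

The output shows full paths; the last path component is the file name to hand to BFG.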

Step 4: Running BFG to Remove Specific Files

To remove a specific file (e.g., ‘id_rsa’) from your Git history, execute the following:

java -jar bfg-1.13.0.jar --delete-files id_rsa your-repo.git

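This command tells BFG to delete every file named ‘id_rsa’, in any directory, from the history of the ‘your-repo.git’ Git repository. By default BFG leaves the contents of your latest commit untouched, so delete the file in an ordinary commit and push it before running BFG.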
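Since BFG protects the tip commit by default, it is worth checking that the file is already gone from HEAD before the run. The helper below is a sketch under that assumption; ‘in_head’ is a hypothetical name, and it compares the last path component exactly, mirroring BFG’s match on file name.

```shell
#!/bin/sh
# Sketch: succeed if any file with the given name exists in the tip commit.
# Reads the committed tree, so it also works in a bare mirror clone.
# Usage: in_head <file-name>
in_head() {
    git ls-tree -r --name-only HEAD |
        awk -F/ -v name="$1" '$NF == name { found = 1 } END { exit !found }'
}

# Example (run inside the repository):
# if in_head id_rsa; then
#     echo "id_rsa is still in HEAD: delete and commit it before running BFG"
# fi
```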

Step 5: Removing Passwords or Strings

BFG can also replace strings inside files, such as passwords or keys, throughout your history. For example, to replace every string listed in ‘passwords.txt’:

java -jar bfg-1.13.0.jar --replace-text passwords.txt your-repo.git

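You must first list the expressions you wish to remove in a file named ‘passwords.txt’, one per line. Each line is treated as a literal string unless it starts with ‘regex:’ or ‘glob:’. Matches are replaced with ‘***REMOVED***’ unless the line ends with ‘==>’ followed by a different replacement string.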
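As an illustration of the expressions file, the sketch below writes a three-line ‘passwords.txt’. Every value is a made-up placeholder: the first line is a literal replaced with the default ‘***REMOVED***’, the second is a literal with a custom replacement after ‘==>’, and the third uses the ‘regex:’ prefix.

```shell
#!/bin/sh
# Sketch: write an expressions file for BFG's --replace-text option.
# Every value below is a made-up placeholder, not a real credential.
cat > passwords.txt <<'EOF'
hunter2
old-db-password==>REDACTED
regex:api_key=\w+==>api_key=REDACTED
EOF

# Show the file with line numbers for a quick visual check.
cat -n passwords.txt
```

Keep this file out of the repository being cleaned, since it contains the very strings you are trying to remove.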

Step 6: Cleaning Up with ‘git reflog’ and ‘gc’

After you’ve run BFG, you should use the following Git commands to clean up the refs and compress your database:

cd your-repo.git
git reflog expire --expire=now --all
git gc --prune=now --aggressive

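BFG rewrites your commits but leaves the old objects in place. Expiring the reflog and pruning removes those now-unreachable objects, so the sensitive data is physically gone from this clone and your repository size is minimized.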
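One way to confirm that the cleanup reclaimed space is to compare the packed size before and after. The helper below is a small sketch built on ‘git count-objects’; the name ‘pack_size’ is hypothetical.

```shell
#!/bin/sh
# Sketch: print the packed size of a repository in human-readable form.
# Usage: pack_size <path-to-repo>
pack_size() {
    git -C "$1" count-objects -vH | sed -n 's/^size-pack: //p'
}

# Typical use around the cleanup ('your-repo.git' as in this tutorial):
# before=$(pack_size your-repo.git)
# ...run BFG, then the reflog and gc commands...
# after=$(pack_size your-repo.git)
# echo "pack size: $before -> $after"
```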

Step 7: Pushing the Changes

Once you’re happy with the local modifications:

git push --force

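This will overwrite the remote repository history with your cleaned history; because the clone is a mirror, all branches and tags are pushed. Note that this is a destructive operation. Collaborators should re-clone the repository rather than pull, since a push from an old clone would reintroduce the removed data.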

Step 8: Protecting Against Accidental Pushes in the Future

It’s a good idea to use a ‘.gitignore’ file to keep sensitive files from being committed; it only affects untracked files, so add the entries before the files are ever staged. Alternatively, you can use tools like ‘pre-commit’ hooks to scan for sensitive information before each commit.
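A pre-commit hook can be as simple as a shell script saved as ‘.git/hooks/pre-commit’ and made executable. The following is a minimal sketch, not a replacement for a dedicated secret scanner: the function name ‘looks_sensitive’ and the patterns are illustrative assumptions and will produce both false positives and misses.

```shell
#!/bin/sh
# Sketch of a .git/hooks/pre-commit script: block commits whose staged
# changes add lines that look like credentials. Patterns are illustrative.

# Reads a unified diff on stdin; succeeds (exit 0) if an added line matches.
looks_sensitive() {
    grep -E '^\+' | grep -Ev '^\+\+\+ ' |
        grep -Eiq 'BEGIN [A-Z ]*PRIVATE KEY|(password|api_key|secret)[[:space:]]*[:=]'
}

# Only run the check when invoked inside a Git work tree (i.e. as a hook).
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    if git diff --cached -U0 | looks_sensitive; then
        echo "pre-commit: staged changes look like they contain a secret" >&2
        exit 1
    fi
fi
```

A non-zero exit from the hook aborts the commit, so the offending change never reaches history in the first place.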

Advanced Usage: BFG also supports other history rewrites, such as purging files bigger than a certain size, deleting folders by name with ‘--delete-folders’, and limiting which files ‘--replace-text’ touches with ‘--filter-content-including’ and ‘--filter-content-excluding’. BFG does not document an option for converting file encodings, so re-encoding text to ASCII needs a different tool. Running BFG with no arguments prints the options your version supports.

Command for purging files over 10MB:

java -jar bfg-1.13.0.jar --strip-blobs-bigger-than 10M your-repo.git
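Before purging by size, it can help to see which blobs in history exceed the threshold. The function below is a sketch using standard Git plumbing; ‘blobs_over’ is a hypothetical name, and the byte value in the example assumes that BFG’s ‘10M’ means 10 × 1024 × 1024 bytes.

```shell
#!/bin/sh
# Sketch: list blobs in history larger than a byte threshold, biggest first.
# Output is "<size-in-bytes> <path>". Run from inside the repository.
# Usage: blobs_over <bytes>
blobs_over() {
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk -v min="$1" '$1 == "blob" && $2 + 0 > min + 0 {
            size = $2
            sub(/^blob [0-9]+ /, "")   # keep the full path, spaces included
            print size, $0
        }' |
        sort -rn
}

# Example: everything over 10 MiB (assumed equivalent of "10M").
# blobs_over 10485760
```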

Conclusion

BFG Repo-Cleaner is a powerful and user-friendly tool for removing unwanted data from your Git history. Used carefully, and alongside good practices for handling secrets, it helps keep your repositories clean and secure. Remember to back up your repository before any history rewrite and make sure your team is on board with the changes.