Overview
There often comes a time in the life of a developer when sensitive data is accidentally pushed to a Git repository. Perhaps it’s a password, an API key, or chunks of confidential information placed within code or files. When this happens, simply deleting the file and pushing a new commit isn’t enough, since the sensitive data remains in the commit history. Hunting down this data over potentially hundreds of commits is a tedious and error-prone process.
This is where the BFG Repo-Cleaner comes in. It’s a simpler, faster alternative to Git’s built-in ‘filter-branch’ command, designed specifically for removing unwanted data such as credentials and oversized files from Git history. Throughout this tutorial, I will guide you through the process of using BFG Repo-Cleaner to remove sensitive files from your Git history safely and efficiently.
Prerequisites
- Basic knowledge of Git
- Java Runtime Environment (JRE) version 8 or above, as BFG is a Java program and the 1.13.0 release used here no longer runs on Java 7
- A backup of your repository, in case you need to revert changes
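To confirm the Java prerequisite, you can inspect the version string that ‘java -version’ prints. The following is a minimal sketch: the helper names ‘java_major_version’ and ‘installed_java_version’ are hypothetical, and it assumes the version string has the usual ‘1.8.0_292’ or ‘17.0.2’ shape.

```shell
# Hypothetical helper: extract the major version from a Java version string.
# Old-style strings such as "1.8.0_292" mean Java 8; new-style "17.0.2" means 17.
java_major_version() {
  case "$1" in
    1.*) printf '%s\n' "$1" | cut -d. -f2 ;;
    *)   printf '%s\n' "$1" | cut -d. -f1 | sed 's/[^0-9].*//' ;;
  esac
}

# Read the quoted version from `java -version`, which prints to stderr.
installed_java_version() {
  java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}
```

Running java_major_version "$(installed_java_version)" should then print a single number that you can compare against the required minimum.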
Step 1: Installing BFG Repo-Cleaner
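To get started, you must install BFG on your system. Official BFG releases are available from the tool’s website. Where a package manager carries BFG, installation is a single command; package availability varies by platform and release, so fall back to the jar download if the package is not found:
brew install bfg # macOS with Homebrew
sudo apt-get install bfg # Debian-based systems, if your release packages it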
Alternatively, you can download the jar file directly with:
wget https://repo1.maven.org/maven2/com/madgag/bfg/1.13.0/bfg-1.13.0.jar
Once downloaded, you can run BFG using:
java -jar bfg-1.13.0.jar
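If you use the jar directly, a small shell function saves retyping the ‘java -jar’ prefix. This is only a convenience sketch: the ‘BFG_JAR’ variable and its default location are assumptions, so point it at wherever you saved the download.

```shell
# Assumed location of the downloaded jar; adjust to wherever you saved it.
BFG_JAR="${BFG_JAR:-$HOME/bfg-1.13.0.jar}"

# Wrapper so that `bfg <args>` runs `java -jar <jar> <args>`.
bfg() {
  java -jar "$BFG_JAR" "$@"
}
```

With this in your shell profile, ‘bfg --delete-files id_rsa your-repo.git’ is equivalent to the longer ‘java -jar’ form used in the rest of this tutorial.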
Step 2: Preparing Your Repository
Before using BFG, ensure your repository’s latest changes are pushed to a remote server and that you’re working on a fresh clone for safety. To clone your repo:
git clone --mirror git://example.com/your-repo.git
This will create a bare repository in a directory named ‘your-repo.git’. A mirror clone has no working tree; it contains the complete Git database with all branches and tags. BFG will operate on this directory.
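Since the prerequisites call for a backup, one option is to archive the fresh clone before touching it. The sketch below uses a hypothetical helper, ‘backup_repo’, and an assumed naming scheme for the archive; any other backup method works just as well.

```shell
# Hypothetical helper: archive a repository directory next to itself.
# Prints the path of the archive it created.
backup_repo() {
  repo=${1%/}                                   # strip a trailing slash, if any
  archive="$repo.backup-$(date +%Y%m%d-%H%M%S).tar.gz"
  tar -czf "$archive" -C "$(dirname "$repo")" "$(basename "$repo")" &&
    printf '%s\n' "$archive"
}
```

For example, ‘backup_repo your-repo.git’ writes a timestamped ‘.tar.gz’ file beside the clone and prints its name.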
Step 3: Identifying the Sensitive Data
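Identify the names of the sensitive files that you want to remove from your Git history. BFG matches on file name rather than on the path within the repository, so every file with a matching name is removed, whichever directory it lived in.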
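To see which names are candidates, you can list the paths recorded in history and reduce them to file names. This is a rough sketch with hypothetical helper names; it relies on ‘git log --name-only’, which shows no files for merge commits by default, so treat the output as a starting point rather than a complete inventory.

```shell
# List the paths touched by commits on all refs of the repo at $1.
# Merge commits show no files by default, so this is not exhaustive.
history_paths() {
  git -C "$1" log --all --pretty=format: --name-only | sed '/^$/d' | sort -u
}

# Reduce paths on stdin to unique file names, which is what --delete-files matches.
unique_basenames() {
  sed 's|.*/||' | sort -u
}
```

Piping the two together, as in ‘history_paths your-repo.git | unique_basenames’, gives one line per distinct file name.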
Step 4: Running BFG to Remove Specific Files
To remove a specific file (e.g., ‘id_rsa’) from your Git history, execute the following:
java -jar bfg-1.13.0.jar --delete-files id_rsa your-repo.git
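This command tells BFG to delete every file named ‘id_rsa’ from the history of the ‘your-repo.git’ repository. By default, BFG protects the latest commit on your current branch (HEAD) and leaves its contents untouched, so delete the file in an ordinary commit and push it before making the mirror clone.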
Step 5: Removing Passwords or Strings
BFG can also scrub strings such as passwords or keys from file contents. It does not delete them outright: each match is replaced, by default with ‘***REMOVED***’. For example, to scrub every expression listed in ‘passwords.txt’:
java -jar bfg-1.13.0.jar --replace-text passwords.txt your-repo.git
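You must first list the expressions to scrub in a file named ‘passwords.txt’, one per line. A line is treated as a literal string unless it starts with ‘regex:’, and you can append ‘==>’ followed by replacement text to override the default ‘***REMOVED***’.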
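As an illustration of that file format, the snippet below writes a three-line ‘passwords.txt’. Every value in it is a made-up placeholder, not something taken from a real repository.

```shell
# Write an example expressions file; every value here is a made-up placeholder.
# Line 1: literal string, replaced with the default ***REMOVED***.
# Line 2: literal string with custom replacement text after ==>.
# Line 3: regular expression (regex: prefix) with a custom replacement.
cat > passwords.txt <<'EOF'
hunter2
s3cr3t-token==>REDACTED
regex:api_key=\w+==>api_key=
EOF
```

The quoted heredoc delimiter keeps the shell from interpreting the backslash in the regex line, so the file contains the text exactly as shown.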
Step 6: Cleaning Up with ‘git reflog’ and ‘gc’
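After you’ve run BFG, the rewritten history no longer references the unwanted data, but the old objects are still stored in the repository. Use the following Git commands to expire the reflog and garbage-collect them: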
cd your-repo.git
git reflog expire --expire=now --all
git gc --prune=now --aggressive
This ensures that the now-unreferenced objects holding the removed data are physically deleted and that your repository size is minimized.
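Before pushing, it is worth checking that the file name no longer appears among the objects reachable from any ref. The helper below is a sketch with a hypothetical name, ‘history_has_file’; it scans the output of ‘git rev-list --all --objects’ for an exact file-name match.

```shell
# Succeeds (exit 0) if a file with the given name is still reachable
# from any ref of the repository at $1; exits 1 otherwise.
history_has_file() {
  repo=$1 name=$2
  git -C "$repo" rev-list --all --objects |
    sed -e 's/^[0-9a-f]* *//' -e 's|.*/||' |   # drop the object id, keep the base name
    grep -Fxq -- "$name"
}
```

For example, ‘history_has_file your-repo.git id_rsa && echo "still present"’. A hit after cleaning usually means the file is still in the protected HEAD commit.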
Step 7: Pushing the Changes
Once you’re happy with the local modifications, push them to the remote:
git push --force
This will overwrite the remote repository history with your cleaned history. Note that this is a destructive operation: collaborators will need to re-clone the repository rather than pull, because pushing from an old clone would reintroduce the removed data.
Step 8: Protecting Against Accidental Pushes in the Future
It’s a good idea to list sensitive files in a ‘.gitignore’ file so that they are never staged in the first place. Alternatively, you can use a ‘pre-commit’ hook to scan for sensitive information before each commit.
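A pre-commit hook along these lines could look like the sketch below. The function names and the patterns are illustrative assumptions, not a complete ruleset; dedicated secret scanners cover far more cases.

```shell
# Succeeds (exit 0) when a unified diff on stdin adds a line that looks like a secret.
# The patterns are illustrative assumptions, not a complete ruleset.
diff_adds_secret() {
  grep -E '^\+' | grep -Ev '^\+\+\+ ' |
    grep -Eiq '(password|secret|api[_-]?key)[[:space:]]*[:=]|BEGIN [A-Z ]*PRIVATE KEY'
}

# Intended to be called from .git/hooks/pre-commit; a non-zero return aborts the commit.
pre_commit_check() {
  if git diff --cached -U0 | diff_adds_secret; then
    echo 'Possible secret in staged changes; commit aborted.' >&2
    return 1
  fi
}
```

To use it, place both functions in an executable ‘.git/hooks/pre-commit’ script and end the script with a call to ‘pre_commit_check’. Only added lines are inspected, so removing an old password does not block the commit.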
Advanced Usage
BFG can also handle other history rewrites, such as purging files bigger than a certain size. When using ‘--replace-text’, you can limit which files are rewritten with the ‘--filter-content-including’ and ‘--filter-content-excluding’ options. BFG 1.13.0 has no option for converting text encodings, so files with problematic encodings have to be fixed with other tools in an ordinary commit.
Command for purging files over 10MB:
java -jar bfg-1.13.0.jar --strip-blobs-bigger-than 10M your-repo.git
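Before purging large files, it can be useful to see which blobs would be affected. The sketch below lists blobs above a byte threshold using ‘git rev-list’ and ‘git cat-file --batch-check’. The helper names are hypothetical, and the 10 MiB threshold assumes that BFG’s ‘10M’ means 10 × 1024 × 1024 bytes.

```shell
# Filter "type id size path" lines on stdin, keeping blobs larger than $1 bytes.
# Prints "size<TAB>path", largest first.
blobs_over() {
  awk -v min="$1" '$1 == "blob" && $3 > min {
    size = $3
    $1 = $2 = $3 = ""          # blank the first three fields, leaving the path
    sub(/^ +/, "")
    print size "\t" $0
  }' | sort -rn
}

# List blobs over 10 MiB (assumed meaning of 10M) reachable from any ref of the repo at $1.
large_blobs() {
  git -C "$1" rev-list --objects --all |
    git -C "$1" cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    blobs_over $((10 * 1024 * 1024))
}
```

Running ‘large_blobs your-repo.git’ prints one line per oversized blob, which gives a rough preview of what the size-based purge would strip.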
Conclusion
BFG Repo-Cleaner is a powerful and user-friendly tool for removing unwanted data from your Git history. With careful usage and good practices for data management and privacy, you can keep your repositories clean and secure. Remember to back up your repository before rewriting its history, and make sure your team knows about the rewrite before you push it.