How to remove a file from Git repo but keep it locally

Updated: January 27, 2024 By: Guest Contributor

One of the common scenarios while working with Git is the need to remove a file from the repository but retain the file in your local working directory. Whether it’s for removing a sensitive configuration file, ignoring a log file, or dealing with a large asset that shouldn’t have been committed in the first place, Git provides a straightforward way to achieve this.

The Basics

Before diving into removing files from a Git repository while keeping them locally, it’s important to understand Git’s tracking system. Git tracks the changes of files in your working directory and stages them through the index. Committing those changes will solidify them into the history of your repository.

Check Current Status

To begin, use the git status command to see the current working tree status and the staging area:

$ git status

This command lists staged changes, unstaged modifications to tracked files, and untracked files. Tracked files that have not changed are not shown.
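To check whether one particular file is currently tracked, git ls-files --error-unmatch can be used alongside git status. The following is a minimal sketch that runs in a throwaway repository; the file names and the user identity are placeholders chosen for illustration.

```shell
# Throwaway repository so the example does not touch real work.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "api_key=placeholder" > sensitive_data.txt
echo "scratch notes" > notes.txt
git add sensitive_data.txt
git commit -q -m "Add config file"

# --short prints one line per changed or untracked path;
# unmodified tracked files are omitted.
git status --short

# Exit status 0 means the path is tracked, non-zero means it is not.
if git ls-files --error-unmatch sensitive_data.txt >/dev/null 2>&1; then
  tracked=yes
else
  tracked=no
fi
echo "sensitive_data.txt tracked: $tracked"
```

Here git status --short reports only notes.txt (as untracked), because the committed file has no pending changes.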

Removal of Files

To remove a file from the repository but keep it in your working directory, the git rm --cached filename command is used:

$ git rm --cached sensitive_data.txt
rm 'sensitive_data.txt'

In the example above, the file sensitive_data.txt is removed from the index (staging area), so Git stops tracking it once the change is committed, but the file is still present in your working directory. The --cached option tells Git to remove the file from the index only, not from the local file system.
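The effect can be verified in a scratch repository. This is a sketch under the assumption that the file was committed earlier; the repository location and user identity are placeholders.

```shell
# Throwaway repository with one committed file.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "api_key=placeholder" > sensitive_data.txt
git add sensitive_data.txt
git commit -q -m "Add config file"

# Remove the file from the index only.
git rm --cached sensitive_data.txt

# The file is still on disk ...
ls sensitive_data.txt

# ... while status shows a staged deletion ("D ") and,
# for the same path, an untracked file ("??").
git status --short
```

Seeing the same path listed twice is expected at this point: once as a deletion waiting to be committed, and once as a file Git no longer knows about.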

Untracking Multiple Files

The same effect can be applied to multiple files or using patterns:

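$ git rm --cached '*.log'

Quoting the pattern stops the shell from expanding it, so Git itself matches it against tracked files. This untracks every tracked .log file in the current directory and its subdirectories, without removing any of them from your local file system. Unquoted, the shell expands *.log to the files in the current directory only, and the command fails if any of them is not tracked.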
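When a pattern is involved, it can help to preview the matches with --dry-run before untracking anything. The sketch below assumes a small scratch repository with hypothetical log files; the pattern is quoted so that Git, not the shell, does the matching.

```shell
# Throwaway repository with log files at two directory levels.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
mkdir logs
echo "# demo" > README.md
echo "top-level log" > app.log
echo "nested log" > logs/server.log
git add README.md app.log logs/server.log
git commit -q -m "Initial commit"

# Preview: lists what would be removed from the index, changes nothing.
git rm --cached --dry-run '*.log'

# Untrack the matches; the quoted pattern also reaches logs/server.log.
git rm --cached '*.log'

# Only README.md is still tracked; both log files remain on disk.
git ls-files
ls app.log logs/server.log
```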

Commit the Removal

Once you’ve untracked the files, they will appear as ‘Changes to be committed’ when you run git status. You need to commit the changes to finish the removal process:

$ git commit -m "Remove sensitive data from repository"

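This command creates a commit that records the removal, so the file is no longer part of the repository from this commit onward. Earlier commits still contain it, and collaborators who pull this commit will have the file deleted from their working directories.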
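It is worth confirming what the removal commit does and does not change. The following sketch, run in a scratch repository with placeholder file contents, shows that the latest commit no longer lists the file while the previous commit still holds its contents.

```shell
# Throwaway repository: commit a file, then untrack it and commit the removal.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "# demo" > README.md
echo "api_key=placeholder" > sensitive_data.txt
git add README.md sensitive_data.txt
git commit -q -m "Initial commit"
git rm -q --cached sensitive_data.txt
git commit -q -m "Remove sensitive data from repository"

# Files recorded in the latest commit: sensitive_data.txt is gone.
git ls-tree -r --name-only HEAD

# The previous commit still contains the file's contents.
git show HEAD~1:sensitive_data.txt
```

This is why untracking alone is not enough for secrets that were already committed: anyone with access to the history can still read them.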

Ignoring the Files

If you don’t want to accidentally add these files back to the repository in the future, you should inform Git to ignore them. Add the filenames to a .gitignore file:

# Add this file content to .gitignore
sensitive_data.txt
*.log

After editing .gitignore, it’s a good practice to track this file as well:

$ git add .gitignore
$ git commit -m "Update .gitignore to exclude specific files"
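To confirm that the ignore rules match the intended files, git check-ignore can be used. Note that .gitignore only affects untracked files, which is why the files have to be untracked first. The sketch below uses a scratch repository; debug.log and README.md are hypothetical file names added for illustration.

```shell
# Throwaway repository with the ignore rules from above.
repo=$(mktemp -d) && cd "$repo"
git init -q
printf '%s\n' 'sensitive_data.txt' '*.log' > .gitignore
echo "api_key=placeholder" > sensitive_data.txt
echo "debug output" > debug.log
echo "# demo" > README.md

# -v prints the ignore file, line number, and pattern that matched each path.
git check-ignore -v sensitive_data.txt debug.log

# check-ignore exits non-zero for paths that are not ignored.
git check-ignore -q README.md || echo "README.md is not ignored"
```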

Advanced File Removal

Sometimes, you might need to remove a file from the entire history of the repository. Be careful with such operations as they can rewrite the history and affect the workflows of collaborators if the commits are public.

Using filter-branch

To remove a file from all commits, use the git filter-branch command:

$ git filter-branch --tree-filter 'rm -f sensitive_data.txt' HEAD

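Each commit is checked out, the specified file is removed, and a new commit is made. This rewrites history for all commits in the current branch; other branches and tags that contain the file are left untouched unless you name them as well. Git keeps the original refs under refs/original/ as a backup. Note that the Git documentation now discourages filter-branch because of its pitfalls and slow speed, and points to git filter-repo as the recommended alternative. Warning: Make sure you coordinate with your team when altering public history.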
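A variant that is usually faster uses --index-filter, which edits each commit's index without checking files out. The sketch below runs it in a scratch repository. It copies the file to a backup directory first, on the assumption that the rewrite can remove the working-tree copy once the rewritten HEAD no longer contains the file; the backup location and file contents are placeholders.

```shell
# Throwaway repository with two commits that both contain the file.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "# demo" > README.md
echo "api_key=placeholder" > sensitive_data.txt
git add README.md sensitive_data.txt
git commit -q -m "Initial commit"
echo "more text" >> README.md
git commit -q -am "Update README"

# Keep a copy outside the repository before rewriting history.
backup=$(mktemp -d)
cp sensitive_data.txt "$backup/"

# Skip the warning (and its delay) that recent Git versions print.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Drop the file from the index of every commit on the current branch.
# --ignore-unmatch keeps git rm from failing on commits without the file.
git filter-branch --index-filter \
  'git rm --cached --ignore-unmatch sensitive_data.txt' HEAD

# Put the local copy back if the rewrite removed it.
[ -f sensitive_data.txt ] || cp "$backup/sensitive_data.txt" .

# No commit on the rewritten branch touches the file any more.
git log --oneline HEAD -- sensitive_data.txt
```

The old commits are still reachable through refs/original/ and the reflog until those are cleaned up, so the data is not yet gone from the local object store.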

BFG Repo-Cleaner

A much faster tool than git filter-branch for removing unwanted data is the open-source BFG Repo-Cleaner:

$ java -jar bfg.jar --delete-files sensitive_data.txt

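This tool is meant for cleaning up repositories with speed and efficiency in mind, and it is particularly handy for large repositories. It requires Java, and --delete-files matches on file name rather than path. By default BFG leaves the contents of your latest commit untouched, so commit the removal of the file before running it.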
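The BFG documentation suggests following a run with git reflog expire and git gc so that the unwanted objects are physically dropped from the local repository. Since BFG itself needs Java and the jar file, the sketch below swaps in git commit --amend as a stand-in for the history rewrite; only the last two Git commands are the cleanup step being illustrated, and everything else is scaffolding with placeholder names.

```shell
# Throwaway repository with one commit containing the file.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "# demo" > README.md
echo "api_key=placeholder" > sensitive_data.txt
git add README.md sensitive_data.txt
git commit -q -m "Initial commit"

# Remember the object ID of the file's contents.
blob=$(git rev-parse HEAD:sensitive_data.txt)

# Stand-in for the rewrite: replace the commit with one that lacks the file.
git rm -q --cached sensitive_data.txt
git commit -q --amend -m "Initial commit"

# The old contents are still stored, reachable through the reflog.
git cat-file -e "$blob" && echo "before cleanup: blob still stored"

# Cleanup: drop reflog entries, then prune unreachable objects.
git reflog expire --expire=now --all
git gc -q --prune=now --aggressive

if git cat-file -e "$blob" 2>/dev/null; then
  echo "after cleanup: blob still stored"
else
  echo "after cleanup: blob pruned"
fi
```

After git filter-branch, the backup refs under refs/original/ would also have to be deleted before this cleanup has any effect.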

Reflecting the Changes

After using filter-branch or BFG Repo-Cleaner, you’ll have to force push the changes to your remote repository:

$ git push origin master --force

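This can be disruptive for other contributors, who will have to re-clone or reset their local branches onto the rewritten history. Replace master with the name of your branch, and make sure the rewrite is absolutely necessary before doing a force push.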
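A somewhat safer alternative to a plain --force, not mentioned above, is --force-with-lease, which refuses to overwrite the remote branch if someone else has pushed to it since your last fetch. The sketch below demonstrates it against a local bare repository acting as the remote; the directory names are hypothetical, and git commit --amend stands in for the history rewrite.

```shell
# A local bare repository plays the role of the remote.
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q "$work/local"
cd "$work/local"
git config user.name "Example User"
git config user.email "user@example.com"
git remote add origin "$work/remote.git"

echo "# demo" > README.md
git add README.md
git commit -q -m "Initial commit"

# Use whatever the default branch is called on this machine.
branch=$(git symbolic-ref --short HEAD)
git push -q origin "$branch"

# Stand-in for the rewrite: the amended commit gets a new ID.
git commit -q --amend -m "Initial commit (rewritten)"

# A plain push is rejected because it is not a fast-forward.
git push -q origin "$branch" 2>/dev/null || echo "plain push rejected"

# Succeeds only if the remote branch still matches the local
# remote-tracking ref, i.e. nobody pushed in the meantime.
git push -q --force-with-lease origin "$branch"
```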

When Things Go Wrong

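Sometimes you may commit the removal of a file that you did not mean to untrack. If the removal is the most recent commit, you can restore the file from the commit before it:

$ git checkout HEAD^ -- filename.txt

This command checks out the file as it was in the previous commit and stages it again. The -- separates the revision from the path, and any local copy of the file is overwritten with the committed version.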
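The restore can be tried out safely in a scratch repository. In this sketch the local copy is edited after being untracked, to show that checking the file out of the earlier commit replaces those local edits; the file contents are placeholders.

```shell
# Throwaway repository: commit a file, then untrack it by mistake.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "original contents" > filename.txt
git add filename.txt
git commit -q -m "Add file"
git rm -q --cached filename.txt
git commit -q -m "Untrack file by mistake"

# A local edit made after the file was untracked.
echo "local edit" > filename.txt

# Restore the version from the commit before the removal.
git checkout HEAD^ -- filename.txt

# The file is staged again and holds the committed contents.
git status --short
cat filename.txt
```

If the local edits matter, copy the file elsewhere before running the checkout.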

Conclusion

Managing files in Git requires understanding the stages of file tracking and knowing the right commands to modify this tracking. By using git rm --cached, editing .gitignore, and occasionally rewriting history with advanced tools like filter-branch or BFG Repo-Cleaner, you can dictate exactly what is and isn’t tracked in your repository. Always remember to communicate with your team before making drastic changes like rewriting history.