Mistakes can happen – we certainly make many. Today we are going to look at how to remove secrets from our repositories and prevent these mistakes from happening again. Before we continue, we want to offer thanks and credit to Ashley Grant, who offered expert advice and collaborated to help work out the BFG piece!
Secret scanners such as GitGuardian (available as a GitHub app) and Azure DevOps CredScan are valuable tools for identifying secrets in our code. These secrets can include cloud keys (such as Azure, AWS, or GCP storage keys), connection strings, and passwords.
A few weeks ago we set up GitGuardian to scan all of our repos in GitHub, including all commits and pull requests. Surprise! It found some secrets in our Feature Flags repo…
There are two scenarios where secrets can appear in commits. The first is secrets in commits on the main branch. The second is secrets committed in a branch and pushed via a pull request, where we want to remove the commit before we complete the pull request.
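To make the idea of a secret scanner concrete, here is a heavily simplified sketch of pattern-based detection. It is not how GitGuardian or CredScan are implemented; real scanners use many more detectors and validity checks. The patterns and the function name scan_for_secrets are our own illustrative assumptions.

```shell
#!/usr/bin/env bash
# Simplified sketch of pattern-based secret detection.
# The patterns below are illustrative assumptions, not a complete rule set.
SECRET_PATTERNS=(
  'AKIA[0-9A-Z]{16}'                                   # shape of an AWS access key ID
  'AccountKey=[A-Za-z0-9+/=]{20,}'                     # fragment of an Azure storage connection string
  '[Pp]assword[[:space:]]*=[[:space:]]*[^[:space:]]+'  # hard-coded password assignment
)

# Print file:line:match for every hit under the given directory.
# Returns 0 if at least one pattern matched, 1 otherwise.
scan_for_secrets() {
  local dir="$1" pattern found=1
  for pattern in "${SECRET_PATTERNS[@]}"; do
    if grep -rnE --exclude-dir=.git -- "$pattern" "$dir"; then
      found=0
    fi
  done
  return "$found"
}
```

Running scan_for_secrets against a working folder prints each suspicious line. A check like this only sees the current files, not the history, which is why hosted scanners that inspect every commit and pull request remain the better safety net.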
Preventing secrets with Pull Requests
Before we look at how to deal with secrets in pull request commits, let’s look at prevention. The best long-term solution is to prevent secrets from being added in the first place. This can be achieved with a combination of a few strategies:
- Using branch policies to ensure we can’t accidentally merge secrets into the main branch. By ensuring that GitGuardian or CredScan is set up as a merge policy, accidental secrets will only be on feature branches, limiting exposure.
- Squashing the commits into one commit when the pull request is completed, which keeps our intermediate work out of the main branch history. Note that this intermediate work is still visible inside the pull request, as we will see later.
- Deleting the branch when the pull request is done, cleaning up our workspace and intermediate work.
Of course, if we do accidentally push a secret to a pull request in a public repo, we should always assume the secret is compromised and recycle it. Interestingly, there is no UI yet for removing a commit from a pull request, but after researching the options, we believe the following steps are the easiest way to solve the problem.
1. First, we identify the commits we need to remove in the target branch, recording the commit SHAs for later.
2. We switch to the target branch:
git checkout SecretsCleanUp
3. Now we start an interactive rebase, which opens the commit list in a text editor. The number (5) is how many of the most recent commits to include. If our secret is further in the past (e.g. 10 commits back), we use a higher number to capture it. The rebase works like a transaction: if you need to abort, skip to step 6.
git rebase -i HEAD~5
4. In the text file, we find the commits we want to remove, delete those lines, and then save and close the file.
5. Now we need to push the fix. The “-f” flag forces the push, which is required because we have rewritten the branch history. You may need to be an administrator to complete the force push:
git push -f
6. If we need to abort the rebase process, we run:
git rebase --abort
You’ll notice the commit has been removed. You will now be able to complete and merge your pull request.
Note: Rebase is a dangerous command. Use it with care, or you could delete some of your code. We recommend practicing this in a separate test repo before fixing your production code.
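One way to practice safely is a throwaway repository. The sketch below creates one in a temporary folder, commits a fake secret, and then drops that commit. Note that it swaps the interactive rebase for the non-interactive "git rebase --onto" form so it can run as a script; the function name, file names, and fake secret are hypothetical.

```shell
#!/usr/bin/env bash
# Practice run: build a scratch repo, commit a fake secret, then drop that commit.
# All names and values here are made up for illustration.
practice_drop_commit() {
  local repo bad_sha
  repo="$(mktemp -d)"
  git -C "$repo" init -q
  git -C "$repo" config user.email "practice@example.com"
  git -C "$repo" config user.name "Practice User"

  # Commit 1: ordinary code.
  echo "feature flag code" > "$repo/app.txt"
  git -C "$repo" add app.txt
  git -C "$repo" commit -q -m "Add app"

  # Commit 2: the accidental secret. Record its SHA, as in step 1 above.
  echo "key=FAKE-SECRET-123" > "$repo/secret.txt"
  git -C "$repo" add secret.txt
  git -C "$repo" commit -q -m "Oops: add secret"
  bad_sha="$(git -C "$repo" rev-parse HEAD)"

  # Commit 3: more ordinary work on top of the secret.
  echo "more code" >> "$repo/app.txt"
  git -C "$repo" commit -q -am "More work"

  # Replay everything after the bad commit onto its parent, dropping the bad commit.
  git -C "$repo" rebase -q --onto "${bad_sha}^" "$bad_sha" || return 1

  # Print the scratch repo path so the result can be inspected.
  echo "$repo"
}
```

After running it, "git log" in the printed folder should list only the two good commits. The dropped commit still exists in the local reflog until Git garbage-collects it, which is one more reason to treat the secret as compromised. When pushing a rewritten branch for real, "git push --force-with-lease" is a more cautious alternative to "-f", because it refuses to overwrite remote work we have not fetched.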
Cleaning historical commits
Cleaning up historical commits is not straightforward. What if we find a secret in the distant history of our main branch? Let’s explore.
This is significantly more difficult because of the way Git stores history: each commit is linked to its parent, so changing one commit changes every commit that follows it. As with the previous example, if we haven’t already, it’s essential that we recycle the exposed secrets and ensure that they can no longer be used! Here is an example where an access key is stored in some source code (don’t worry, this access key is no longer active and has been retired).
Enter the BFG, a repo cleaner. The BFG removes unwanted data, such as a secret, from the commits that contain it, and then rebuilds the history. There are some side effects, as the SHA of every rewritten commit changes. Therefore, it’s recommended to minimize, if not completely eliminate, branches and forks before running the BFG. If there are any branches or forks out there, they will still contain the secret, and could potentially merge the secret back into the main branch. To use the BFG:
1. We download the BFG into our downloads folder. We need to have Java installed.
2. Next we need to create a passwords.txt text file. This is where we will place the password we need to search for. We paste in our access key:
3. Next we run the BFG command in a command line window:
java -jar c:\users\[user]\downloads\bfg-1.13.0.jar --replace-text c:\users\[user]\downloads\passwords.txt
4. Finally, we run “git push --force” to push the changes back up to the repository. If you have branch policies (you should), we recommend temporarily editing the branch policy to enable the “allow force pushes” checkbox.
5. Looking at the code again, we can see the secret has been replaced by the text ***REMOVED***. Our secret has been completely purged.
6. As a double check, we search our repository. Interestingly, we get a hit on the secret – in the original commit, inside a pull request – GitHub has cached this data. There are two ways to resolve this. We can contact GitHub Support or GitHub Premium Support and request that the cached data be removed. Alternatively, we can just wait ~24 hours for the cache to expire.
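The steps above can be sketched as a script. This is one possible sequence, based on the mirror-clone workflow in the BFG documentation rather than the exact commands we ran. The function names, parameters, and the repo-to-clean.git folder name are hypothetical. One thing to be aware of: by default the BFG protects the contents of the latest commit, so the secret should first be removed from the current files with a normal commit.

```shell
#!/usr/bin/env bash
# Write a BFG replacement file. Each line holds one literal to find; an optional
# "==>" suffix sets the replacement text (the default is ***REMOVED***).
write_replacements_file() {
  local out="$1" secret
  shift
  : > "$out"
  for secret in "$@"; do
    printf '%s\n' "$secret" >> "$out"
  done
}

# Hypothetical end-to-end run; the repo URL and jar path are supplied by the caller.
clean_repo_with_bfg() {
  local repo_url="$1" bfg_jar="$2" replacements="$3"
  # A mirror clone is a bare copy containing every ref, which is what the BFG works on.
  git clone --mirror "$repo_url" repo-to-clean.git || return 1
  java -jar "$bfg_jar" --replace-text "$replacements" repo-to-clean.git || return 1
  (
    cd repo-to-clean.git || exit 1
    # Expire old references and garbage-collect the now-unreachable objects.
    git reflog expire --expire=now --all
    git gc --prune=now --aggressive
    # In a mirror clone, a plain push updates every ref on the remote.
    git push
  )
}
```

For example, write_replacements_file passwords.txt "our-access-key" produces the same one-line file described in step 2, and a line such as "our-access-key==>REDACTED" would substitute custom replacement text instead of the default.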
We’ve cleaned up all of our secrets, and GitGuardian is showing that all of our repos are now clean!
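Besides the scanner report, a quick local check can confirm that a secret no longer appears anywhere in the history of a clone. This is a minimal sketch using Git's pickaxe search ("git log -S"); the function name secret_in_history is our own. It only inspects the local clone, so it cannot see data cached on GitHub's side.

```shell
#!/usr/bin/env bash
# Return 0 if any commit reachable from any ref added or removed the given
# literal string, and 1 if no such commit exists.
secret_in_history() {
  local repo="$1" secret="$2"
  [ -n "$(git -C "$repo" log --all --oneline -S "$secret")" ]
}
```

A usage example would be: if secret_in_history . "our-access-key"; then echo "still present"; fi. An empty result after a fresh clone is a good sign that the rewritten history is what was actually pushed.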
- GitGuardian: https://github.com/GitGuardian
- Removing sensitive data from a repository: https://help.github.com/en/github/authenticating-to-github/removing-sensitive-data-from-a-repository#purging-a-file-from-your-repositorys-history
- BFG repo cleaner: https://rtyley.github.io/bfg-repo-cleaner/
- Featured image credit: https://tr2.cbsistatic.com/hub/i/r/2019/12/10/2637b074-06fb-4ee2-8aea-097020a9aa14/thumbnail/768×432/280d5465e3c471a2e8de64e2126a650f/istock-1061357610.jpg