Intro
In the CI/CD space, every second counts, just like in an F1 race. That's where the GitHub Actions cache comes in. Caching is like a pit stop for your code: it saves precious time by storing previously downloaded dependencies and reusing them to speed up pipeline execution.
However, using a cache, especially in public repositories, can be risky and prone to malicious intrusion, so you need to shield yourself against potential vulnerabilities and cache attacks. In this blog post, we’ll optimize the GitHub cache across different workflow jobs, but clean things up as soon as we’re done.
Caching vulnerabilities in public Repos
When it comes to caching in a public repository (with secrets), running workflows can become a dangerous practice. Here's why:
Exposure to Unauthorized Access: Anyone with read access can create a pull request and access the sensitive data within a public repo cache.
Forks of a repository can also create PRs against the base branch and access caches on the base branch.
Data Exfiltration: Hackers can exploit the cached secrets to gain unauthorized access to your systems.
No Encryption: Caches are not encrypted by default, making stored secrets easily readable if discovered.
Inadvertent Exposure: Devs might accidentally push sensitive data to a public repo without realizing it.
Cache Poisoning: A malicious tool used in a test workflow can poison its cache. Later, another workflow using the same cache might be affected. Read more in this github-cache-poisoning article.
It is even worse for artifacts, as there’s literally a download button accessible to anyone on the internet.
Remediation
Several security best practices help minimize this risk. Today, I'll focus on just one from the list below.
Secret Management Tools: Use a secret management tool such as Vault or AWS Secrets Manager.
Private Repositories: Not always possible (e.g. OSS projects), but it helps limit access to authorized users only.
Encryption: Ensure strong encryption of the cached data.
Don't store any sensitive information in the cache
Temporary Caching: If you need to use caching for performance optimization, ensure that it's temporary and short-lived.
Demo: Instant Cache cleanup
As mentioned earlier, one way to minimize the attack surface is to clear the cache regularly and prevent long-term exposure. The following example demonstrates exactly that, using the cache action and the GitHub CLI.
Cache Retention in GitHub
By default, GitHub removes caches that have not been accessed for 7 days.
There is no limit on the number of caches, but the total size of all caches in a repository is limited to 10 GB.
Artifacts and workflow log files, on the other hand, are retained for 90 days by default before being deleted automatically.
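To see how close a repository is to these limits, the caches can be listed from a workflow. Below is a minimal sketch, assuming a GitHub CLI version recent enough to ship the built-in `gh cache` command on the runner. The file name, workflow name and job name are hypothetical and not part of my repo.

```yaml
# Hypothetical file: .github/workflows/list_caches.yml
name: 'List_Caches'
on:
  workflow_dispatch:          # run manually from the Actions tab
permissions:
  actions: read               # assumption: read access is enough to list caches
jobs:
  list_caches:
    runs-on: ubuntu-latest
    steps:
      - name: List caches, largest first
        env:
          GH_TOKEN: ${{ github.token }}       # token used by the gh CLI
          GH_REPO: ${{ github.repository }}   # target repo, so no checkout is needed
        run: gh cache list --limit 100 --sort size_in_bytes --order desc
```

Sorting by size makes it easy to spot which entries eat into the 10 GB budget, while sorting by `last_accessed_at` instead shows which ones are about to expire.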
Cache action
We’ll be using the cache action actions/cache@v3, which has 3 main parameters:
path: A list of files, directories, and wildcard patterns to cache and restore
key: An explicit key for a cache entry
restore-keys: An ordered list of prefix-matched keys used to restore a stale cache if no cache hit occurred for key.
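The three parameters fit together roughly as in the sketch below. This is only an illustration of the inputs, not the step used in the demo: the step name is hypothetical, and the path and key values are assumptions chosen for the example.

```yaml
# Fragment of a job's steps (illustrative values only)
- name: Cache dependencies
  uses: actions/cache@v3
  with:
    # Files and directories to save and restore
    path: |
      ~/.terraform.d/plugin-cache
    # Exact key: on a hit the cache is restored and nothing new is saved at the end of the job
    key: ${{ runner.os }}-terraform-${{ github.run_id }}
    # Fallback prefixes, tried in order when the exact key misses
    restore-keys: |
      ${{ runner.os }}-terraform-
```

With a prefix such as `Linux-terraform-`, the action can fall back to the most recent cache whose key starts with that prefix. That is handy for speed, but it also means a stale (or poisoned) cache can be picked up, which is one more reason to keep caches short-lived.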
PREREQUISITES
A repository
Example repo: brokedba/githubactions_hacks, branch: git_actions
You can clone my repo and push it to your own GitHub account, but remember to add the environment and the branch.
An environment
Name: lab_tests, with the deployment branch set to `selected branch`, i.e. git_actions
A workflow
The common workflow and jobs declaration
Trigger
event: push, target branch: git_actions
paths: our YAML workflow file, test_cache_cleanup.yml
# "test_cache_cleanup.yml"
name: 'My_Cache_cleanup_Workflow'
on:
  push:                                               # <--- Trigger
    branches: [ "git_actions" ]
    paths:
      - '.github/workflows/test_cache_cleanup.yml'    # <--- File
jobs:
  terraform_setup_cache_load:
    runs-on: ubuntu-latest
    environment: test-labs                            # <--- Environment linked to the git_actions branch
    # snippet ...
Initial steps: check out the repo and install a specific version of terraform (1.0.3)
    steps:
      # 1. Checkout the repository to the GitHub Actions runner
      - name: Checkout
        uses: actions/checkout@v3
      # 2. Install a specific version of the Terraform CLI
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v1
        with:
          terraform_version: 1.0.3
          terraform_wrapper: false
Prepare the dependencies directory (for the terraform provider files)
Note: My repo contains no terraform config files, but we’ll assume we ran terraform init (see step 4).
      # 3. Create a cache for the terraform plugin and copy terraform binary
      - name: Config Terraform plugin cache
        run: |
          echo 'plugin_cache_dir="$HOME/.terraform.d/plugin-cache"' >~/.terraformrc
          mkdir --parents ~/.terraform.d/plugin-cache
          terraform -v
          terra_bin=$(which terraform)
          cp "$terra_bin" .    # <--- copy terraform binary to local directory
      # 4. Perform remaining steps ...example terraform init before caching.
      # - name: terraform init
      #   run: |
      #     Initialize the Terraform directory (creating initial files, loading modules etc.)
      #     example: terraform init ...
      # snippet ...
Cache all dependencies (terraform 1.0.3 binary + provider plugins)
The cache key includes github.run_id, which is a unique ID for our workflow run
restore-keys uses the same pattern, e.g. “Linux-terraform-5868971041”
      # ###################################
      # Save directory files into our cache
      # ###################################
      # Save all plugin files and working Directory in a cache
      - name: Cache Terraform
        uses: actions/cache@v3
        with:
          path: |
            ~/.terraform.d/plugin-cache
            ./*
          key: ${{ runner.os }}-terraform-${{ github.run_id }}    # <--- Our unique Cache Key
          restore-keys: |
            ${{ runner.os }}-terraform-${{ github.run_id }}
Now it's time to restore the cache in another job (runner), avoiding the repo checkout, terraform install and initialization.
  # ###################################
  # JOB 2 : terraform Plan
  # ###################################
  Terraform_Plan:
    name: 'Terraform Plan'