Learning how to design a GitHub Actions workflow to test for changes in a Python package.
Working in public health, I’ve learned that maintaining bioinformatics tools is a bit like being a gardener – you need to constantly tend to your tools while ensuring they continue to work reliably. As I make updates to improve these tools, one of the challenges has been making sure that newer versions produce consistent results. After all, when these tools are part of critical public health pipelines, unexpected changes in output could have significant downstream effects.
Like many developers, I used to spend countless hours manually testing for changes between versions – a process that was not only time-consuming but prone to human error. That was until I discovered GitHub Actions, a game-changing automation tool that transformed my development workflow. Think of it as having a dedicated assistant who tirelessly checks your work, making sure everything runs smoothly across different versions.
GitHub Actions has become my reliable partner in ensuring code quality and consistency. It’s a powerful platform that lets you define, manage, and automatically execute tasks directly within your repository. The best part? It catches potential issues before they can impact production environments, saving valuable time and reducing headaches.
In this blog post, I’ll walk you through how to harness the power of GitHub Actions to create custom workflows that automatically test for changes that you make on your projects. We’ll use a real-world example of a bioinformatics tool called hicap, which we use for serotyping Haemophilus influenzae. Whether you’re working in bioinformatics or another field, you’ll learn how to implement these automated testing workflows to make your development process more efficient and reliable.
GitHub Actions is a continuous integration and continuous delivery (CI/CD) platform that automates your software development workflows right in your GitHub repository. Think of it as your personal robot assistant that can build, test, and deploy your code whenever something changes in your repository.
Before diving in, let’s understand some basic terminology:
- Workflow: A configurable automated process made up of one or more jobs. It’s defined in a YAML file in your repository’s `.github/workflows` directory.
- Job: A set of steps that execute on the same runner (virtual machine).
- Step: An individual task that can run commands or actions.
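To see how these pieces fit together, here is a minimal, generic workflow sketch (not the hicap workflow itself) showing how a workflow contains jobs and jobs contain steps:

```yaml
# .github/workflows/example.yml -- one workflow file
name: Example workflow
on: [push]                        # when the workflow runs

jobs:                             # a workflow has one or more jobs
  build:
    runs-on: ubuntu-latest        # each job runs on its own runner
    steps:                        # a job is a sequence of steps
      - name: Say hello           # each step runs a command or an action
        run: echo "Hello, GitHub Actions!"
```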
Let’s break down how to create a testing workflow using `hicap` as our example.
Create a new file in your repository at `.github/workflows/test.yml`. The basic structure looks like this:

```yaml
name: Hicap Test Workflow

on:
  workflow_dispatch:
  release:
    types: [created, published]
```
This header section defines the workflow’s name and its triggers: `workflow_dispatch` lets you start the workflow manually from the Actions tab, and the `release` trigger runs it whenever a release is created or published.
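The triggers are entirely configurable. For instance, if you wanted the tests to run on every code change rather than only on releases, a common alternative (assuming your default branch is called main) would be:

```yaml
on:
  push:
    branches: [main]        # run on every push to main
  pull_request:
    branches: [main]        # and on every pull request targeting main
```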
```yaml
jobs:
  test-and-compare:
    runs-on: ubuntu-latest
```
Here we define a single job called `test-and-compare` and tell GitHub to run it on the latest Ubuntu virtual machine.
GitHub provides various virtual machines for running workflows, such as `windows-latest`, `macos-latest`, and `ubuntu-latest`, or with specific versions such as `ubuntu-20.04`, `windows-2019`, etc.
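As an illustration rather than part of the hicap workflow, a strategy matrix lets the same job run on several of these runners at once; the fragment below would still need the job’s steps added:

```yaml
jobs:
  test-and-compare:
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest]   # run the job once per operating system
    runs-on: ${{ matrix.os }}
```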
Let’s look at the key steps for environment setup:
```yaml
steps:
  - name: Checkout current code
    uses: actions/checkout@v4

  - name: Set up Conda with Mamba
    uses: conda-incubator/setup-miniconda@v3
    with:
      miniforge-variant: Miniforge3
      miniforge-version: latest
      use-mamba: true
      activate-environment: test
```
These steps check out the repository code onto the runner and set up Conda (with Mamba via Miniforge) so that we can build our test environment. Pre-built actions, whether from the GitHub Marketplace or from any other public repository, are referenced with the `uses` keyword. For example:

```yaml
steps:
  - name: Use a Marketplace action
    uses: actions/checkout@v4
```

```yaml
steps:
  - name: Use an action from another repository
    uses: owner/repo@v1
```
```yaml
- name: Create Conda environment from YAML
  shell: bash -el {0}
  run: |
    mamba env create -f environment.yml
```
This step creates a Conda environment using your `environment.yml` file, which should list all your package dependencies.
💡 Pro Tip: I export the environment file from my development environment using `conda env export --no-builds > environment.yml`. The `--no-builds` flag ensures that the environment is platform-independent by excluding build-specific information.
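For reference, an exported environment file looks roughly like the sketch below; the package names are placeholders for illustration, not hicap’s actual dependency list:

```yaml
# environment.yml (illustrative only; your exported file will differ)
name: hicap-env
channels:
  - conda-forge
  - bioconda
dependencies:
  - python=3.10              # placeholder interpreter pin
  - some-bioconda-tool       # placeholder for a Conda dependency
  - pip
  - pip:
      - some-python-package  # placeholder for a pip-only dependency
```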
In our example, we use a two-phase testing strategy: one phase runs the currently released version and the other runs the new changes made in the repository.
Phase 1: Test Current Release
```yaml
- name: Run hicap test (before update)
  shell: bash -el {0}
  run: |
    source $CONDA/bin/activate hicap-env
    mkdir -p test_old
    hicap -q tests/data/type_a.fasta -o test_old
```
Sometimes we have to run `source $CONDA/bin/activate hicap-env` to activate the environment, as it bypasses the need for conda initialisation in the shell session. The `shell` attribute specifies the shell used to run the commands. The `-e` flag causes the shell to exit immediately if any command exits with a non-zero status, and the `-l` flag ensures that the shell is a login shell. `{0}` is a placeholder for the script containing the step’s commands.
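Repeating `shell: bash -el {0}` on every step gets verbose. One option, shown here as a sketch rather than something this workflow does, is to set it once for the whole job using `defaults`:

```yaml
jobs:
  test-and-compare:
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash -el {0}    # every run step in this job now uses a login shell
    steps:
      - name: Example step
        run: conda info --envs
```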
Phase 2: Test New Changes
```yaml
- name: Install editable hicap package
  shell: bash -el {0}
  run: |
    source $CONDA/bin/activate hicap-env
    pip install -e .

- name: Run hicap test (after update)
  shell: bash -el {0}
  run: |
    conda activate hicap-env
    mkdir -p test_new
    python3 hicap-runner.py -q tests/data/type_a.fasta -o test_new
```
This approach allows us to generate one set of results from the released version and another from the updated code within the same workflow run, so the two can be compared directly. `pip install -e .` installs the package in editable mode, allowing changes to the source code to be reflected immediately without needing to reinstall the package.
```yaml
- name: Compare test results
  id: compare
  shell: bash -el {0}
  run: |
    if ! diff -r test_old test_new > diff_output.txt; then
      echo "DIFF_DETECTED=true" >> $GITHUB_ENV
    else
      echo "DIFF_DETECTED=false" >> $GITHUB_ENV
    fi

- name: Print hicap test output
  if: env.DIFF_DETECTED == 'true'
  shell: bash -el {0}
  run: |
    echo "Differences found between test_old and test_new:"
    cat diff_output.txt
```
This section compares the outputs of the two test runs and prints the differences if any are found. Here, `diff -r` recursively compares the contents of the two directories and writes the output to `diff_output.txt`. The `if !` condition checks whether differences were detected; if so, it sets an environment variable `DIFF_DETECTED` to `true` by appending to `GITHUB_ENV`, and to `false` otherwise. We can access custom environment variables in a workflow run through the `env` context, as done in `env.DIFF_DETECTED`. The purpose of the `id` field is to give the step a unique identifier that other steps can use to refer to it. For example, with `id: compare`, the step’s outputs can be referenced as `steps.compare.outputs.<variable_name>`.
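To actually reference `steps.compare.outputs.<variable_name>`, the compare step would need to write to `GITHUB_OUTPUT` rather than (or in addition to) `GITHUB_ENV`. A sketch of that variant, which also fails the job when differences are found:

```yaml
- name: Compare test results
  id: compare
  shell: bash -el {0}
  run: |
    if ! diff -r test_old test_new > diff_output.txt; then
      echo "diff_detected=true" >> $GITHUB_OUTPUT     # exposed as a step output
    else
      echo "diff_detected=false" >> $GITHUB_OUTPUT
    fi

- name: Fail if outputs differ
  if: steps.compare.outputs.diff_detected == 'true'
  shell: bash -el {0}
  run: |
    cat diff_output.txt
    exit 1    # mark the workflow run as failed
```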
In this example, we’ve shown how to set up a testing workflow for a Python package with GitHub Actions, using the `hicap` tool as an example. You can adapt this workflow to test your own Python packages by following the same principles. You can view the results of the workflow in the Actions tab of the repository. For an example of a workflow run where I deliberately introduced a change that prints `hicap_version` as one of the output columns, open the `Print hicap test output` step of that run to see the differences detected. The differences appear in both lines of the output: the first line has the extra column `hicap_version`, and the second line has the version number of the tool. The order of `IS1016_hits`, i.e. `bexA, bexB, bexD, bexC`, also differs between the two outputs.
This is just the tip of the iceberg of what you can do with GitHub Actions. I am sure there are better ways to design the workflow and the action to make them more efficient and effective, even just for testing changes in the tool. This blog post is written to familiarise ourselves with the concepts of GitHub Actions and how to design a workflow around them.
In summary, GitHub Actions provides a powerful way to automate your Python package testing. By following this guide and examining the `hicap` example, you now have the foundation to create your own testing workflows. Remember to start simple and gradually add complexity as needed.
A few practical tips worth keeping in mind:
- Pin your actions to specific versions (e.g. `actions/checkout@v4`)
- Use `set -euo pipefail` in bash scripts
- Use `shell: bash -el {0}` when working with Conda environments
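As a small illustration of the second tip (not taken from the hicap workflow, and with my_command standing in for a real command), a run step with strict error handling might look like this:

```yaml
- name: Run analysis with strict error handling
  shell: bash {0}
  run: |
    set -euo pipefail                    # exit on errors, unset variables, and failed pipes
    mkdir -p results
    my_command | tee results/log.txt     # my_command is a placeholder
```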