P2 case_file

Production CI/CD Failure in a Reusable GitHub Actions Workflow

CI/CD Production

Context

A production deployment pipeline used a reusable GitHub Actions workflow to standardise releases. Releases had been working, then began failing before any job ran.

Problem

The caller workflow passed an input that did not exist in the referenced reusable workflow's input contract. GitHub aborted the run during input resolution, surfacing a generic failure that did not point cleanly at the cause.
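
A minimal sketch of the contract check, assuming both YAML files have already been parsed into plain mappings. The input names (environment, version, dry_run, deploy_target) are hypothetical and not taken from the actual workflows. It compares the keys the caller passes under with: against the inputs the reusable workflow declares under on.workflow_call.inputs.

```python
# Hypothetical, already-parsed YAML fragments; all names are illustrative.
# Callee: the on.workflow_call.inputs mapping of the reusable workflow.
callee_inputs = {
    "environment": {"type": "string", "required": True},
    "version": {"type": "string", "required": True},
    "dry_run": {"type": "boolean", "required": False},
}

# Caller: the with: mapping of the job that calls the reusable workflow.
caller_with = {
    "environment": "production",
    "version": "1.4.2",
    "deploy_target": "eu-west",  # not declared by the callee
}


def contract_drift(caller_with, callee_inputs):
    """Return (unknown, missing).

    unknown: inputs the caller passes that the callee does not declare.
    missing: inputs the callee marks required that the caller omits.
    """
    unknown = sorted(set(caller_with) - set(callee_inputs))
    missing = sorted(
        name
        for name, spec in callee_inputs.items()
        if (spec or {}).get("required", False) and name not in caller_with
    )
    return unknown, missing


unknown, missing = contract_drift(caller_with, callee_inputs)
print("unknown inputs:", unknown)
print("missing required inputs:", missing)
```

A non-empty unknown list corresponds to the failure described here: the caller sends an input the callee never declared.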

My role

Owner of the investigation: traced the failing run, the workflow contract, and the deployment plan that depended on it.

Technical actions

[01] Reviewed workflow_dispatch inputs and the inputs the reusable workflow declares under workflow_call.
[02] Compared caller and callee YAML to isolate the contract drift.
[03] Validated branch and tag references used by the deployment plan.
[04] Re-aligned the caller to the reusable workflow's input contract.
[05] Clarified the branch/tag expectations so future releases would not silently drift again.
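
Steps [03] and [05] can be sketched as a small reference check, assuming the references in question are the @ref suffix of a cross-repository uses: line, which has the form {owner}/{repo}/.github/workflows/{filename}@{ref}. The repository and file names below are hypothetical. The sketch only splits the reference and reports whether it is pinned to a full commit SHA. A branch and a tag cannot be told apart from the string alone, so that check would still need the repository itself.

```python
import re

# Shape of a cross-repository reusable workflow reference:
#   {owner}/{repo}/.github/workflows/{filename}@{ref}
USES_PATTERN = re.compile(
    r"^(?P<owner>[^/@\s]+)/(?P<repo>[^/@\s]+)/"
    r"(?P<path>\.github/workflows/[^@\s]+\.ya?ml)@(?P<ref>\S+)$"
)
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def parse_uses(uses):
    """Split a uses: value into owner, repo, path and ref.

    Adds pinned_to_sha, which is True only when ref is a 40-character
    lowercase hex string. Raises ValueError for anything that does not
    match the cross-repository form (for example a local ./ reference).
    """
    match = USES_PATTERN.match(uses)
    if match is None:
        raise ValueError(f"not a cross-repository workflow reference: {uses!r}")
    parts = match.groupdict()
    parts["pinned_to_sha"] = bool(FULL_SHA.match(parts["ref"]))
    return parts


# Hypothetical reference, for illustration only.
example = parse_uses("example-org/example-repo/.github/workflows/deploy.yml@v1")
print(example["ref"], example["pinned_to_sha"])
```

A reference that moves, such as a branch name or a re-pointed tag, is one way a callee contract can change under a caller without any edit to the caller's own file.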

Operational impact

Deployment workflow restored. Failure mode documented so the next contract drift can be diagnosed in minutes rather than by trial and error against the pipeline.

What this demonstrates

Reading CI/CD failures as contract problems, not as 'flaky pipelines'.
Working comfortably across YAML, reusable workflow semantics, and deployment configuration.
Improving diagnosability after fixing the immediate failure.

Why this matters

Many CI/CD outages are not bugs in the runner. They are contracts between two YAML files quietly drifting apart. Treating them that way shortens recovery.