CI/CD for Safety‑Critical Software: Automating WCET and Timing Checks

Automate WCET and timing checks in CI: use VectorCAST + RocqStat, detect regressions, and enforce release gating for automotive/aerospace safety software.

Stop releases from breaking real-time guarantees: automate WCET and timing checks in CI/CD

For automotive and aerospace teams building safety‑critical software, a functional test pass is no longer enough; a hidden timing regression that slips through can cost millions, or lives. You need CI that runs WCET and timing analysis automatically, flags regressions early, and enforces release gating. This guide shows how to do that in 2026 using RocqStat and VectorCAST (now a unified stack after Vector’s acquisition of RocqStat), plus practical CI patterns, code snippets, and governance that passes audit.

Why timing checks must be part of CI in 2026

Late 2025 and early 2026 brought a decisive shift: vendors and regulators expect integrated timing verification earlier in the lifecycle. In January 2026 Vector Informatik announced the acquisition of StatInf’s RocqStat timing technology and plans to integrate it into VectorCAST, creating a single toolchain for test and timing analysis—making automated timing checks realistic for CI pipelines at scale [1].

"Vector will integrate RocqStat into its VectorCAST toolchain to unify timing analysis and software verification" — Automotive World, Jan 16, 2026.

That trend matters to teams shipping to ISO 26262 or DO-178C: timing evidence now needs to be traceable, reproducible, and part of your CI records. Manual or ad‑hoc timing labs won’t scale when you need to validate thousands of commits and dozens of ECU variants.

Executive summary: what a timing‑aware CI pipeline does

  • Produces deterministic builds designed for timing verification (compiler flags, deterministic toolchain).
  • Runs VectorCAST unit/integration tests with instrumentation and collects execution traces.
  • Executes WCET estimation and measurement workflows (RocqStat static analysis plus on‑target measurement) and produces machine‑readable results (JSON/XML).
  • Compares results to baselines and timing budgets, flags regressions with statistical confidence.
  • Enforces release gating: fail PRs / block releases, create traceable exceptions when necessary.

Core principles before you start

  1. Determinism first: Use pinned toolchains, fixed CPU governors, isolated runners or hardware testbeds to reduce noise.
  2. Baseline and traceability: Store baseline WCET and trace artifacts per commit/branch with metadata for audits.
  3. Incremental analysis: Limit heavy WCET runs to changed code paths where possible; use change impact analysis to scale.
  4. Statistical rigor: Use multiple runs + confidence intervals rather than single measurements.
  5. Fail-fast, explain-later: Make the CI provide actionable evidence (stack traces, hot paths, diffs) when blocking a change.

High‑level CI architecture

Below is a compact architecture that fits most teams:

  • Source repo (GitHub/GitLab)
  • CI runners: a mix of cloud‑VMs and dedicated timing lab hardware
  • Build stage: deterministic compiler & instrumentation
  • Test stage: VectorCAST executes test suites, records traces
  • Timing stage: RocqStat runs static WCET + measurement analysis
  • Compare stage: custom comparator evaluates regressions vs baseline
  • Results store: artifact repo + time‑series DB + dashboard (Grafana); plan for trace ingestion at scale early
  • Release gate: policy engine blocks merge/release if checks fail

Example pipeline flow

  1. Commit triggers CI
  2. Build with deterministic flags and instrumentation
  3. Run VectorCAST unit & integration tests (bare‑metal or HIL)
  4. Export execution traces; run RocqStat WCET estimation
  5. Generate results JSON and SARIF for issues
  6. Compare to baseline using configurable rules
  7. Annotate PR and block merge if thresholds exceeded

Practical setup: deterministic builds and runners

Timing checks are only meaningful when your environment is stable. Key setup items (a metadata‑capture sketch follows the list):

  • Pin toolchains: Use exact compiler and linker versions; record checksums.
  • Compiler flags: Use -fno-omit-frame-pointer, -fno-inline-functions or documented inline policies, and repeatable link ordering. Build with LTO only if both measurement and static tools support it.
  • CPU isolation: Pin the test process to a dedicated core, disable frequency scaling, disable turbo/boost. On Linux: use cpuset and governor=performance.
  • Thermal control: For HIL, use warmed-up HW or ambient control to avoid thermal throttling noise in measurements.
  • Use hardware timers: Prefer cycle counters (PMU) wired into test harness; validate against wall time.
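
To make the environment items above auditable, here is a minimal Python sketch that records key metadata next to every CI run. The output name env-metadata.json and the compiler path are assumptions chosen to match the examples later in this article; a real pipeline would also capture board firmware hashes from the HIL controller.

import hashlib
import json
import platform
import subprocess
from pathlib import Path

def sha256_of(path):
    # Checksum a toolchain binary so the exact compiler can be proven later.
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def read_first_governor():
    # Return the scaling governor of cpu0, or None on non-Linux hosts.
    gov = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor")
    return gov.read_text().strip() if gov.exists() else None

def collect_metadata(compiler_path="/opt/toolchain/bin/gcc"):
    return {
        "kernel": platform.release(),
        "machine": platform.machine(),
        "cpu_governor": read_first_governor(),
        "compiler_sha256": sha256_of(compiler_path),
        "compiler_version": subprocess.run(
            [compiler_path, "--version"], capture_output=True, text=True
        ).stdout.splitlines()[0],
    }

if __name__ == "__main__":
    Path("env-metadata.json").write_text(json.dumps(collect_metadata(), indent=2))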

Automating VectorCAST + RocqStat in CI

VectorCAST provides the test harness and trace exports; RocqStat provides WCET estimation and measurement analysis. In 2026, expect an increasingly integrated workflow as Vector incorporates RocqStat directly—but you can automate today via CLI invocations and artifact exchange.

Sample GitLab CI job (concept)

stages:
  - build
  - test
  - timing

build:
  stage: build
  script:
    - /opt/toolchain/bin/gcc --version
    - make clean all CFLAGS="-O2 -fno-omit-frame-pointer"
  artifacts:
    paths: [build/bin]
    expire_in: 1 week

vectorcast_test:
  stage: test
  script:
    - /opt/vectorcast/bin/vectorcast-run --project project.vcp --output traces/ --no-gui
  artifacts:
    paths: [traces/**]
    expire_in: 2 weeks

wcet_analysis:
  stage: timing
  script:
    - /opt/rocqstat/bin/rocqstat analyze --input traces/ --build build/bin --output timing-results.json
    - python ci/compare_timing.py timing-results.json baselines/${CI_COMMIT_REF_NAME}.json
  artifacts:
    paths: [timing-results.json]
    expire_in: 6 months
  when: on_success

This example is intentionally simple. Production pipelines include retry rules, worker pools (dedicated timing hardware), and an independent verifier job that replays the measurement on a separate runner to catch flaky failures. Also think about operational observability and the role SRE plays beyond uptime when sizing teams and runbooks.

Comparing results and defining regressions

Regression detection must be objective and auditable. Here are common strategies:

  • Absolute threshold: Fail if WCET > specified budget (e.g., > 2.0 ms).
  • Relative change: Fail if WCET increases more than X% (typical X = 2–10% depending on cost sensitivity).
  • Statistical test: Run n measurements and fail if the 95% upper confidence bound exceeds baseline.
  • Hybrid: Use absolute budget for release gating and relative/statistical checks for PR feedback.

Example comparator (Python)

import math
import statistics

def is_regression(baseline_us, sample_us, min_runs=30, rel_threshold=0.05):
    # baseline_us: float, baseline WCET in microseconds
    # sample_us: list of measured execution times in microseconds
    if len(sample_us) < min_runs:
        raise ValueError("Not enough samples for a statistically meaningful check")
    mean = statistics.mean(sample_us)
    # 95% confidence interval for the mean (normal approximation, sample std dev)
    ci = 1.96 * statistics.stdev(sample_us) / math.sqrt(len(sample_us))
    upper = mean + ci
    stats = {"mean": mean, "upper_ci": upper}
    return upper > baseline_us * (1 + rel_threshold), stats

This gives you a reproducible, auditable decision. Store the sample list and computed values as CI artifacts.
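
A minimal usage sketch, assuming it lives in the same script as is_regression above, that timing-results.json exposes the per-run measurements under a samples key, and that the baseline file stores a wcet_us value (the key names are assumptions about your exporter's schema):

import json
import sys

def main(results_path, baseline_path, decision_path="timing-decision.json"):
    samples = json.load(open(results_path))["samples"]    # microseconds per run
    baseline = json.load(open(baseline_path))["wcet_us"]  # approved baseline WCET
    regressed, stats = is_regression(baseline, samples)
    # Persist the raw samples and the computed decision as an auditable CI artifact.
    with open(decision_path, "w") as f:
        json.dump({"baseline_us": baseline, "samples": samples,
                   "stats": stats, "regression": regressed}, f, indent=2)
    sys.exit(1 if regressed else 0)   # non-zero exit fails the CI job

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])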

Scaling timing analysis: incremental and prioritized checks

Full WCET runs are expensive. Use these techniques:

  • Change impact analysis: Map functions to tests; run full timing analysis only for impacted functions or their call graph (see the test‑selection sketch after this list).
  • Prioritized checks: Run a lightweight static WCET pass on PRs; run full measurement + RocqStat analysis overnight or in a pre‑merge gate for release branches.
  • Sampling strategy: Use fewer runs for low‑risk changes, increase runs for high‑risk ones.
  • Cache baselines: Only recompute baselines when approved changes merge to main branch.
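
A sketch of the test‑selection idea, assuming you maintain a mapping file timing-map.json from source files to timing test IDs; the file name and format are assumptions, and real change impact analysis would also walk the call graph from your build's symbol information.

import json
import subprocess

def changed_files(base_ref="origin/main"):
    # List source files touched since the base branch, via plain git.
    out = subprocess.run(["git", "diff", "--name-only", base_ref, "HEAD"],
                         capture_output=True, text=True, check=True)
    return [f for f in out.stdout.splitlines() if f.endswith((".c", ".h"))]

def impacted_timing_tests(mapping_path="timing-map.json", base_ref="origin/main"):
    # Select only the timing tests whose mapped sources changed.
    mapping = json.load(open(mapping_path))  # e.g. {"src/brake_ctrl.c": ["T_BRAKE_001"]}
    tests = set()
    for f in changed_files(base_ref):
        tests.update(mapping.get(f, []))
    return sorted(tests)

if __name__ == "__main__":
    print("\n".join(impacted_timing_tests()))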

Integrating results into release gating and governance

Release gating must be transparent and auditable. Implement the following controls:

  • Gate policies: Define automatic fail rules (budget exceedance), warning rules (minor regression allowed pending review), and exception workflows (traceability, justification, approver signature).
  • Evidence packaging: For every gate check, produce an evidence bundle: build artifact, VectorCAST traces, RocqStat output, comparator diff, and environment metadata (CPU, firmware, toolchain hashes); a manifest sketch follows this list.
  • Traceability: Link timing checks back to requirements in your requirements management tool (DOORS/Jama) with automated trace IDs.
  • Audit logs: Persist gate decisions in immutable storage (artifact store, signed logs) for regulatory audits.
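
A minimal sketch of evidence packaging, reusing the artifact names from earlier in this article (build/bin, traces/, timing-results.json, env-metadata.json); the manifest schema here is illustrative, not a standard.

import hashlib
import json
import os
from pathlib import Path

def file_digests(root):
    # SHA-256 every file under a directory so the evidence bundle is tamper-evident.
    return {str(p): hashlib.sha256(p.read_bytes()).hexdigest()
            for p in sorted(Path(root).rglob("*")) if p.is_file()}

def build_manifest(commit_sha):
    return {
        "commit": commit_sha,
        "environment": json.load(open("env-metadata.json")),  # from the build stage
        "artifact_digests": {
            **file_digests("build/bin"),   # binaries under test
            **file_digests("traces"),      # VectorCAST traces
            "timing-results.json": hashlib.sha256(
                Path("timing-results.json").read_bytes()).hexdigest(),
        },
    }

if __name__ == "__main__":
    sha = os.environ.get("CI_COMMIT_SHA", "unknown")
    Path("evidence-manifest.json").write_text(json.dumps(build_manifest(sha), indent=2))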

Example gate policy

  • Hard fail: WCET > 100% of allocated budget.
  • Soft fail: WCET increase >5% — requires technical review and approver before merge.
  • Info-only: increase >2% — annotate PR with details for reviewers.
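
A sketch of how that policy might be encoded so gate decisions are explicit and loggable; the thresholds mirror the list above, while the function and field names are assumptions.

def evaluate_gate(budget_us, baseline_us, measured_wcet_us):
    # Return the gate outcome for one timing-critical function.
    if measured_wcet_us > budget_us:
        return "hard_fail"   # budget exceeded: block release
    delta = (measured_wcet_us - baseline_us) / baseline_us
    if delta > 0.05:
        return "soft_fail"   # needs technical review and approver
    if delta > 0.02:
        return "info"        # annotate PR only
    return "pass"

# Example: 2 ms budget, 1.6 ms baseline, 1.7 ms measured -> ~6.3% increase -> soft_fail
print(evaluate_gate(budget_us=2000, baseline_us=1600, measured_wcet_us=1700))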

Dealing with flaky timing and false positives

Noise can generate false positives. Reduce them with:

  • Run replication: automatically rerun suspicious failing checks on separate hardware and compare.
  • Use cross‑validation: static WCET from RocqStat should corroborate measured worst cases; large divergence indicates measurement noise or tool mismatch (see the divergence sketch after this list).
  • Maintain a whitelist for known non‑deterministic tests with explicit mitigations and timeline for fix.
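
A small sketch of the cross‑validation idea: compare the static WCET estimate against the largest measured value and flag large divergence for investigation rather than failing the gate outright. The 30% divergence limit is an assumption to tune per project.

def divergence_check(static_wcet_us, measured_us, max_divergence=0.30):
    # Flag runs where measurement and static analysis disagree too much.
    measured_max = max(measured_us)
    # Static WCET should normally upper-bound measurements; a measured value far
    # below (or above) the static bound hints at noise, missing paths, or tool mismatch.
    divergence = abs(static_wcet_us - measured_max) / static_wcet_us
    return divergence > max_divergence, {"measured_max": measured_max,
                                         "divergence": round(divergence, 3)}

# Example: static bound 1800 us, worst measurement 1100 us -> ~39% divergence -> investigate
print(divergence_check(1800, [950, 1020, 1100]))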

Dashboards, alerts, and developer feedback

Make timing information actionable at the team level:

  • Push timing metrics into a time‑series DB (InfluxDB / Prometheus) with per‑build tags.
  • Create Grafana dashboards showing trends (WCET, mean, variance) by module and owner.
  • Annotate PRs with a concise summary: baseline vs current WCET, delta, worst functions and stack traces (a posting sketch follows this list).
  • Integrate CI checks into your Git provider checks API so merges are blocked unless approved.
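
As an illustration of PR annotation, a minimal sketch that posts the comparator summary as a pull‑request comment via the public GitHub REST comments API; GITHUB_REPOSITORY and GITHUB_TOKEN are the usual GitHub Actions variables, PR_NUMBER is assumed to be exported by an earlier pipeline step, and GitLab or Gerrit equivalents follow the same pattern.

import json
import os
import urllib.request

def annotate_pr(repo, pr_number, token, decision_path="timing-decision.json"):
    # Post a concise timing summary on the pull request.
    d = json.load(open(decision_path))
    body = (f"WCET check: baseline {d['baseline_us']:.1f} us, "
            f"upper 95% CI {d['stats']['upper_ci']:.1f} us, "
            f"regression: {'YES' if d['regression'] else 'no'}")
    req = urllib.request.Request(
        f"https://api.github.com/repos/{repo}/issues/{pr_number}/comments",
        data=json.dumps({"body": body}).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Accept": "application/vnd.github+json"},
        method="POST")
    urllib.request.urlopen(req)

if __name__ == "__main__":
    annotate_pr(os.environ["GITHUB_REPOSITORY"], os.environ["PR_NUMBER"],
                os.environ["GITHUB_TOKEN"])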

Tooling & integrations to consider in 2026

With Vector's move to integrate RocqStat into VectorCAST, the tool landscape is converging. Evaluate these integrations:

  • VectorCAST + RocqStat unified toolchain — reduces friction of artifacts and enables single sign‑off for both functional and timing verification.
  • Artifact stores: Nexus/Artifactory for binaries, S3 or other object storage for large trace artifacts, with retention policies sized for your certification window.
  • Traceability tools: Integrate with DOORS/Jama for requirements mapping required by ISO 26262/DO‑178C.
  • CI platforms: GitHub Actions, GitLab CI, Jenkins X; choose one that supports dedicated runners you can attach to HIL labs.

Real‑world checklist: rolling this out in your org

Follow these steps to move from ad‑hoc timing checks to CI‑based enforcement:

  1. Inventory timing‑critical code and map to requirements.
  2. Establish deterministic build recipes and store toolchain images in the artifact registry.
  3. Stand up dedicated timing runners (virtual pinned cores or HIL nodes) and validate environment invariants.
  4. Automate VectorCAST runs and export traces in CI.
  5. Integrate RocqStat (or unified Vector stack) to produce WCET estimates and measurements.
  6. Build comparator and gate policies; automate PR annotations and merge blocking.
  7. Create dashboards and configure alerts; onboard teams with training and runbooks.
  8. Document audit evidence packaging and exception approval workflows.

Benchmarks & expectations

From experience across multiple OEM toolchains in 2024–2026, expect these ballpark numbers:

  • Unit test + VectorCAST baseline run: 2–10 minutes per module (depends on test count).
  • RocqStat quick static estimation: seconds to minutes per function; full analysis for large modules: 10–60 minutes.
  • Measurement runs (30 samples): 30–90 minutes depending on test scope and hardware.

Plan CI stage parallelism and scheduling accordingly: nightly full timing gates plus faster PR checks strike a pragmatic balance between feedback latency and coverage.

Safety standards and audit readiness

Timing evidence must satisfy standards:

  • ISO 26262: Timing budgets and evidence for ASIL levels; traceability from requirement to test and timing result.
  • DO‑178C: For avionics, provide deterministic evidence for timing behavior and verification data.

Make the CI artifacts part of your safety case: signed baselines, environment metadata, and reviewer approvals. Keep immutable copies of all CI artifacts for the certification window (typically years).

Common pitfalls and how to avoid them

  • Pitfall: Running timing checks on shared cloud VMs without isolation. Fix: Use pinned cores or dedicated lab hardware.
  • Pitfall: Relying on a single measurement. Fix: Use statistical sampling and confidence intervals.
  • Pitfall: Not storing environment metadata. Fix: Record CPU, firmware, toolchain hashes in artifact manifests.
  • Pitfall: Treating timing as a last‑minute activity. Fix: Shift left: PR checks and daily baselines.

Future direction and predictions (2026+)

Expect these trends in the next 24 months:

  • Tighter toolchain integration: VectorCAST + RocqStat unification will reduce artifact format friction and enable single‑click timing verification jobs.
  • More automation in release gating: Policy engines will become standard components, with regulatory‑grade evidence packaging built into CI.
  • AI‑assisted root cause: Machine learning will be used to correlate code changes with timing regressions and propose fixes for hot paths; treat such suggestions as advisory input, not certified evidence.
  • Cloud HIL offerings: Expect more validated cloud offerings for deterministic HIL as OEMs demand elasticity.

Actionable templates & quick wins

Get immediate value with these quick wins:

  1. Add environment metadata collection to every CI job (script that records CPU, kernel, governor, firmware hashes).
  2. Run a nightly job that executes RocqStat on the main branch and stores baselines.
  3. Annotate PRs with a lightweight static WCET check to give developers immediate feedback.
  4. Implement a simple comparator using the Python example above and block merges when the upper CI crosses thresholds.

Conclusion & next steps

In 2026, timing validation is no longer an offline lab chore—it's a CI concern. With Vector's acquisition of RocqStat and the push for integrated verification stacks, you can build CI pipelines that automatically run WCET checks, flag regressions, and enforce release gating with audit evidence. The ROI is regressions caught earlier, fewer late surprises, and stronger safety cases.

Call to action: Start by adding deterministic build metadata and a nightly RocqStat baseline job to your CI. If you want a ready‑made reference implementation (GitLab + VectorCAST + RocqStat + comparator scripts + Grafana dashboards), request our CI reference repo and audit checklist. Contact us to accelerate implementation and integrate timing gates into your release process.

References

  [1] Automotive World. “Vector buys RocqStat to boost software verification.” January 16, 2026.