
Frequent 504 Errors repeatedly causing pipelines to be marked as failed

Affected images versions

  • Not relevant - related to CI pipeline

Unaffected images versions

N/A

Testcase

N/A

Steps to reproduce

  • Submit merge request (in my case to the apertis/v2022-updates branch of the pkg/linux repository), triggering CI pipeline
  • Wait for CI pipeline to fail.

Expected result

  • CI pipeline to succeed or fail with actionable failure.

Actual result

Pipeline fails (multiple times) with HTTP status server error (504 Gateway Timeout).

For instance, https://gitlab.apertis.org/pkg/linux/-/jobs/2462923:

> monitor --rev 2 --srcmd5 0b305f2e17b55c7f893b134f46c41c4d --build-log-out build.log --prev-endtime-for-commit 1685065926 --project apertis:v2024dev2:target --package linux --repository default --arch armv7hl
Live build log: https://build.collabora.com/package/live_build_log/apertis:v2024dev2:target/linux/default/armv7hl
Monitoring build, current status is 'building'...
Error running command: 
   0: Request failed: HTTP status server error (504 Gateway Timeout) for url (https://build.collabora.com/source/apertis:v2024dev2:target/linux)
   1: HTTP status server error (504 Gateway Timeout) for url (https://build.collabora.com/source/apertis:v2024dev2:target/linux)
Location:
   /rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/convert/mod.rs:550
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SPANTRACE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
   0: obs_gitlab_runner::retry::retry_request
      at src/retry.rs:63
   1: obs_gitlab_runner::monitor::get_latest_revision
      at src/monitor.rs:82
   2: obs_gitlab_runner::monitor::get_latest_state
      at src/monitor.rs:95
   3: obs_gitlab_runner::monitor::monitor_package with self=ObsMonitor { package: MonitoredPackage { project: "apertis:v2024dev2:target", package: "linux", repository: "default", arch: "armv7hl", rev: "2", srcmd5: "0b305f2e17b55c7f893b134f46c41c4d", prev_endtime_for_commit: Some(1685065926) } } options=PackageMonitoringOptions { sleep_on_building: 10s, sleep_on_dirty: 30s, sleep_on_old_status: 15s, max_old_status_retries: 40 }
      at src/monitor.rs:189
   4: obs_gitlab_runner::handler::run_monitor with args=MonitorAction { project: "apertis:v2024dev2:target", package: "linux", rev: "2", srcmd5: "0b305f2e17b55c7f893b134f46c41c4d", repository: "default", arch: "armv7hl", prev_endtime_for_commit: Some(1685065926), build_log_out: "build.log" }
      at src/handler.rs:416
   5: obs_gitlab_runner::handler::command with cmdline="monitor --rev 2 --srcmd5 0b305f2e17b55c7f893b134f46c41c4d --build-log-out build.log --prev-endtime-for-commit 1685065926 --project apertis:v2024dev2:target --package linux --repository default --arch armv7hl"
      at src/handler.rs:554
   6: gitlab_runner::run::run with gitlab.job=2462923
      at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/gitlab-runner-0.0.5/src/run.rs:137
Backtrace omitted.
Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.

Reproducibility

  1. always
  2. often, but not always
  3. rarely (but frequently enough to be problematic)

Impact of bug

In the vast majority of cases, this probably isn't a major problem for anyone on the core team: those with access can look at OBS, determine that the package has in fact built fine, and re-run the CI pipeline until it no longer times out and gets marked as passed. However, those who aren't on the core team can't access OBS, and thus can't determine whether the package has actually built or not.

Additionally, when working on features for third parties, needing to repeatedly focus on the CI pipeline to get it to pass sucks up time and delays the feature being merged (and ultimately reaching the intended user in a sane way).

Attachments

None

Root cause

There are two aspects here:

  1. OBS should ideally not return HTTP 504 Gateway Timeout errors; we need to investigate why this happens.
  2. The runner could be more robust and try again (upstream issue https://github.com/collabora/obs-gitlab-runner/issues/11).
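
To illustrate the second point, here is a minimal sketch of retrying a request with exponential backoff when the server answers with a transient gateway error. It is not the runner's actual code: the span trace above shows the runner already has a `retry_request` function in `src/retry.rs`, and this sketch makes no claim about how that function works. `HttpError`, `is_transient` and `retry_with_backoff` are hypothetical names, and treating 502, 503 and 504 as the retryable statuses is an assumption made for the example.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical error type for this sketch: only the HTTP status of a failed request.
#[derive(Debug, PartialEq)]
struct HttpError {
    status: u16,
}

/// Assumption: gateway-style server errors are worth retrying, anything else is not.
fn is_transient(err: &HttpError) -> bool {
    matches!(err.status, 502 | 503 | 504)
}

/// Calls `request` until it succeeds, fails with a non-transient error,
/// or `max_attempts` is reached. The delay doubles after each transient failure.
/// At least one attempt is always made.
fn retry_with_backoff<T, F>(
    max_attempts: u32,
    initial_delay: Duration,
    mut request: F,
) -> Result<T, HttpError>
where
    F: FnMut() -> Result<T, HttpError>,
{
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match request() {
            Ok(value) => return Ok(value),
            // Transient failure with attempts left: wait, then try again.
            Err(err) if is_transient(&err) && attempt < max_attempts => {
                sleep(delay);
                delay *= 2;
                attempt += 1;
            }
            // Non-transient failure, or no attempts left: give up.
            Err(err) => return Err(err),
        }
    }
}
```

With something along these lines, a call such as `retry_with_backoff(5, Duration::from_secs(10), || fetch_revision())`, where `fetch_revision` is a hypothetical stand-in for the request that failed above, would tolerate a few consecutive 504 responses before marking the job as failed. Suitable values for the attempt count and delays would depend on how long the OBS timeouts typically last, which is still to be investigated.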

Outcomes

TBD

Management data

This section is for management only, it should be the last one in the description.

/cc @andrunko @em @sagar @sudarshan @wlozano

Phabricator link: https://phabricator.apertis.org/T9747

Edited by Emanuele Aina