Frequent 504 Errors repeatedly causing pipelines to be marked as failed
Affected images versions
- Not relevant - related to CI pipeline
Unaffected images versions
N/A
Testcase
N/A
Steps to reproduce
- Submit a merge request (in my case to the apertis/v2022-updates branch of the pkg/linux repository), triggering the CI pipeline.
- Wait for the CI pipeline to fail.
Expected result
- CI pipeline to succeed or fail with actionable failure.
Actual result
Pipeline fails (multiple times) with HTTP status server error (504 Gateway Timeout).
For instance, https://gitlab.apertis.org/pkg/linux/-/jobs/2462923:
> monitor --rev 2 --srcmd5 0b305f2e17b55c7f893b134f46c41c4d --build-log-out build.log --prev-endtime-for-commit 1685065926 --project apertis:v2024dev2:target --package linux --repository default --arch armv7hl
Live build log: https://build.collabora.com/package/live_build_log/apertis:v2024dev2:target/linux/default/armv7hl
Monitoring build, current status is 'building'...
Error running command:
0: Request failed: HTTP status server error (504 Gateway Timeout) for url (https://build.collabora.com/source/apertis:v2024dev2:target/linux)
1: HTTP status server error (504 Gateway Timeout) for url (https://build.collabora.com/source/apertis:v2024dev2:target/linux)
Location:
/rustc/a55dd71d5fb0ec5a6a3a9e8c27b2127ba491ce52/library/core/src/convert/mod.rs:550
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SPANTRACE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
0: obs_gitlab_runner::retry::retry_request
at src/retry.rs:63
1: obs_gitlab_runner::monitor::get_latest_revision
at src/monitor.rs:82
2: obs_gitlab_runner::monitor::get_latest_state
at src/monitor.rs:95
3: obs_gitlab_runner::monitor::monitor_package with self=ObsMonitor { package: MonitoredPackage { project: "apertis:v2024dev2:target", package: "linux", repository: "default", arch: "armv7hl", rev: "2", srcmd5: "0b305f2e17b55c7f893b134f46c41c4d", prev_endtime_for_commit: Some(1685065926) } } options=PackageMonitoringOptions { sleep_on_building: 10s, sleep_on_dirty: 30s, sleep_on_old_status: 15s, max_old_status_retries: 40 }
at src/monitor.rs:189
4: obs_gitlab_runner::handler::run_monitor with args=MonitorAction { project: "apertis:v2024dev2:target", package: "linux", rev: "2", srcmd5: "0b305f2e17b55c7f893b134f46c41c4d", repository: "default", arch: "armv7hl", prev_endtime_for_commit: Some(1685065926), build_log_out: "build.log" }
at src/handler.rs:416
5: obs_gitlab_runner::handler::command with cmdline="monitor --rev 2 --srcmd5 0b305f2e17b55c7f893b134f46c41c4d --build-log-out build.log --prev-endtime-for-commit 1685065926 --project apertis:v2024dev2:target --package linux --repository default --arch armv7hl"
at src/handler.rs:554
6: gitlab_runner::run::run with gitlab.job=2462923
at /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/gitlab-runner-0.0.5/src/run.rs:137
Backtrace omitted.
Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.
Reproducibility
- always
- often, but not always
- ✅ rarely (but frequently enough to be problematic)
Impact of bug
In the vast majority of cases, this probably isn't a major problem for anyone on the core team: those with access can look at OBS, determine that the package has in fact built fine, and re-run the CI pipeline until it no longer times out and is marked as passed. However, those who aren't on the core team can't access OBS, and thus can't determine whether the package has actually built or not.
Additionally, when working on features for third parties, having to repeatedly watch and re-run the CI pipeline until it passes eats up time and delays the feature being merged (and ultimately reaching the intended user in a sane way).
Attachments
None
Root cause
There are two aspects here:
- OBS should ideally not return HTTP 504 Gateway Timeout errors; we need to investigate why this happens
- the runner could be more robust and try again (upstream issue https://github.com/collabora/obs-gitlab-runner/issues/11)
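To illustrate the second point, here is a minimal sketch of retrying a request with exponential backoff when the server answers with a gateway-style error. It is not the runner's code: the spantrace above shows that obs-gitlab-runner already has its own `retry::retry_request`, whose behaviour is not reproduced here. The names `HttpError`, `is_transient` and `retry_with_backoff` are hypothetical, and treating 502, 503 and 504 as the transient set is an assumption.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical error type: only carries the HTTP status of a failed request.
#[derive(Debug, PartialEq)]
pub struct HttpError {
    pub status: u16,
}

/// Assumption: gateway-style failures (502, 503, 504) are worth retrying.
pub fn is_transient(err: &HttpError) -> bool {
    matches!(err.status, 502 | 503 | 504)
}

/// Calls `request` until it succeeds, fails with a non-transient error,
/// or `max_attempts` calls have been made. The delay between attempts
/// doubles after every transient failure. `request` is always called
/// at least once, even if `max_attempts` is 0.
pub fn retry_with_backoff<T, F>(
    max_attempts: u32,
    initial_delay: Duration,
    mut request: F,
) -> Result<T, HttpError>
where
    F: FnMut() -> Result<T, HttpError>,
{
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match request() {
            Ok(value) => return Ok(value),
            // Transient failure with attempts left: wait, then try again.
            Err(err) if attempt < max_attempts && is_transient(&err) => {
                sleep(delay);
                delay = delay.saturating_mul(2);
                attempt += 1;
            }
            // Non-transient failure, or no attempts left: give up.
            Err(err) => return Err(err),
        }
    }
}
```

With something along these lines around the status polling, a single 504 from OBS during a long build would only delay the monitor job rather than fail it; the attempt count and initial delay would need tuning against how long the OBS timeouts typically last.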
Outcomes
TBD
Management data
This section is for management only; it should be the last one in the description.
/cc @andrunko @em @sagar @sudarshan @wlozano
Phabricator link: https://phabricator.apertis.org/T9747