Commit 9e4d463c authored by Martyn Welch, committed by Martyn Welch

Remove closing CI loop document


This concept document describes a possible approach to closing the CI loop
using Jenkins. Apertis is moving away from the use of Jenkins and so this
document is now defunct. Remove it.

Signed-off-by: Martyn Welch <martyn.welch@collabora.com>
parent 25f64346
1 merge request: !66 Evaluate designs
+++
title = "Closing the Automated Continuous Integration Loop"
short-description = "Close the automated CI loop using the existing infrastructure."
weight = 100
aliases = [
"/old-designs/latest/closing-ci-loop.html",
"/old-designs/v2019/closing-ci-loop.html",
"/old-designs/v2020/closing-ci-loop.html",
"/old-designs/v2021dev3/closing-ci-loop.html",
]
outputs = [ "html", "pdf-in",]
date = "2019-09-27"
+++
# Background
The last phase in the current CI workflow is running the automated tests in LAVA,
and there is no mechanism in place to properly process and report these test
results, which leaves the CI loop incomplete.
# Current Issues
The biggest issues are:
- Test results need to be checked manually from LAVA logs and the dashboard.
- Bugs need to be reported manually for test issues.
- Weekly test reports need to be created manually.
This point might only be partially addressed by this concept document, since a
proper data storage for test cases and test results has not been defined.
- There is no mechanism in place to conveniently notify about test issues, so
critical issues can easily be overlooked.
# Proposal
This document proposes a design around the available infrastructure to implement
a solution to close the CI loop.
The document only covers automated tests, and it leaves manual tests for a later
proposal with a more complete solution.
# Benefits of closing the CI loop
Closing the loop will save time and resources by automating the manual
tasks of checking automated test results and reporting the issues they reveal. It
will also provide the infrastructure foundation for further improvements in
tracking the overall project health.
From a design perspective, it will also help to keep a more complete workflow
in place for the whole infrastructure.
Some of the most important benefits:
- Checking automated test results will need minimal or no manual intervention.
- Automated test failures will be reported automatically and promptly.
- It will provide a more consistent and accurate way to track issues found
by automated tests.
- It will help to keep test reports up to date.
- It will provide the infrastructure components that will help to implement
further improvements in tracking the overall project health.
Though the project as a whole will benefit from the above points, some benefits
will be more relevant depending on the project roles and areas. The following
subsections list these benefits for each role.
## Developers and Testers Benefits
- It will save time for developers and testers since they won't need to check
automated test logs and test results manually in order to report issues.
- Developers will be able to notice and work on critical issues much faster,
since failures will be visible sooner.
## Managers Benefits
- Since automated test issues will be reported promptly and more consistently,
managers will be able to make better informed decisions during planning.
## Product Teams Benefits
- The whole CI workflow for automated tests is properly implemented, so it
offers a more complete solution to other teams and projects.
- Closing the CI loop offers a more coherent infrastructure design that other
product teams can adapt to their own needs.
- Product teams will have a better view of the bugs opened in a given period, and
thus a better idea of the overall project health.
# Overview of steps to close the CI loop
This is an overview of the phases required to close the current CI loop:
- Test results should be fetched from LAVA.
- Test results should be processed and optionally saved somewhere.
- Test results should be analyzed.
- Tasks for test issues should be created using the analyzed test results.
# Current Infrastructure
This section explores the different services available in our infrastructure
that are proposed to implement the remaining phases to close the CI loop.
## LAVA User Notifications
As of LAVA V2, there is a new feature called **Notification Callback** which allows
sending a GET or POST request to a specified URL to trigger some action remotely.
When using a POST request, test job information and results can be attached and
sent.
This can be used to send the test results back to Jenkins from LAVA for further
processing in new pipeline phases.
## Jenkins Webhook Plugin
This plugin provides an easy way to block a build pipeline in Jenkins until an
external system posts to a webhook.
This can be used to wait for the automated test results sent by LAVA from a new
Jenkins job responsible for triggering the automated tests.
## Phabricator API
Conduit is the developer API for Phabricator, which can be used to implement
task management.
This API can be used (either with tools or language bindings) to manage Phabricator
tasks from a Jenkins phase in the main pipeline.
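As a rough illustration, the sketch below queries Conduit over HTTP for open tasks
tagged `test-failure`. It is only a sketch under assumptions: the API token is a
placeholder, and the PHP-style form encoding of the `constraints` parameter should
be verified against the Conduit method documentation.

```python
import requests

PHABRICATOR_URL = "https://phabricator.apertis.org/api/maniphest.search"
API_TOKEN = "api-xxxxxxxxxxxxxxxx"  # placeholder: token of the automation user


def find_open_test_failure_tasks():
    """Query Conduit for open tasks tagged `test-failure`.

    The PHP-style form encoding of `constraints` is an assumption and should
    be double-checked against the Conduit method documentation.
    """
    response = requests.post(PHABRICATOR_URL, data={
        "api.token": API_TOKEN,
        "constraints[statuses][0]": "open",
        # The project slug is assumed to be `test-failure`.
        "constraints[projects][0]": "test-failure",
    })
    response.raise_for_status()
    payload = response.json()
    if payload.get("error_code"):
        raise RuntimeError(payload["error_info"])
    return payload["result"]["data"]
```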
## Mattermost
Mattermost is the chat system used by the Apertis project.
Jenkins already offers a plugin to send messages to Mattermost.
This can be used to send notification messages to the chat channels,
for example, to notify the team once a critical test starts failing, or when a bug
has been updated.
# CI Workflow Overview
The main workflow would consist of combining the above-mentioned technologies to
implement the different phases of the main CI pipeline.
A general overview of the steps involved would be:
- Jenkins builds images and triggers LAVA jobs.
- Use the `webHook` pipeline plugin to wait for LAVA test results from Jenkins.
- LAVA executes the automated test jobs and the results are saved in its database.
- LAVA triggers a user notification callback attaching test job information
and results to send to the Jenkins webHook.
- Test results are received by Jenkins through the webHook.
- Test information is sent to a new `pipeline` to process and analyze test
results.
- Once test results are processed and analyzed, they are sent to a new
`pipeline` to manage Phabricator tasks.
- Optionally, a new Jenkins phase could report results to Mattermost or via email.
This complete loop will be executed every time new images are built.
# Fetching Tests Results
The initial and most important phase to close the loop is fetching and processing
the automated test results from LAVA.
The proposed solution in this document is to use the webHook plugin to fetch the
LAVA test results from Jenkins once the automated test job is finished.
Currently, LAVA tests are submitted in the last stage of the Jenkins pipeline job
that creates and publishes the images.
Automated tests are organized in groups, which are submitted all at once using the
`lqa` tool for each image type once the images are published.
A webhook should be registered for each `test job` rather than for a group of
tests, so a change in the way LAVA jobs are submitted is required.
## Jenkins and LAVA Interaction
The proposed solution is to separate the LAVA job submission stage from the main
Jenkins pipeline job building images, and instead have a single Jenkins job that
will take care of triggering the automated tests in LAVA once the images are
published.
The only required fields for the stage submitting the LAVA test jobs are the
`image_name`, `profile_name`, and `version` of the image. A single Jenkins job
could receive these values as arguments and trigger the automated tests for each
of the respective images.
The way LAVA jobs are submitted from Jenkins will also require some changes. The
`lqa` tool currently submits several `groups` of test jobs at once, but since each
test job requires a unique webhook, they will need to be submitted
independently.
One simple solution is to have `lqa` process the job templates first and then
submit each processed job file with a unique webhook.
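A minimal sketch of that idea is shown below, assuming the processed job
definitions are plain YAML files and using LAVA's XML-RPC `scheduler.submit_job`
call. The `register_webhook` helper and the exact layout of the `notify` block are
illustrative only and must follow the LAVA job schema in a real setup.

```python
import xmlrpc.client

import yaml

LAVA_API = "https://lava.collabora.co.uk/RPC2"  # XML-RPC endpoint (authentication omitted)


def submit_with_webhooks(job_files, register_webhook):
    """Submit each processed LAVA job with its own notification callback.

    `register_webhook` is a hypothetical callable returning a fresh webhook
    URL (for example, one registered through the Jenkins webhook plugin).
    """
    server = xmlrpc.client.ServerProxy(LAVA_API)
    submitted = []
    for path in job_files:
        with open(path) as f:
            job = yaml.safe_load(f)
        hook_url = register_webhook()
        # Attach a notification callback to this job only; the structure of
        # the `notify` block here is a sketch of the LAVA job schema.
        job.setdefault("notify", {})["callback"] = {
            "url": hook_url,
            "method": "POST",
            "dataset": "results",
        }
        job_id = server.scheduler.submit_job(yaml.safe_dump(job))
        submitted.append((job_id, hook_url))
    return submitted
```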
Once all test jobs are submitted for a specific image type, the Jenkins executor
will wait for all of their webhooks. This will block the executor, but since the
webhook returns immediately for those jobs that have already posted their results
to the webhook callback, it is fair to say that the executor will only block until
the last completed test job sends its results back to Jenkins.
After all results are received in Jenkins, they can be processed by the remaining
stages required for task management.
## Jenkins Jobs
Since images are built from a single Jenkins job, the most sensible approach for
the final implementation is to have a new Jenkins job receiving the information
for all image types and triggering tests for all of them, then a different job for
processing test results, and possibly another one handling the task management
phases.
# Tasks Management
One of the most important phases in closing the loop is reporting test issues in
Phabricator.
Test issues will be reported automatically in Phabricator as one task per test case
instead of one task per issue. This has an important consequence, explained in the [considerations]( {{< ref "#considerations" >}} ) section.
This section gives an overview of the behaviour of this phase.
## Workflow Overview
Management of Phabricator tasks can be as follows (a minimal sketch of this logic
appears after the list):
1) Query Phabricator to find all open tasks with the tag `test-failure`.
2) Filter the list of received tasks to make sure only the relevant tasks are
processed. For this, checking further specific fields in the task can be
helpful, for example, keeping only tasks whose name follows a specific format.
3) Fetch the analyzed test results.
4) For each test, based on its results and checking the task list, do the
following:
a) Task exists: Add a comment to the task.
b) Task does not exist:
- If the test has status `failed`: Create a new task.
- If the test has status `passed`: Do nothing.
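The sketch below illustrates this per-test decision logic. The `conduit` helpers
(query, comment, create) are hypothetical wrappers around Conduit calls such as the
one sketched in the Phabricator API section, and the result dictionary keys are
assumptions about the processed test data.

```python
def manage_tasks(test_results, conduit):
    """Apply the per-test decision logic described above.

    `test_results` is assumed to be a list of dicts with `name`, `status`
    and `log_url` keys; `conduit` wraps hypothetical Conduit helpers.
    """
    open_tasks = {
        task["fields"]["name"]: task
        for task in conduit.find_open_test_failure_tasks()
        # Step 2: keep only tasks following the agreed title convention.
        if " failed: " in task["fields"]["name"]
    }
    for result in test_results:
        title_prefix = f"{result['name']} failed: "
        task = next((t for name, t in open_tasks.items()
                     if name.startswith(title_prefix)), None)
        if task is not None:
            # Step 4a: an open task already exists, add a comment to it.
            conduit.add_comment(task["id"],
                                f"New result: {result['status']}, "
                                f"see {result['log_url']}")
        elif result["status"] == "failed":
            # Step 4b: no task yet and the test failed, file a new one.
            conduit.create_task(f"{result['name']} failed: automated test failure",
                                tags=["test-failure"],
                                description=result["log_url"])
        # A passing test with no open task needs no action.
```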
## Considerations
- The comment added to the task will contain general information about the failure
with a link to the LAVA job logs.
- Tasks won't be reported per platform but per test case. Once a task for a test
case failure is reported, all platform failures for that test case should be
added as comments to that single task.
- Closing and verifying tasks will still require manual intervention. This will
help to avoid the following corner cases:
- Flaky tests that would otherwise end up in a series of new tasks that get
automatically closed.
- Tests failing on one image that succeed on a different image.
- If a test starts failing again for a previously closed task, a new task will be
created automatically for it, and manual verification is required to check if
it is the same previously reported issue, in which case it is recommended to add
a reference to the old task.
- If, after fixing an issue for a reported task, a new issue arises for the
same test case, the same old task will be updated with this new issue. This
is an effect of reporting tasks per test case instead of per issue. In such a
case, manual verification can be used to confirm whether it is the same issue,
and a new subtask can be manually created by the developer if deemed necessary.
## Phabricator Conventions
To automate the management of Phabricator tasks, certain conventions will need to
be established in Phabricator. This will require minimal manual
intervention.
First of all, a specific user should be created in Phabricator to manage these
tasks automatically.
This user could be named `apertis-qa` or `apertis-tests`, and its only purpose
will be to manage tasks automatically at this stage of the loop.
A special tag and a specific task name format will also be used in tasks
reported by this special user:
- The tag `test-failure` is the special tag for automated test failures.
- The task name will have the format: "{testcasename} failed: <Task title>".
- A `{testcasename}` tag can also be used if it is available for the test.
# Design and Implementation
This section gives a brief overview of the design for the main components to close
the loop.
Each of these components can be developed as independent modules, or as a single
unit containing all the logic. The details and final design of these components
depend on the most convenient approach chosen during implementation.
## Design
### Tests Processor
This will take care of processing the test results as they are received from LAVA.
LAVA test results are sent to Jenkins in a `raw` format, so work
at this level could involve cleaning data or even converting test results to
a new format so they can be more easily processed by the rest of the tools.
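For illustration, assuming the callback delivers results as a list of per-test
dictionaries, a processor could reduce them to the few fields the later stages
need. The key names used here are assumptions about the callback payload and
should be confirmed against the LAVA notification documentation.

```python
def process_lava_results(raw_results, job_url):
    """Reduce raw LAVA result entries to the fields later stages need.

    The `suite`, `name` and `result` keys mirror what the LAVA callback is
    assumed to send; the exact payload layout should be verified.
    """
    processed = []
    for entry in raw_results:
        processed.append({
            "name": f"{entry.get('suite', 'default')}/{entry['name']}",
            "status": "passed" if entry.get("result") == "pass" else "failed",
            "log_url": job_url,
        })
    return processed
```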
### Tests Analyzer
This will make sure that the test results data is in a consistent and convenient
format to be used by the next module (`task manager`).
This can be a new tool or just be part of the `test result processor`, running
in the same Jenkins phase for convenience.
### Tasks Manager
This will receive the full analyzed test results data, and ideally it shouldn't
deal with any test data manipulation.
It will take care of comparing the status of test results against the
Phabricator tasks, deciding the next steps to take and managing those tasks
accordingly.
### Notifier
This can be considered an `optional` component and can involve sending further
forms of notifications to different services, for example, sending messages to
`Mattermost` channels or emails notifying about new critical bugs.
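Although the proposal relies on the Jenkins Mattermost plugin, a standalone
notifier could also post directly to a Mattermost incoming webhook, as in this
sketch; the webhook URL is a placeholder.

```python
import requests

MATTERMOST_HOOK = "https://chat.example.apertis.org/hooks/xxxxxxxx"  # placeholder


def notify_failures(failures):
    """Post a short summary of new critical failures to a channel."""
    if not failures:
        return
    text = "New automated test failures:\n" + "\n".join(
        f"- {f['name']}: {f['log_url']}" for f in failures)
    # Mattermost incoming webhooks accept a JSON body with a `text` field.
    requests.post(MATTERMOST_HOOK, json={"text": text}).raise_for_status()
```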
## Implementation
As originally envisioned, each of the design components could be written using a
scripting language, preferably one that already offers a good integration with
our infrastructure.
The `Python` language is highly recommended, as it already offers plugins for
all the related infrastructure, so it would require minimal effort to integrate
a solution written in this language.
As a suggested environment, Jenkins could be used as the main place to
execute and orchestrate each of the components. They could be executed using a
different pipeline for each phase, or just a single pipeline executing all the
functionality.
For example, once the LAVA results are fetched in Jenkins, a new pipeline phase
receiving test results can be started to execute the `test processor` and
`test analyzer`, which in turn will send the output to a new pipeline phase to
execute the `task manager` and later (if available) to the `notifier`.
## Diagram
This is a diagram explaining the different infrastructure processes involved in
the proposed design to close the CI loop.
![](/images/closing_automated_loop.svg)
# Security Improvement
The Jenkins webhook URL will be visible from the public LAVA test definitions,
which might raise security concerns. For example, another process posting to the
webhook before LAVA does will break the Jenkins job waiting for the test results.
After researching several options to solve this issue, one solution has been found
which consists of Jenkins checking for a protected authorization header sent by
LAVA when posting to the webhook.
This solution requires changes both in the Jenkins plugin and the LAVA code, and
they need to be implemented as part of the solution for closing the CI loop.
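As a rough sketch of the idea (the real change lives in the LAVA and Jenkins
plugin code), the sender attaches a shared secret in a header and the receiver
verifies it before accepting results; the header name and token format here are
illustrative only.

```python
import hmac

import requests

SHARED_SECRET = "change-me"  # placeholder, provisioned on both sides


def post_results(webhook_url, payload):
    # Sender side: attach the shared secret so the receiver can verify it.
    return requests.post(webhook_url, json=payload,
                         headers={"Authorization": f"Token {SHARED_SECRET}"})


def is_authorized(request_headers):
    # Receiver side: constant-time comparison against the expected value.
    supplied = request_headers.get("Authorization", "")
    return hmac.compare_digest(supplied, f"Token {SHARED_SECRET}")
```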
# Implementation
The final implementation for the solution proposed in this document will mainly
involve developing tools that need to be executed in Jenkins and will interact
with the rest of the existing infra services: LAVA, Phabricator and optionally
Mattermost.
All tools and programs will be available from the project git repositories with
their respective documentation, including how to set up and use them.
In addition to this, the final implementation will also include documentation about
how to integrate, use and maintain this solution using the currently available
infrastructure services, so other teams and projects can also make use of it.
# Constraints or Limitations
- Some errors might not be trivially detected for automated tests, since
LAVA can fail in several ways, for example, infra errors sometimes might be
difficult to analyze and will still require manual intervention.
- The `webHook` plugin blocks the Jenkins pipeline. This might be an issue
in the long term and it should be an open point for further research
in later versions of this document or during implementation.
- This document deals with the existing infra, so a proper data storage has
not been defined for test cases and test results. Creation of weekly test
reports will continue to require manual intervention.
- The test definitions for public LAVA jobs are publicly visible. The Jenkins
webhook URL will also be visible in these test definitions, which can be a
security concern. A solution for this issue is proposed in
[security improvement]( {{< ref "#security-improvement" >}} ).
- Closing and verifying tasks will still require manual intervention due to
the points explained in the [considerations]( {{< ref "#considerations" >}} ) section.
# New CI Infrastructure and Workflow
The main changes in the new infrastructure are that test results and test cases
will be stored in SQUAD and Git respectively, and there will be mechanisms in
place to visualise test results and send test issue notifications. The new
infrastructure is defined in the [test data storage document][TestDataStorage].
Manual tests are processed by the new infrastructure, so the new workflow will
also cover closing the CI loop for manual tests.
## Components and Workflow
A new web service can be set up to receive the callback triggered by LAVA at a
specific URL in order to fetch the automated test results, instead of using the
Jenkins webhook plugin. This covers the case where the Jenkins webhook turns
out not to be a suitable solution during implementation, either for the current
CI loop infrastructure or for the new one.
Therefore the following steps will use the term `Tests Processor System` to refer
to the infrastructure in charge of receiving and processing these test results,
which can be set up either in Jenkins or as a new infrastructure service.
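If a standalone service were chosen instead of the Jenkins webhook plugin, a
minimal receiver could look like the following sketch, using Flask purely as an
example framework; the URL path and the placeholder handler are illustrative.

```python
from flask import Flask, request

app = Flask(__name__)


def handle_results(payload):
    """Placeholder for the processing and SQUAD submission steps below."""
    print("received", len(payload.get("results", [])), "results")


@app.route("/lava-callback", methods=["POST"])
def lava_callback():
    # LAVA posts job information and (when requested) results as a JSON
    # body to the callback URL configured in the job's notify block.
    handle_results(request.get_json(force=True))
    return "", 204


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```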
The main components for the new infrastructure can be broadly split into the
following phases: automated tests, manual tests, tasks management, and reporting
and visualization.
### Automated Tests
Workflow for automated tests:
- Jenkins builds images and triggers LAVA jobs.
- LAVA executes the automated test jobs and the results are saved in its database.
- LAVA triggers a user notification callback, attaching test job information
and results, to send to the tests processor system.
- The system opens an HTTP URL to wait for the LAVA callback in order to receive
test results.
- Test results are received by the tests processor system.
- Once test results are received, they are processed with the tool to convert
the test data into the SQUAD format.
- After the data is in the correct format, it is sent to SQUAD using the HTTP
API (a sketch of this submission step follows the list).
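A minimal sketch of that submission step is shown below. The instance URL and
token are placeholders, and the `/api/submit/...` endpoint layout and `Auth-Token`
header follow the upstream SQUAD documentation; both should be verified against
the deployed instance.

```python
import json

import requests

SQUAD_URL = "https://squad.example.apertis.org"  # placeholder instance
SQUAD_TOKEN = "xxxxxxxx"                          # placeholder submission token


def submit_to_squad(results, group, project, build, environment):
    """Submit processed results to SQUAD's submission API.

    `results` is assumed to be a simple {test name: "pass"/"fail"} mapping,
    matching the format the SQUAD submit endpoint expects.
    """
    url = f"{SQUAD_URL}/api/submit/{group}/{project}/{build}/{environment}"
    response = requests.post(url,
                             headers={"Auth-Token": SQUAD_TOKEN},
                             data={"tests": json.dumps(results)})
    response.raise_for_status()
```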
### Manual Tests
Test results will be entered manually by the tester using a new application,
in this workflow named `Test Submitter Application`.
This application will prompt the tester to enter each manual test result, and
will send the data to the SQUAD backend, as explained in the
[test data storage document][TestDataStorage]. A minimal sketch of such a
submitter appears after the workflow below.
The following workflow includes the processing of manual test results into the
CI loop:
- Tester manually executes test cases.
- Tester enters test results into the test submitter application.
- The application sends the test data to the tests processor system using a
reliable network protocol.
- Test results are received by the tests processor system infrastructure.
- Once test results are received, they are processed with the tools to convert
the test data into the SQUAD format.
- After the data is in the correct format, it is sent to SQUAD using the HTTP
API.
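As mentioned above, a command-line variant of the `Test Submitter App` could be as
small as the following sketch, which collects a verdict per test case and hands the
results to the same submission helper sketched earlier; the test case names and
verdict vocabulary are assumptions.

```python
def collect_manual_results(test_cases):
    """Prompt the tester for a verdict on each manual test case."""
    results = {}
    for name in test_cases:
        verdict = ""
        while verdict not in ("pass", "fail", "skip"):
            verdict = input(f"{name} [pass/fail/skip]: ").strip().lower()
        if verdict != "skip":
            results[name] = verdict
    return results


# Example usage: gather verdicts and hand them to the same SQUAD submission
# helper sketched in the automated-tests workflow (names are hypothetical).
# results = collect_manual_results(["sanity-boot", "connman-wifi"])
# submit_to_squad(results, "apertis", "manual-tests", "v2019.0", "amd64")
```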
### Tasks Management
This phase deals with processing the test results data in order to file and manage
Phabricator tasks and send notifications.
- Once all test results are stored in the SQUAD backend, they might still need
to be processed by other phases in the tests processor system, and sent to a
new phase to manage Phabricator tasks.
- The new Phabricator phase uses the test data to file new tasks following the
logic explained in the [tasks management]( {{< ref "#tasks-management" >}} ) section.
- The same phase or a new one could report results to Mattermost or via email.
### Reporting and Visualization
A new web application dashboard will be used to view test results and generate
reports and graphical statistics.
This web application will fetch results from the SQUAD backend and will process
them to generate the relevant statistics and graphics.
The weekly test report will be generated either periodically or at any time as
needed using this web application dashboard.
More details can be found in the [reporting and visualization document][TestDataReporting].
## General Workflow Overview
This section gives an overview of the complete workflow in the following
steps:
- Automated tests and manual tests are executed in different environments.
- Automated tests are executed in LAVA and results are sent to the `HTTP URL`
service opened by the `Tests Processor System` to receive the LAVA callback
carrying the test results.
- Manual tests are executed by the tester. The tester uses the `Test Submitter
App` to collect test results and send them to the `Tests Processor System`
using a reliable network protocol for data transfer.
- All test results are processed and converted to the SQUAD JSON format by the
`Test Processor and Analyzer`.
- Once test results are in the correct format, they are sent to the SQUAD backend
using the SQUAD HTTP API.
- Test results might still need to be processed by the `Test Processor and
Analyzer` in order to be sent to the new phases. Once results are processed,
they are passed to the `Task Manager` and `Notification System` phases to
manage Phabricator tasks and send email or Mattermost notifications,
respectively.
- From the SQUAD backend, the new `Web Application Dashboard` fetches test
results periodically or as needed to generate test result views, graphical
statistics, and reports.
The following diagram illustrates the above workflow:
![](/images/new_infra_ci_loop.svg)
## New Infrastructure Migration Steps
- Set up a SQUAD instance. This can be done using a Docker image, so the setup
should be very straightforward and convenient to replicate downstream.
- Extend the current `Test Processor System` to submit results to SQUAD. This
basically consists of using the SQUAD URL API to submit the test data.
- Convert the testcases from the wiki format to the strictly defined YAML format.
- Write an application to render the YAML testcases, guide testers through them and
provide them with a form to submit their results. This is the `Test Submitter App`
and can be developed as either a web frontend or a command line tool.
- Write the reporting web application which fetches results from SQUAD and renders
reports. This is the `Web App Dashboard` and it will be developed using existing
modules and frameworks, so that deployment and maintenance can be done in the same
way as for other infrastructure services.
## Maintenance Impact
The new components required for the new infrastructure are the `Test Submitter`,
`Web Application Dashboard` and SQUAD, along with some changes needed for the
`Test Processor System` to receive the manual test results and send test data
to SQUAD.
SQUAD is an upstream dashboard that can be deployed using Docker, so it can be
conveniently used by other projects and its maintenance effort won't be greater
than for other infrastructure services.
The test submitter and web application dashboard will be developed reusing existing
modules and frameworks for each of their functionalities; they mainly need to use
already well-defined APIs to interact with the rest of the services, and they will
be designed in such a way that they can be conveniently deployed (for example, using
Docker). They are not expected to be large applications, so maintenance should be
comparable to other tools in the project.
The test processor is a system tool, developed in a modular way, so each component
can reuse existing modules or libraries to implement the required functionality,
for example, making use of an existing HTTP module to access the SQUAD URL API. It
won't require a big maintenance effort and it will be practically the same as for
other infrastructure tools in the project.
Converting the test cases to the new YAML format can be done manually, and a small
tool can be used to assist with the format migration (for example, to sanitize the
format). This should be a one-time task, so no further maintenance is involved.
# Links
LAVA Notification Callback:
- https://lava.collabora.co.uk/static/docs/v2/user-notifications.html#notification-callback
Jenkins Webhook Plugin:
- https://wiki.jenkins.io/display/JENKINS/Webhook+Step+Plugin
Phabricator API:
- https://phabricator.apertis.org/conduit
Mattermost Jenkins Plugin:
- https://wiki.jenkins.io/display/JENKINS/Mattermost+Plugin
[TestDataStorage]: test-data-storage.md
[TestDataReporting]: test-data-reporting.md