Commits · 17ca61a5323757a7093d6ca7ff4a223a5bd98e4b · infrastructure / Apertis infrastructure dashboard

May 09, 2022

fetch-downstream: Log when the cached data is used · 17ca61a5

Emanuele Aina authored 2 years ago and

Vignesh Raman committed 2 years ago


Ensure a log message is printed when loading the cache to give a clearer
context when debugging issues.
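
A minimal sketch of the idea. It assumes the cache is a JSON file at a caller-provided path. `load_cache()` and the file format are hypothetical, not the script's actual code:

```python
import json
import logging
from pathlib import Path

log = logging.getLogger("fetch-downstream")

def load_cache(path: Path) -> dict:
    """Load previously fetched data, logging whether the cache is used."""
    if not path.exists():
        log.info("No cache found at %s, fetching everything", path)
        return {}
    # Hypothetical JSON cache: the real tool may store its data differently.
    log.info("Loading cached data from %s", path)
    with path.open() as f:
        return json.load(f)
```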

Signed-off-by: Emanuele Aina <emanuele.aina@collabora.com>

Mar 31, 2022

fetch-downstream: include additional licensing information · 7c144fef

Arnaud Ferraris authored 3 years ago and

Detlev Casanova committed 3 years ago

In order to facilitate license compliance, the dashboard should report
the most severe license issues found in packages from the "target"
repository: this includes cases where we override the existing license
information (especially when specifying a global default license) or
packages for which whitelisting is enabled for the whole source tree.

As part of this effort, this commit fetches the copyright override and
whitelisting files, and processes them to detect major issues.
The corresponding new per-branch flags are then set accordingly and
added to the output file.

Signed-off-by: Arnaud Ferraris <arnaud.ferraris@collabora.com>

fetch-downstream: rework filtering method for clarity and completeness · 59e5999d

Arnaud Ferraris authored 3 years ago and

Detlev Casanova committed 3 years ago


The `filter_cache()` method used an optional `check_tags` argument to
decide between several code paths. This does not properly reflect the
purpose of this argument, which is to differentiate between two use cases:
- check and populate information about the descendants of a given ref
- check and populate component and license information

As additional fields will have to be taken into account in the future,
this rework aims to make the aforementioned method more flexible,
while making it clear which use case each code path belongs to. To that
effect, the `check_flags` argument is renamed to `purpose` and now uses
an enum to distinguish between use cases. It also only processes the
data relevant to the current use case:
- when checking for descendants, only descendants data is looked up and
  updated
- when checking for license info, only this information is checked and
  updated
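
The split can be sketched as follows. `FilterPurpose`, the field names and the `commit_id` key are hypothetical. The sketch only shows how an enum makes the use case of each code path explicit:

```python
from enum import Enum, auto
from typing import Dict, Tuple

class FilterPurpose(Enum):
    """Hypothetical enum naming the two use cases."""
    DESCENDANTS = auto()  # descendants of a given ref
    LICENSE = auto()      # component and license information

# Hypothetical field names: each purpose only touches its own data.
PURPOSE_FIELDS: Dict[FilterPurpose, Tuple[str, ...]] = {
    FilterPurpose.DESCENDANTS: ("descendants",),
    FilterPurpose.LICENSE: ("component", "license"),
}

def filter_cache(cached_ref: dict, current_ref: dict, purpose: FilterPurpose) -> dict:
    """Reuse the cached fields relevant to `purpose` if the ref did not move."""
    updated = dict(current_ref)
    if cached_ref.get("commit_id") != current_ref.get("commit_id"):
        # The ref points to a new commit: leave the fields unset so the
        # caller fetches them again.
        return updated
    for field in PURPOSE_FIELDS[purpose]:
        if field in cached_ref:
            updated[field] = cached_ref[field]
    return updated
```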

Signed-off-by: Arnaud Ferraris <arnaud.ferraris@collabora.com>

Mar 17, 2022

fetch-downstream: implement partial caching · f668fdee

Arnaud Ferraris authored 3 years ago


This avoids making unnecessary requests when branches and tags didn't
change since the last run.
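
One way to decide which requests can be skipped, assuming both the cache and the fresh ref listing map each branch or tag name to the commit it points to (a hypothetical layout):

```python
from typing import Dict, Set

def refs_to_refresh(cached: Dict[str, str], current: Dict[str, str]) -> Set[str]:
    """Return the refs that are new or now point to a different commit."""
    return {ref for ref, sha in current.items() if cached.get(ref) != sha}
```

Refs deleted since the last run do not appear in `current`, so they are simply dropped.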

Signed-off-by: Arnaud Ferraris <arnaud.ferraris@collabora.com>

Feb 11, 2022

fetch-downstream: Make license report retrieval logs less verbose · e657f1b3

Emanuele Aina authored 3 years ago


Reduce the severity of the logs about retrieved licensing reports since
they are currently drowning out everything else.

Signed-off-by: Emanuele Aina <emanuele.aina@collabora.com>

Feb 08, 2022

fetch-downstream: Fix branch version detection · 19fe3eda

Emanuele Aina authored 3 years ago


Retrieve all descendants to correctly detect when a branch got merged into
another and which tags apply.

Unfortunately `commit.refs()` does not currently handle pagination, so
only the first 20 items were downloaded.

This meant that in some cases no tags were returned, so no version could
be computed. In other cases this may have cause incorrect reports of
branches not being merged into their downstream, or other issues due to
computing the wrong version for the branch.
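
The GitLab REST API splits results into pages selected by the `page` and `per_page` query parameters, with 20 items per page by default. A generic loop that keeps requesting pages until a short one comes back might look like this, where `fetch_page` is a hypothetical callable wrapping a single request:

```python
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")

def iter_all_pages(fetch_page: Callable[[int, int], List[T]], per_page: int = 20) -> Iterator[T]:
    """Yield the items from every page instead of stopping after the first one."""
    page = 1  # GitLab pages are 1-indexed
    while True:
        items = fetch_page(page, per_page)
        yield from items
        if len(items) < per_page:
            break  # a short (or empty) page means there is nothing left
        page += 1
```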

Signed-off-by: Emanuele Aina <emanuele.aina@collabora.com>

Jan 31, 2022

data-fetch-downstream: Don't ignore errors retrieving component · 33f827bd

Emanuele Aina authored 3 years ago and

Vignesh Raman committed 3 years ago

When retrieving the files under `debian/apertis` only expect the
404 Not Found error and re-raise everything else: we want to cleanly
handle missing files, but timeouts (for instance) are actual errors.
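
A sketch of the pattern. python-gitlab errors such as `GitlabGetError` expose the HTTP status as `response_code`. To keep the example self-contained, a stand-in exception class is used, and `fetch` is a hypothetical callable returning the raw file contents:

```python
from typing import Callable, Optional

class GitlabError(Exception):
    """Stand-in for python-gitlab's exceptions, which carry `response_code`."""
    def __init__(self, message: str = "", response_code: Optional[int] = None):
        super().__init__(message)
        self.response_code = response_code

def get_optional_file(fetch: Callable[[str], bytes], path: str) -> Optional[bytes]:
    """Return the file contents, or None if the file does not exist."""
    try:
        return fetch(path)
    except GitlabError as e:
        if e.response_code == 404:
            return None  # missing file: expected and handled cleanly
        raise  # any other HTTP error (e.g. 500) is a real failure
    # Non-GitLab exceptions, such as network timeouts, are not caught at all.
```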

Signed-off-by: Emanuele Aina <emanuele.aina@collabora.com>

Dec 31, 2021

Limit tags for each branch to the ones with the same prefix · 05cae492

Emanuele Aina authored 3 years ago


Listing which `debian/*` tags, for instance, are contained in an `apertis/*`
branch is rather pointless and a significant waste of resources.

Limit the reported tags to the ones with a matching prefix, so only
`debian/*` tags are listed on `debian/*` branches and only `apertis/*`
tags are listed on `apertis/*` branches.
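
A minimal sketch of the filter, assuming the prefix is everything up to the first `/` of the branch name (`tags_for_branch()` is hypothetical):

```python
from typing import List

def tags_for_branch(branch: str, tags: List[str]) -> List[str]:
    """Keep only the tags sharing the branch's vendor prefix (e.g. `debian/`)."""
    prefix = branch.split("/", 1)[0] + "/"
    return [tag for tag in tags if tag.startswith(prefix)]
```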

Signed-off-by: Emanuele Aina <emanuele.aina@collabora.com>

Do not list the ref itself as its own descendant · 0139d34a

Emanuele Aina authored 3 years ago


Let's save a few bytes by getting rid of some pointless information.

Signed-off-by: Emanuele Aina <emanuele.aina@collabora.com>

Dec 21, 2021

fetch-downstream: Increase parallelism retrieving component · 7059e89b

Emanuele Aina authored 3 years ago


Add more parallel workers when retrieving the component and licensing
report from git to speed up the process.
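
Since the work is dominated by waiting on HTTP responses, a thread pool is a natural fit. The sketch below assumes a hypothetical `fetch_one(project)` callable; the worker count is illustrative, not the value used by the script:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable

def fetch_all(projects: Iterable[str], fetch_one: Callable[[str], dict], workers: int = 16) -> Dict[str, dict]:
    """Run the I/O-bound `fetch_one` for many projects concurrently."""
    projects = list(projects)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map() yields results in the same order as its input.
        results = pool.map(fetch_one, projects)
        return dict(zip(projects, results))
```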

Signed-off-by: Emanuele Aina <emanuele.aina@collabora.com>

fetch-downstream: Fetch component for v2021 as well · 33d698ec

Emanuele Aina authored 3 years ago

Signed-off-by: Emanuele Aina <emanuele.aina@collabora.com>

fetch-downstream: Drop check_duplicates() · dd3e81a9

Emanuele Aina authored 3 years ago


Since we moved all git repositories directly under a single `pkg/` group
instead of the `pkg/$component/` nested categories, checking for
duplicates is no longer relevant.

Signed-off-by: Emanuele Aina <emanuele.aina@collabora.com>

Sep 17, 2021

Add check for missing license report · 12c4e595

Walter Lozano authored 3 years ago

Apertis should follow strict license compliance rules to avoid shipping
software under unsuitable licenses. Improve the dashboard to warn about
packages in the target repository that do not provide a license report.

Signed-off-by: Walter Lozano <walter.lozano@collabora.com>

Feb 26, 2021

packaging-data-fetch-downstream: Workaround python-gitlab escaping bug · 5e651ed8

Emanuele Aina authored 4 years ago

Git refnames are relatively free-form and can contain all sorts of
special characters, not just `/` and `#`, see
http://git-scm.com/docs/git-check-ref-format

In particular, Debian's DEP-14 standard for storing packaging in git
repositories mandates the use of the `%` character in tags in some
cases like `debian/2%2.6-21`.

Unfortunately python-gitlab currently only escapes `/` to `%2F` and in
some cases `#` to `%23`. This means that when using the commit API to
retrieve information about the `debian/2%2.6-21` tag only the slash is
escaped before being inserted in the URL path and the `%` is left
untouched, resulting in something like
`/api/v4/projects/123/repository/commits/debian%2F2%2.6-21`. When
urllib3 sees that, it detects the invalid `%` escape and then urlencodes
the whole string, resulting in
`/api/v4/projects/123/repository/commits/debian%252F2%252.6-21`, where
the original `/` got escaped twice and produced `%252F`.

This works around the issue while waiting for the upstream fix,
see https://github.com/python-gitlab/python-gitlab/pull/1336
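
The difference between the two escaping strategies can be reproduced with the standard library. This is only an illustration, not necessarily the workaround applied by the commit:

```python
from urllib.parse import quote

ref = "debian/2%2.6-21"

# Escaping only the slash, as described above: the `%` stays untouched.
partial = ref.replace("/", "%2F")

# Escaping every reserved character, including `%` itself.
full = quote(ref, safe="")

# Escaping the partially escaped string again double-escapes the slash.
double = quote(partial, safe="")
```

Here `double` matches the broken path quoted above, while `full` differs from it only in how the slash is encoded.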

Signed-off-by: Emanuele Aina <emanuele.aina@collabora.com>

packaging-data-fetch-downstream: Fix latest version on branch · db27eb7f

Emanuele Aina authored 4 years ago


When the latest commit on a branch is not tagged (that is, there are
unreleased commits) picking the first descendant tag is not correct
since it can end up picking tags from descendant branches.

For instance, if `apertis/v2022dev0` has unreleased commits and they get
released in `apertis/v2022dev1`, the current code gets confused and
considers them to have the same version.

To avoid that, check which tags are actually contained in each branch
and pick the latest.
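
The selection can be sketched as follows. It assumes the candidate tags are already sorted from newest to oldest and that the set of tags contained in the branch is known (both inputs hypothetical):

```python
from typing import Iterable, Optional, Set

def latest_tag_on_branch(tags_newest_first: Iterable[str], contained: Set[str]) -> Optional[str]:
    """Pick the newest tag that is actually reachable from the branch."""
    for tag in tags_newest_first:
        if tag in contained:
            return tag
    return None  # no tag on this branch: no version can be computed
```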

Signed-off-by: Emanuele Aina <emanuele.aina@collabora.com>

Feb 24, 2021

Retrieve the status of the latest pipeline on each branch · 7d420666

Emanuele Aina authored 4 years ago and

Martyn Welch committed 4 years ago


Help maintainers spot failed packaging pipelines where manual
intervention is needed.

Signed-off-by: Emanuele Aina <emanuele.aina@collabora.com>

Aug 23, 2020

packaging-check-invariants: Catch missing updates · 7a5199ce

Emanuele Aina authored 4 years ago


Check that all the active channels ship the same version and are not
missing updates.
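
A simplified version of the check, assuming a hypothetical mapping from each active channel to the version it ships. It only flags mismatches; telling which channel is behind requires a Debian version comparison such as `apt_pkg.version_compare()` from python-apt:

```python
from typing import Dict, List

def check_channels(package: str, versions: Dict[str, str]) -> List[str]:
    """Report an error if the active channels ship different versions of `package`."""
    if len(set(versions.values())) <= 1:
        return []
    details = ", ".join(f"{channel}={version}" for channel, version in sorted(versions.items()))
    return [f"{package}: channels ship different versions ({details})"]
```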

Signed-off-by: Emanuele Aina <emanuele.aina@collabora.com>

Jul 29, 2020

Index data by package name · 070001e5

Emanuele Aina authored 4 years ago and

Martyn Welch committed 4 years ago


Rather than indexing by repository name, use the package name as the
main key since it is the common concept that ties GitLab, OBS and
upstream sources.

This simplifies some parts of the code as all the information is
available from a single object instead of being spread across multiple
data sources.

Error reporting is also largely simplified by having a single `errors:`
array on each package and having each error be an object rather than a
single string: iterating over every error is thus much simpler and the
information about the error itself is now explicit rather than implied
by its surrounding context (for instance, whether it was located on a
branch, on the git project, or on the OBS package entry).

The YAML structure went from:

    obs:
      packages:
        aalib:
          entries:
            apertis:v2020:target:
              name: aalib
              errors:
                - "ooops"
    projects:
      pkg/target/aalib:
        branches:
          debian/buster:
            name: debian/buster
            errors:
              - "eeeww"
        errors:
          - "aaargh"
    sources:
      debian/buster:
        packages:
          aalib: [...]

to:

    packages:
      aalib:
        obs:
          entries:
            apertis:v2020:target: {...}
        git:
          branches:
            debian/buster: {...}
        upstreams:
          debian/buster: [...]
        errors:
          - msg: "aaargh"
          - msg: "eeeww"
            branch: debian/buster
          - msg: "ooops"
            projects: [ "apertis:v2020:target" ]
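
With the new layout, walking every error no longer depends on where it was found. A sketch using the example data above as plain Python dictionaries (`iter_errors()` and `describe()` are hypothetical helpers):

```python
from typing import Iterator, Tuple

packages = {
    "aalib": {
        "errors": [
            {"msg": "aaargh"},
            {"msg": "eeeww", "branch": "debian/buster"},
            {"msg": "ooops", "projects": ["apertis:v2020:target"]},
        ],
    },
}

def iter_errors(packages: dict) -> Iterator[Tuple[str, dict]]:
    """Yield (package name, error) pairs from the package-indexed data."""
    for name, package in packages.items():
        for error in package.get("errors", []):
            yield name, error

def describe(name: str, error: dict) -> str:
    """Format an error; its context comes from explicit keys, not position."""
    where = ""
    if "branch" in error:
        where = f" (branch {error['branch']})"
    elif "projects" in error:
        where = f" (OBS {', '.join(error['projects'])})"
    return f"{name}: {error['msg']}{where}"
```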

Signed-off-by: Emanuele Aina <emanuele.aina@collabora.com>

classes: Rename Project, Branch and Tag to clarify they are about git · cb04a701

Emanuele Aina authored 4 years ago and

Martyn Welch committed 4 years ago

Signed-off-by: Emanuele Aina <emanuele.aina@collabora.com>

Jul 13, 2020

Check whether updates got landed where they should · 40dcc4cb

Emanuele Aina authored 4 years ago

Go through the `apertis/*` branches and complain if the `debian/*`
branches for their base distribution have not been merged.

This ensures that all the Debian updates get packaged for Apertis.
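
A sketch of the check. It assumes a hypothetical `descendants` mapping from each `debian/*` branch to the set of branches containing its head, and a hypothetical table pairing each `apertis/*` branch with its base distribution:

```python
from typing import Dict, List, Set

# Hypothetical pairing of Apertis branches with their base distribution.
BASE_DISTRIBUTION = {"apertis/v2020": "debian/buster"}

def unmerged_updates(descendants: Dict[str, Set[str]], base: Dict[str, str]) -> List[str]:
    """Complain about each `debian/*` branch not merged in its `apertis/*` branch."""
    errors = []
    for apertis_branch, debian_branch in base.items():
        if apertis_branch not in descendants.get(debian_branch, set()):
            errors.append(f"{debian_branch} not merged into {apertis_branch}")
    return errors
```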

Signed-off-by: Emanuele Aina <emanuele.aina@collabora.com>

packaging-data-fetch-downstream: Minor code simplification · 57962d57

Emanuele Aina authored 4 years ago


Use `list()` rather than an unneeded list comprehension.

Signed-off-by: Emanuele Aina <emanuele.aina@collabora.com>

classes: Move reusable dataclasses to their own module · fbe81a83

Emanuele Aina authored 4 years ago

Signed-off-by: Emanuele Aina <emanuele.aina@collabora.com>

utils: Move reusable functions to their own module · aae96706

Emanuele Aina authored 4 years ago

Signed-off-by: Emanuele Aina <emanuele.aina@collabora.com>

Jul 11, 2020

packaging-data-fetch-downstream: Fetch all branches and tags · 5841252b

Emanuele Aina authored 4 years ago


The Python GitLab API is paginated by default, so you may not get all
results unless you use the `all=True` parameter.

This affected the `firefox-esr` package: since it is frequently updated
in Debian, not all tags for the `debian/buster` branch were returned, and
in particular the one pointing to the branch head was missing, which led
the sanity-check step to complain.

Signed-off-by: Emanuele Aina <emanuele.aina@collabora.com>

May 15, 2020

packaging: Gather data and trigger actions on packaging repositories · e7163690

Emanuele Aina authored 5 years ago


Introduce a pipeline to fetch data from multiple sources, cross-check
the retrieved information and trigger actions.

Each step emits YAML data that can be consumed by later steps and then
merged again to render a dashboard, with the goal of easing the addition
of more data sources and checks as much as possible.
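
The merge step can be pictured as a recursive dictionary merge over the parsed YAML documents. This is a sketch under assumptions (lists concatenated, scalars overwritten by the later source), not the actual `yaml-merge` tool:

```python
def merge(base: dict, other: dict) -> dict:
    """Recursively merge `other` into a copy of `base`.

    Nested mappings are merged key by key, lists are concatenated and,
    for any other value, `other` wins. Neither input is modified.
    """
    result = dict(base)
    for key, value in other.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            result[key] = merge(result[key], value)
        elif key in result and isinstance(result[key], list) and isinstance(value, list):
            result[key] = result[key] + value
        else:
            result[key] = value
    return result
```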

The current steps are:
* packaging-data-fetch-upstream: grab package listings from the
  configured upstream sources
* packaging-data-fetch-downstream: scan GitLab to collect data about
  the packaging repositories and branches
* yaml-merge: dedicated tool to merge data from multiple sources
* packaging-sanity-check: verify some invariants and report mismatches
* packaging-updates: compute which packages have a newer upstream and
  trigger the pipeline to pull them in
* dashboard: render a basic dashboard listing the identified errors

By triggering only the pipelines where there's a known update pending
we avoid the issues with the previous approach that involved running
the pipeline on each of the 4000+ repositories every week, which ended
up overwhelming GitLab.

Signed-off-by: Emanuele Aina <emanuele.aina@collabora.com>
