Commit 61800f63 authored by Walter Lozano, committed by Peter Senna Tschudin

Update automated license compliance


Update documentation about Automated License Compliance based on the current
development status, reflecting some of the decisions that have been made.

Signed-off-by: Walter Lozano <walter.lozano@collabora.com>
parent 3506c613
Merge request !137: Update automated license compliance
Pipeline #178383 passed
@@ -33,46 +33,30 @@ Apertis will continue to evaluate licensing before the inclusion of source packa
By tracking licensing to this degree we can look to exclude components with unsatisfactory licensing from the packages intended for distributed target systems, whilst still packaging them separately so they may be utilized during development.
A good example of this situation is the `gcc` source package and the `libgcc1` binary package produced by it.
Unlike the other artifacts produced by the GCC source package, the libgcc1 binary package is not licensed under the stock GPLv3 license, a [run time exception]( {{< ref "license-exceptions.md#gcc8" >}} ) is provided and it is thus fine to ship it on target devices.
The level of tracking we are proposing will detect such situations and will offer a straightforward way to resolve them, maintaining compliance with the licensing requirements.
The level of tracking we are providing will detect such situations and will offer a straightforward way to resolve them, maintaining compliance with the licensing requirements.
To achieve this, two main steps need to be taken:
- Record the licensing of the project source code, per file
- Determine the mapping between source code files and the binary/data files in each binary package
We recommend integrating these steps into our CI pipelines to provide early detection of any change to the licensing status of each package. Extending our CI pipelines will also enable developers to learn about new issues and to solve them during the merge request development flow.
# License scanners
There are various proprietary and open source tools which can help with tracking the licensing terms that apply to the pieces of software from which Apertis is built.
The following tools are examples of those that can help to achieve the first of the steps outlined above:
- [Dependency-Track](https://dependencytrack.org/): Open source, higher level tool for presenting data from BOMs generated by other software.
- [FOSSA](https://fossa.com/): Proprietary suite for license compliance tracking and management.
- [FOSSID](https://fossid.com/): Proprietary license scanning tool.
- [FOSSology](https://www.fossology.org/): Open source, server based tool, utilizing a number of techniques to extract licensing information from source and binary artifacts.
- [Licensecheck](https://metacpan.org/pod/App::Licensecheck): A simple open source license checker.
- [Licensee](https://github.com/licensee/licensee): Open source tool, limited to scanning for license files.
- [Ninka](http://ninka.turingmachine.org/): Very limited, light weight, open source tool, developed as a research project aimed at identifying licenses in source code.
- [Protex](https://www.integrauae.com/datasheets/Protex_UL.pdf): Part of the Black Duck suite of proprietary tools for managing open source compliance.
- [ScanCode](https://www.nexb.com/): Suite of open source tools, which provide a foundation on which the company developing them provides its proprietary enterprise solution.
- [WhiteSource](https://www.whitesourcesoftware.com/): Proprietary suite for open source component management.
Due to the open source nature of the Apertis project, we intend to utilize an open source tool for license compliance rather than a proprietary solution.
Given the traction, community, and Linux Foundation involvement, our suggested open source tool for license scanning is FOSSology.
These steps have been integrated into our CI pipelines to provide early detection of any change to the licensing status of each package. Extending our CI pipelines also enables developers to learn about new issues and to solve them during the merge request development flow.
## FOSSology
FOSSology is a server based tool which provides a web front-end that is able to scan through source code (and to a degree binaries) provided to it, finding license statements and texts.
FOSSology is an Open Source server based tool which provides a web front-end that is able to scan through source code (and to a degree binaries) provided to it, finding license statements and texts.
To achieve this FOSSology employs a number of different scanning techniques to identify potential licenses, including using matching to known license texts and keywords.
The scanning process errs on the side of caution, generating false positives over missing potential licensing information. As a result it will be necessary to "clear" the licenses that are found, deciding whether the matches are valid or not.
This is likely to be a very time consuming process, though bulk recognition of identical patterns may provide some efficiencies.
Once completed, FOSSology will record the licensing decisions and can apply this information to updated scans of the source.
The first scanning and clearing pass is more time consuming and requires special attention; however, subsequent runs should be much faster as FOSSology is able to use previous decisions to find the license information.
Once completed, FOSSology records the licensing decisions and can apply this information to updated scans of the source.
It is anticipated that, after an initial round of verification, FOSSology will only require additional clearing of license information should the scan detect new sources of potential licensing information in an updated project's source or when new packages are added to Apertis.
It is possible to export and import reports which contain the licensing decisions that have previously been made, if a trusted source of reports can be found then these could also be imported, potentially reducing the work required.
FOSSology is backed by the Linux Foundation and appears to have an active user and developer base and a significant history.
As such, it is felt that this tool is likely to be maintained for the foreseeable future and thus a good choice for integration into the Apertis workflow.
FOSSology is backed by the Linux Foundation, appears to have an active user and developer base and a significant history, and is the de facto open source solution for license compliance. As such, it is felt that this tool is likely to be maintained for the foreseeable future.
As this tool provides a web-based UI, it presents an additional advantage: it makes it easier for non-technical users, such as auditors or lawyers, to access and manage the reports, allowing a smooth integration into an audit process.
For all the reasons mentioned above we understand this is the best choice for integration into the Apertis workflow.
## CI Pipeline integration
@@ -80,46 +64,45 @@ In order to avoid manual tasks the license detection should be integrated into t
FOSSology provides a [REST API](https://www.fossology.org/get-started/basic-rest-api-calls/) to enable such integration.
FOSSology is able to consume branches of git repositories, thus allowing scanning of the given source code straight from GitLab.
It is suggested that this should be triggered after normal build checks have been successfully performed.
This process should be triggered after updating a package from external sources, as in these cases a license change can be introduced.
A report will be generated and retrieved, using the REST API, which describes (among other things) the licensing status of each file.
The report can be generated in a number of formats, including various SPDX flavors that are easily machine parsable, which will be the preferred option.
The report can be generated in a number of formats, including various SPDX flavors that are easily machine parsable, using [DEP5](https://dep-team.pages.debian.net/deps/dep5/) as the preferred option.
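As a rough illustration of how a DEP5 report could be consumed once it has been retrieved, the following minimal sketch parses DEP5 text into paragraphs and looks up the license that applies to a given file. It assumes the report follows the machine-readable `debian/copyright` layout (blank-line separated paragraphs with `Files:` and `License:` fields, the last matching `Files` paragraph winning). The function names `parse_dep5` and `license_for` and the sample data are hypothetical, and `fnmatch` is only an approximation of DEP5 wildcard matching.

```python
import fnmatch


def parse_dep5(text):
    """Split DEP5 text into a list of paragraphs (dicts of field -> value)."""
    paragraphs = []
    current = {}
    field = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current paragraph.
            if current:
                paragraphs.append(current)
            current, field = {}, None
        elif line[0] in " \t" and field:
            # Continuation line of the previous field.
            current[field] += "\n" + line.strip()
        elif ":" in line:
            field, _, value = line.partition(":")
            field = field.strip()
            current[field] = value.strip()
    if current:
        paragraphs.append(current)
    return paragraphs


def license_for(path, paragraphs):
    """Return the short license name of the last Files paragraph matching path."""
    result = None
    for para in paragraphs:
        for pattern in para.get("Files", "").split():
            if fnmatch.fnmatchcase(path, pattern):
                lines = para.get("License", "").splitlines()
                result = lines[0].strip() if lines else None
    return result


# Hypothetical sample report, for illustration only.
SAMPLE = """\
Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Upstream-Name: example

Files: *
Copyright: 2020 Example Authors
License: GPL-2+

Files: lib/*
Copyright: 2020 Example Authors
License: MIT
"""

paragraphs = parse_dep5(SAMPLE)
print(license_for("src/main.c", paragraphs))
print(license_for("lib/util.c", paragraphs))
```

A production parser would more likely rely on an existing Debian control-file library; the hand-written parser here only keeps the sketch self-contained.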
It is suggested that each component should require a determination of the licensing to have been made for every file in the project.
Due to the large volume of licensing matches that will result from the initial licensing scan, we recommend that the absence of license information initially generates a warning.
In some cases, to achieve the fine grained licensing information desired, the licensing of some files may need to be clarified with the component's author(s).
Once an initial pass of all Apertis components has been made, we would expect missing license information to result in an error, as such errors would be the result of new matches being found, which would need to be resolved in FOSSology before CI would complete without an error.
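The warning-then-error policy described above could be expressed as a small gate in the CI job. This is only a sketch under assumptions: the input is taken to be a mapping from file path to license name (or `None` when no determination has been made), and the function name `check_license_coverage` and its `strict` flag are hypothetical.

```python
def check_license_coverage(file_licenses, strict=False):
    """Classify a scan result as "ok", "warning" or "error".

    file_licenses maps each file path to its license name, or to None
    (or an empty string) when no licensing decision has been recorded.
    With strict=False missing information only warns, matching the
    initial adoption phase; with strict=True it is treated as an error.
    """
    missing = sorted(path for path, lic in file_licenses.items() if not lic)
    if not missing:
        return "ok", missing
    return ("error" if strict else "warning"), missing


# Hypothetical scan result, for illustration only.
scan = {"src/main.c": "GPL-2+", "src/new_feature.c": None}
status, missing = check_license_coverage(scan)
print(status, missing)
```

Switching `strict` to `True` once every component has had its initial pass would turn the same check into a blocking one.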
The generated report should be saved in the Debian metadata archive so that it is available for the following processing.
The adoption of FOSSology will be gradual and in parallel with the current license scanning process in order to compare the results and improve the workflow. Once the process is fully reviewed and tested with all the packages in the target repository FOSSology will be the default scanner.
# Binary to source file mapping
Now that we have a way to determine the licensing of the source files, we need a way to determine which of these source files were used in each binary.
Binaries are built from many different source files, but the exact list of them depends on build options. For this reason a reliable mechanism needs to be put in place to extract this list after the build process in order to determine the license information.
Compilers store information in the binaries they output, which can be used by a debugger to pause execution of a process at a point corresponding to a selected line of source code.
This information provides a mapping between the lines of source code and the compiled machine code operations.
Executable binaries in Linux are generally stored in the [Executable and Linkable Format](https://en.wikipedia.org/wiki/Executable_and_Linkable_Format) (ELF), the associated [DWARF](https://en.wikipedia.org/wiki/DWARF) debugging data format is generally used to store this debugging information inside the ELF in specific "debug" sections.
By parsing this information, the source files that were used to generate each binary can be determined.
Combining this with the licensing information provided in the licensing report, a mapping can be made between each binary and its associated licenses.
The tool `dwarf2sources` parses this information and extracts the names of the source files that were used to generate each binary, generating a `json` file that can easily be parsed later. Combining this with the licensing information provided in the licensing report, a mapping can be made between each binary and its associated licenses.
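The combination step can be sketched as a simple join between two mappings. The data shapes below are assumptions made for illustration and are not the actual `dwarf2sources` output schema: `sources_by_binary` maps each binary to the source files it was built from, and `license_by_source` maps each source file to a license name. The function name `licenses_per_binary` is hypothetical.

```python
def licenses_per_binary(sources_by_binary, license_by_source):
    """Map each binary to the sorted list of licenses of its source files.

    Source files without a known license are reported as "UNKNOWN" so
    that gaps in the licensing report stay visible.
    """
    result = {}
    for binary, sources in sources_by_binary.items():
        found = {license_by_source.get(src, "UNKNOWN") for src in sources}
        result[binary] = sorted(found)
    return result


# Hypothetical inputs, for illustration only.
sources_by_binary = {
    "usr/bin/example": ["src/main.c", "lib/util.c"],
    "usr/lib/libexample.so.1": ["lib/util.c", "lib/extra.c"],
}
license_by_source = {"src/main.c": "GPL-2+", "lib/util.c": "LGPL-2.1+"}

print(licenses_per_binary(sources_by_binary, license_by_source))
```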
## CI Pipeline integration
Apertis uses the Open Build Service (OBS) platform to build the binary packages in a controlled manner across several architectures and releases.
OBS utilizes `dpkg-buildpackage` behind the scenes to build each package.
This utility will have access to the source licensing report as it is contained in the Debian metadata archive.
This utility has access to the source licensing report as it is contained in the Debian metadata archive.
As well as the source licensing, the Debian metadata archive contains configuration to help `dpkg-buildpackage` determine how to build the source.
This is typically done with the help of [`debhelper`](https://manpages.debian.org/jessie/debhelper/debhelper.7.en.html), which provides helpers that simplify this process.
We plan to extend `debhelper` to include a command to perform the mapping between the binary files produced by the build and the license of the associated source files, using the process laid out above, and recording this for each of the binary packages to be made.
In addition, this helper should record the licensing attached to any other files that will be packaged as well.
Typically the binaries are stripped (using a debhelper command called `dh_strip`) prior to packaging, removing the debug symbols from the binary and reducing its size.
We suggest that it would be easier to perform the license mapping prior to this step.
Whilst the debug symbols are kept, packaged separately in the `dbgsym` package, it's easier to perform the mapping before this is done.
A report should be saved in each binary package covering the files shipped in that package.
The report should be saved in `/usr/share/doc/<package>/` in a machine parsable SPDX format.
The new debhelper command will need to be added to the build rules for each package.
Whilst most packages make use of debhelper, many do so via higher level helpers that factor out common functionality, such as `dh` and `CDBS`, and this will add complexity to this task.
There may be packages in Apertis that do not make use of debhelper; these packages will need special handling to ensure that the required steps are completed.
Apertis extended `debhelper` by including a new command, `dh_dwarf2sources`, to perform the source file name extraction using `dwarf2sources` as described above. Typically the binaries are stripped (using a debhelper command called `dh_strip`) prior to packaging, removing the debug symbols from the binary and reducing its size. For this reason `dh_dwarf2sources` is placed before this step in the `dh` sequence. Whilst the debug symbols are kept, packaged separately in the `dbgsym` package, it's easier to perform the mapping before this is done. The result is stored in the binary package under `/usr/share/doc/<package>/`.
Following this same idea, Apertis also extends the `debhelper` command `dh_installdocs` to install the license report generated by FOSSology in the binary package under `/usr/share/doc/<package>/copyright_report`.
Although this solution should work for most packages, some of them might need special handling as they may override the default rules. These special cases will be covered with further improvements.
There may be packages in Apertis that do not make use of debhelper; these packages need special handling to ensure that the required steps are completed.
As these reports are provided by each binary package, the reports from installed packages can be accessed at image build time and amalgamated into an image wide report at that point should it be required.
As a binary can be built from multiple sources, each with differing licenses, it will be necessary for the report to detail each file that is used to create each binary and the licensing under which it is provided.
As a binary can be built from multiple sources, each with differing licenses, it is necessary for the report to detail each file that is used to create each binary and the licensing under which it is provided.
In some circumstances dual licensed source code may allow for a binary to be effectively licensed under the terms of a single license, that is the user has the option to pick a license that results in the whole binary being able to be provided under the terms of a single license.
Where dual licensed source code isn't used, the terms of all applicable licenses should be declared.
The terms of the various licenses may be considered [compatible](https://en.wikipedia.org/wiki/License_compatibility), allowing the binary to effectively be managed under the terms of the more restrictive license.
@@ -130,13 +113,11 @@ Not all possible combinations of licenses work out this way and thus why it is i
The approach each project using Apertis takes with regards to the reporting of licensing information should be driven by how this information is to be utilized, i.e. some projects may wish to parse the license information and present it in a single BOM file in HTML, XML or human readable text.
For the images provided by the Apertis project, we plan to combine the reports saved in `/usr/share/doc/<package>/` into a single parsable file.
Should it be required to provide some tool with which to interrogate the licensing which applies to the binary packages, the SPDX files can be imported into FOSSology.
For the images provided by the Apertis project, the script `generate_bom.py` combines the reports saved in `/usr/share/doc/<package>/`, which consist of a `json` file per package and a `DEP5` file per source package, into a single `json` file which is provided with the image. This file can be generated with different levels of verbosity, allowing licenses to be listed per image, package, binary or source file.
This same script also issues a warning in case a problematic license is found.
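To make the idea of verbosity levels and license warnings more concrete, here is a minimal sketch of such an aggregation. It is not the actual `generate_bom.py` implementation: the input shape (package name to binary to list of licenses), the `level` values, the `problematic` set and the function name `build_bom` are all assumptions for illustration, and the per source file level is left out for brevity.

```python
import json


def build_bom(reports, level="package", problematic=frozenset()):
    """Aggregate per-package license data into one BOM dictionary.

    reports maps package name -> {binary path: [license, ...]}.
    level selects the verbosity: "image", "package" or "binary".
    Returns the BOM and the sorted list of problematic licenses found.
    """
    all_licenses = set()
    packages = {}
    for package, binaries in reports.items():
        pkg_licenses = set()
        for licenses in binaries.values():
            pkg_licenses.update(licenses)
        all_licenses |= pkg_licenses
        if level == "package":
            packages[package] = sorted(pkg_licenses)
        elif level == "binary":
            packages[package] = {b: sorted(l) for b, l in binaries.items()}
    bom = {"licenses": sorted(all_licenses)}
    if level != "image":
        bom["packages"] = packages
    warnings = sorted(all_licenses & set(problematic))
    return bom, warnings


# Hypothetical per-package reports, for illustration only.
reports = {
    "example": {"usr/bin/example": ["GPL-2+", "MIT"]},
    "libfoo1": {"usr/lib/libfoo.so.1": ["GPL-3+"]},
}

bom, warnings = build_bom(reports, level="package", problematic={"GPL-3+"})
print(json.dumps(bom, indent=2))
for name in warnings:
    print(f"warning: problematic license found: {name}")
```

Keeping the aggregation separate from the warning check means the same data can be rendered at any verbosity without rescanning the per-package reports.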
## CI Pipeline integration
Apertis utilizes [Debos](https://github.com/go-debos/debos) in its image generation pipeline.
There is an existing tool available for the merging of SPDX documents.
The generation of a combined BOM can be realized by utilizing this tool in a script to be run at the appropriate time during the image build process by integrating the script into the Debos recipes.
Integrating scripts into the Debos recipes is an approach we have taken when generating the list of installed packages and list of files.
It reduces the overhead and potential complexity of decompressing and mounting the images that would be necessary should the BOM be generated in a separate step.
Apertis utilizes [Debos](https://github.com/go-debos/debos) in its image generation pipeline, which provides a very versatile way of customizing images. During the final stage of the image creation, the script `generate_bom.py` is used to build the BOM file with the license information of the image and export it as an additional artifact.
Finally, as both `minimal` and `target` images should not ship extra data, the contents of `/usr/share/doc/` are dropped from the image.