Commit 7f2db982 authored and committed by Martyn Welch

Remove license validation document as duplicate


The license-validation document covers the same ground as the first
portion of the automated license compliance documentation. Drop it to save
duplication of information.

Signed-off-by: Martyn Welch <martyn.welch@collabora.com>
parent 401de90d
1 merge request: !66 Evaluate designs
@@ -5,6 +5,10 @@ weight = 100
aliases = [
"/old-designs/latest/automated-license-compliance.html",
"/old-designs/v2021dev3/automated-license-compliance.html",
"/old-designs/latest/license-validation.html",
"/old-designs/v2019/license-validation.html",
"/old-designs/v2020/license-validation.html",
"/old-designs/v2021dev3/license-validation.html",
]
outputs = [ "html", "pdf-in",]
date = "2020-06-05"
......
+++
title = "License validation"
short-description = "Design proposal for source licenses validation"
weight = 100
aliases = [
"/old-designs/latest/license-validation.html",
"/old-designs/v2019/license-validation.html",
"/old-designs/v2020/license-validation.html",
"/old-designs/v2021dev3/license-validation.html",
]
outputs = [ "html", "pdf-in",]
date = "2019-09-27"
+++
# License validation
## Scope
The scope of this document is to describe a suitable system to deal with
license requirements and compliance validation.
## Terminology and concepts
### Agent
Software component responsible for the extraction of licensing information from
source packages
### Copyright
Legal right created by the law of a country that grants the creator of an
original work exclusive rights for its use and distribution
### License
Legal instrument (usually by way of contract law, with or without printed
material) governing the use or redistribution of software
### Ninka
Standalone license scanner that can also be used as FOSSology agent
### Nomos
FOSSology agent license scanner
### OBS
Open Build Service
### OSS
Open Source Software
## Tools under review
### Generic license check tools
The tools listed below allow users to extract licensing information by scanning
source code. They can operate at different levels of granularity, from a single
source code file, to source tar packages, to ISO images containing source
packages.
These tools are not tied to any specific distribution and are focused on Open
Source licenses.
#### FOSSology
[FOSSology](https://www.fossology.org/) is a framework, a toolbox and web
application for examining software packages in a multi-user environment.
From the web application, or through the web API from the command line, a user
can upload individual files or entire software packages to be scanned.
FOSSology then unpacks the uploaded data if necessary and runs a chosen set of
agents on every extracted file.
The FOSSology framework currently focuses on licensing checks, but it could be
used in combination with agents aimed at different kinds of tasks, such as
static code analysis.
In particular, its current toolkit can run licensing, copyright and export
control scans from the command line.
The web application adds a web UI and a database to provide a compliance
workflow. In one click it can generate an SPDX file, or a ReadMe with the
copyright notices from shipped software.
FOSSology also deduplicates the entries to be analyzed, which means that it can
scan an entire distribution and, when a new version is submitted, only the files
that actually changed will get rescanned.
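The deduplication idea can be illustrated with a minimal sketch that keys scan results on a content hash, so unchanged files are skipped when a new version arrives. This is not FOSSology's implementation: the choice of SHA-256, the `files_to_scan` helper and the sample file contents are all assumptions made for illustration.

```python
import hashlib


def digest(content):
    """Return a SHA-256 hex digest that identifies the file content."""
    return hashlib.sha256(content).hexdigest()


def files_to_scan(release, already_scanned):
    """Return the paths whose content has not been scanned before.

    `release` maps paths to file contents (bytes); `already_scanned` is a
    set of digests that is updated in place.
    """
    pending = []
    for path, content in sorted(release.items()):
        key = digest(content)
        if key not in already_scanned:
            pending.append(path)
            already_scanned.add(key)
    return pending


# Hypothetical package contents for two consecutive versions.
scanned = set()
v1 = {"src/main.c": b"int main(void) { return 0; }\n", "COPYING": b"GPL text\n"}
v2 = {"src/main.c": b"int main(void) { return 1; }\n", "COPYING": b"GPL text\n"}

first = files_to_scan(v1, scanned)   # every file is new
second = files_to_scan(v2, scanned)  # only the modified file remains
print(first)
print(second)
```

Keying on content rather than on path also means that an identical file shipped by several packages would only be scanned once.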
FOSSology has many different interesting features:
* Regular expression scanning for licenses with Nomos
* Text-similarity matching with Monk
* Copyrights search
* Export Control Codes (ECC)
* Bucket processing
* License reviewing
* License text management
* Mark a license as main license of a software package
* Bulk recognition. Text phrase scan to identify files with similar license
contents that are recurring across multiple files
* Aggregated file view
* Reuse of license reviews
* Export information in different formats:
  * Readme files for the distribution containing all identified license texts and
    copyright information
  * List of files in hierarchical structure with found licenses identified by
    the short name identifier
  * SPDX 2.0 export using the tag-value and the RDF-(XML)-format
  * Debian-copyright (a.k.a. DEP5) files
Backend tools and scanners are written in C/C++ and the frontend web application
is implemented with PHP.
#### Ninka
[Ninka](http://ninka.turingmachine.org/)
([source](https://github.com/dmgerman/ninka)) is a lightweight license
identification tool for source code. It is sentence-based, and provides a simple
way to identify open source licenses in a source code file. It is capable of
identifying several dozen different licenses (and their variations).
Ninka has been designed with the following design goals:
* To be lightweight
* To be fast
* To avoid making errors
FOSSology has recently added support for Ninka as an agent.
It is mainly written in Perl.
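A sentence-based approach can be sketched as follows: the comment text is split into sentences, each sentence is matched against known license sentences, and a license is reported only when all of its expected sentences are present. The rules below are hypothetical and far smaller than a real rule set; they do not reproduce Ninka's actual rules or output format.

```python
import re

# Hypothetical sentence rules: (token, pattern matched on a normalized sentence).
SENTENCE_RULES = [
    ("MITpermission", re.compile(r"^permission is hereby granted, free of charge")),
    ("MITnotice", re.compile(r"^the above copyright notice and this permission notice")),
]

# Hypothetical license rules: the tokens that must all be present.
LICENSE_RULES = [
    ({"MITpermission", "MITnotice"}, "MIT"),
]


def split_sentences(text):
    """Strip comment markers, normalize whitespace and split into sentences."""
    lines = [line.strip().lstrip("/*#").strip() for line in text.splitlines()]
    flat = " ".join(" ".join(lines).split()).lower()
    return [s for s in re.split(r"(?<=[.:;])\s+", flat) if s]


def identify(text):
    """Return a license name, or "UNKNOWN" rather than guessing."""
    tokens = set()
    for sentence in split_sentences(text):
        for token, pattern in SENTENCE_RULES:
            if pattern.search(sentence):
                tokens.add(token)
    for required, name in LICENSE_RULES:
        if required <= tokens:
            return name
    return "UNKNOWN"


HEADER = """/*
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software, to deal in the Software without restriction, subject to
 * the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 */"""

full = identify(HEADER)
partial = identify("Permission is hereby granted, free of charge, to use this file.")
print(full, partial)
```

Returning `UNKNOWN` for a partial match reflects the stated goal of avoiding errors: a missing sentence is treated as a reason not to name a license.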
#### scancode-toolkit
[scancode-toolkit](https://github.com/nexB/scancode-toolkit/) scans code and
detects licenses, copyrights, package manifests and dependencies. It is used to
discover and inventory Open Source and third-party packages used in projects and
can generate SPDX documents.
Given a codebase in a directory, scancode will:
* Collect an inventory of the code files and classify the code using file types
* Extract files from any archive using a general purpose extractor
* Extract texts from binary files if needed
* Use an extensible rules engine to detect open source license text and notices
* Use a specialized parser to capture copyright statements
* Identify packaged code and collect metadata from packages
* Report the results in your choice of JSON or HTML for integration with other
tools
* Display the results in a local HTML browser application to assist your analysis
ScanCode is written in Python and also uses other open source packages.
#### licensed
[licensed](https://github.com/github/licensed) has been recently released by
GitHub to check the licenses of the dependencies of a project.
Modern language package managers (bower, bundler, cabal, go, npm, stack) are
used to pull the dependency chain of a specific project.
Licenses can be configured to be either accepted or rejected, which eases the
developer's task of identifying problematic dependencies when importing a new
third-party library.
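The accept/reject policy can be pictured with a short sketch that sorts dependencies into three groups. It does not use the configuration format or API of `licensed`; the `ALLOWED` and `REJECTED` sets, the `review` helper and the dependency names are assumptions for illustration only.

```python
# Hypothetical policy, expressed with SPDX-style identifiers.
ALLOWED = {"MIT", "Apache-2.0", "BSD-3-Clause"}
REJECTED = {"AGPL-3.0-only"}


def review(dependencies):
    """Classify each dependency by the license it declares.

    `dependencies` maps a dependency name to its license identifier.
    Anything that is neither allowed nor rejected is flagged for a
    manual review instead of being silently accepted.
    """
    report = {"accepted": [], "rejected": [], "needs_review": []}
    for name, license_id in sorted(dependencies.items()):
        if license_id in REJECTED:
            report["rejected"].append(name)
        elif license_id in ALLOWED:
            report["accepted"].append(name)
        else:
            report["needs_review"].append(name)
    return report


# Hypothetical dependency chain, as a package manager might resolve it.
deps = {"tiny-utils": "MIT", "net-service": "AGPL-3.0-only", "mystery-lib": "unknown"}
result = review(deps)
print(result)
```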
### Debian centric license check tools
The tools below focus on Debian-derived environments, and work with the
[DEP5](http://dep.debian.net/deps/dep5/) `debian/copyright` file format and/or
Debian packages.
#### licensecheck
[licensecheck](https://metacpan.org/pod/App::Licensecheck) scans source code and
reports the copyright holders and known licenses it finds. It detects licenses
with a medium-sized dataset (about 200 entries) of regex patterns and key
phrases (parts), which it reassembles into detected licenses based on rules. In
that sense it is somewhat similar to the combined approaches of FOSSology/Nomos
and Ninka. It also detects copyright statements. It outputs results in plain
text (with a customizable delimiter) or in the Debian copyright file format. It
is written in Perl.
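The idea of detecting parts and reassembling them into a license name can be sketched as below. The four patterns are illustrative assumptions and are not taken from licensecheck's dataset; the resulting `GPL-2+` follows the short-name convention used in `debian/copyright` files.

```python
import re

# Illustrative "parts"; a real dataset holds far more patterns.
NAME = re.compile(r"GNU General Public License", re.I)
VERSION = re.compile(r"version (\d+(?:\.\d+)?)", re.I)
OR_LATER = re.compile(r"or \(at your option\) any later version", re.I)
COPYRIGHT = re.compile(r"Copyright (?:\(C\) )?(\d{4}(?:-\d{4})?),? (.+)", re.I)


def scan(text):
    """Return (license short name, [(years, holder), ...]) for a file header."""
    # Drop leading comment markers so that phrases can span several lines.
    lines = [line.lstrip("#/* \t") for line in text.splitlines()]
    flat = " ".join(" ".join(lines).split())

    license_id = "UNKNOWN"
    if NAME.search(flat):
        license_id = "GPL"
        version = VERSION.search(flat)
        if version:
            license_id += "-" + version.group(1)
            if OR_LATER.search(flat):
                license_id += "+"

    holders = []
    for line in lines:
        match = COPYRIGHT.search(line)
        if match:
            holders.append((match.group(1), match.group(2).strip()))
    return license_id, holders


HEADER = """\
# Copyright (C) 2019 Example Author <author@example.org>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
"""

found_license, found_holders = scan(HEADER)
print(found_license, found_holders)
```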
Auto generating a `debian/copyright` can be easily accomplished by:
```
licensecheck --copyright -r `find * -type f` | \
/usr/lib/cdbs/licensecheck2dep5 > debian/copyright.auto
```
#### debmake
[debmake](https://anonscm.debian.org/cgit/collab-maint/debmake.git) is a helper
program for generating Debian packages. It has options to check copyright and
license (`-c`) and to compare `debian/copyright` against the current sources and
exit (`-k`). Written in Python.
Auto generating a `debian/copyright` can be easily accomplished by:
```
debmake -cc > debian/copyright
```
Compare `debian/copyright` against the current sources, for example after importing a new upstream release:
```
debmake -k
```
It focuses on license types and file matching, and is able to detect ineffective
blocks in the copyright file.
It is buggy due to faulty Unicode handling.
#### license-reconcile
An alternative for comparison of `debian/copyright` versus current source tree
is also provided by
[license-reconcile](https://anonscm.debian.org/cgit/pkg-perl/packages/license-reconcile.git).
It reports missing copyright holders and years, but during testing it was
confused by inconsistent license names.
`license-reconcile` attempts to match license and copyright information in a
directory with the information available in `debian/copyright`. It gets most of
its data from `licensecheck`, so it should produce something worth looking at
out of the box. It becomes particularly useful once it has been configured to
succeed for a given package in a known good state: any failure on a subsequent
upstream update then points out licensing changes that must be looked at and
acknowledged.
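The reconciliation step can be sketched as a comparison between the license declared for each file in `debian/copyright` and the license a scanner detected. This is a simplified illustration and not how `license-reconcile` is implemented: the parser handles only `Files` and `License` fields, the `detected` mapping stands in for scanner output, and `fnmatchcase` only approximates DEP5 wildcards.

```python
from fnmatch import fnmatchcase


def parse_copyright(text):
    """Return [(patterns, license short name)] for each Files paragraph."""
    stanzas = []
    for paragraph in text.strip().split("\n\n"):
        fields, key = {}, None
        for line in paragraph.splitlines():
            if line[:1] in (" ", "\t") and key:
                fields[key].append(line.strip())  # continuation line
            elif ":" in line:
                key, value = line.split(":", 1)
                fields[key] = [value.strip()]
        if "Files" in fields and "License" in fields:
            patterns = " ".join(fields["Files"]).split()
            # Only the first line of License is the short name.
            stanzas.append((patterns, fields["License"][0]))
    return stanzas


def declared_license(path, stanzas):
    """Return the declared license; the last matching paragraph applies."""
    result = None
    for patterns, license_id in stanzas:
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            result = license_id
    return result


def reconcile(stanzas, detected):
    """Return (path, declared, detected) for every file that disagrees."""
    mismatches = []
    for path, found in sorted(detected.items()):
        declared = declared_license(path, stanzas)
        if declared != found:
            mismatches.append((path, declared, found))
    return mismatches


COPYRIGHT_FILE = """\
Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/

Files: *
Copyright: 2019 Example Author
License: GPL-2+

Files: lib/compat/*
Copyright: 2015 Another Author
License: Expat
"""

# Hypothetical scanner output for three source files.
detected = {
    "src/main.c": "GPL-2+",
    "lib/util.c": "GPL-2+",
    "lib/compat/strl.c": "BSD-3-clause",
}

mismatches = reconcile(parse_copyright(COPYRIGHT_FILE), detected)
print(mismatches)
```

An empty result corresponds to the known good state; any entry that appears after an upstream update is a change to review.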
#### cme
[cme](https://metacpan.org/release/App-Cme) is a tool based on a configuration
parsing library.
```
cme update dpkg-copyright
```
This will create or update `debian/copyright`. The cme tool seems to handle UTF-8
names better than debmake. Written in Perl, using licensecheck.
#### elbe-parselicense
[elbe-parselicense](https://elbe-rfs.org/docs/sphinx/releases_v1.9.24/elbe-parselicence.html)
generates a file containing the licences of the packages included in a project.
#### dlt
[dlt](https://github.com/agustinhenze/dlt/) has support for parsing and creating
Debian machine readable copyright files. Written in Python.
## Recommended tools
Most of the tools discussed in the previous section are useful in one way or
another, and some build on top of others. For the Apertis use case, it is
advisable to use a tool that already provides a framework for dealing with
licenses and copyrights. The other tools can be hooked into different processes
for particular use cases if needed, or used to double or triple check the
output of other tools, if desired. A good starting point is FOSSology: it
already provides a database that keeps track of licenses and copyrights, it
supports the SPDX and DEP5 output formats, and its architecture is easily
extendable via plugins. Therefore this proposal recommends starting with
FOSSology. Once the initial setup is done and the workflow is defined, it can be
fine-tuned by bringing in the other tools or by extending FOSSology with such
support.
## Integration with current tools
In the current Apertis CI infrastructure, there are several stages:
* Phabricator (`code review`) - source code review system
* Jenkins (`buildpackage CI`) - CI build per source package code changes
* Open Build Service (`distro`) - contains all the distribution packages
* Jenkins (`images`) - builds images from distributed package repository pools
* LAVA (`testing`) - manages automated tests for different set of images
* Phabricator (`bugtracker`) - keeps track of image defects
As an initial step, it looks plausible to hook FOSSology in after a new source
package is added or updated in Open Build Service. That way the FOSSology
database should contain all the data needed regarding licenses and copyrights,
and it can be queried to extract information when needed.
## Approach
The following proposal outlines the way FOSSology is meant to interact with the other parts of the system.
![](/images/apertis-license-validation-infra.svg)
Inputs
* The FOSSology server will be fed source code tarballs from the repositories, starting with the packages that make up the target runtime, which are added to a FOSSology bucket.
* A list of the software packages that make up the target image runtime will be provided to FOSSology.
Deliverable
* An SPDX and/or DEP5 license report of the software packages found in the target runtime image.
Every release should have a license report.
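To give an idea of what a per-release report could contain, the sketch below writes a minimal SPDX-style tag-value document from a package list. It is not FOSSology's export and has not been validated against the SPDX specification; the `spdx_report` helper, the creator name, the namespace URL and the package data are assumptions for illustration.

```python
import re


def spdx_report(release, namespace, created, packages):
    """Return a minimal SPDX-style tag-value report as a string.

    `packages` is a list of dicts with "name", "version" and "license" keys.
    """
    lines = [
        "SPDXVersion: SPDX-2.0",
        "DataLicense: CC0-1.0",
        "SPDXID: SPDXRef-DOCUMENT",
        f"DocumentName: {release}",
        f"DocumentNamespace: {namespace}",
        "Creator: Tool: example-report-sketch",  # hypothetical tool name
        f"Created: {created}",
    ]
    for pkg in packages:
        # SPDX identifiers only allow letters, digits, "." and "-".
        ident = re.sub(r"[^A-Za-z0-9.-]", "-", pkg["name"])
        lines += [
            "",
            f"PackageName: {pkg['name']}",
            f"SPDXID: SPDXRef-{ident}",
            f"PackageVersion: {pkg['version']}",
            "PackageDownloadLocation: NOASSERTION",
            f"PackageLicenseConcluded: {pkg['license']}",
            "PackageLicenseDeclared: NOASSERTION",
            "PackageCopyrightText: NOASSERTION",
        ]
    return "\n".join(lines) + "\n"


report = spdx_report(
    "example-release",
    "https://example.org/spdx/example-release",
    "2019-09-27T00:00:00Z",
    [{"name": "libfoo_bar", "version": "1.0", "license": "MIT"}],
)
print(report)
```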
WIP:
Setup
Configuration
Clearing licenses
Rules setup
Day to day operation
Notifications
Generating a report
TBD: FOSSology manual workflow for clearing licenses
## References
[Machine-readable debian/copyright file](http://dep.debian.net/deps/dep5/)
[Creating, updating and checking debian/copyright semi-automatically](http://people.skolelinux.org/pere/blog/Creating__updating_and_checking_debian_copyright_semi_automatically.html)
[debmake -- checking source against DEP-5 copyright](http://goofying-with-debian.blogspot.com/2014/07/debmake-checking-source-against-dep-5.html)
[Improving creation of debian copyright file](https://ddumont.wordpress.com/2015/04/05/improving-creation-of-debian-copyright-file/)
[scancode-toolkit wiki](https://github.com/nexB/scancode-toolkit/wiki)
[Mozilla's Fossology investigation](https://wiki.mozilla.org/Fossology)
source diff could not be displayed: it is too large. Options to address this: view the blob.