Commit 0c2ab442 authored by Emanuele Aina

ci-license-scan: A control char is gibberish enough


Assume that any string containing a control character is gibberish.

Rely on the `unicodedata` module to tell which characters are control
characters and which are not.
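
As a rough illustration of that check (a sketch, not the code from this commit; the helper name `has_control_char` is hypothetical), `unicodedata.category()` returns a two-letter general category, and the diff tests whether its first letter is `C`. Note that `C` covers all the "Other" categories (`Cc`, `Cf`, `Cs`, `Co`, `Cn`), so this is somewhat broader than control characters alone, and that tabs and newlines are `Cc` as well.

```python
import unicodedata


def has_control_char(s: str) -> bool:
    """Hypothetical helper: True if any character is in a 'C*' category."""
    return any(unicodedata.category(c).startswith("C") for c in s)


# U+0098 (Start of String) is a C1 control character.
print(unicodedata.category("\x98"))      # Cc
# CJK ideographs are 'Lo', digits 'Nd', the space 'Zs': nothing is flagged.
print(has_control_char("2000 李健秋"))    # False
print(has_control_char("bad\x98text"))   # True
```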

This addresses some bad interactions with control characters ending up
in the diff output, such as U+0098, aka Start of String (SOS), which
prevents the output from being displayed on terminal emulators like xterm.

Signed-off-by: Emanuele Aina <emanuele.aina@collabora.com>
parent 35734703
2 merge requests: !122 ci-license-scan: A control char is gibberish enough, !93 WIP: documentation-builder: Rebase on Apertis instead of Debian Buster
Pipeline #146910 passed
@@ -15,6 +15,7 @@ import sh
 import sys
 import yaml
 import textwrap
+import unicodedata
 # this is necessary to eliminate references in the generated YAML
 # Perl tools use YAML::Tiny which doesn’t support references.
@@ -226,6 +227,8 @@ def is_gibberish(s: str) -> bool:
     False
     >>> is_gibberish("2000 李健秋")
     False
+    >>> is_gibberish('Æf-w¬6f*äIy,2bÓñ.\x982§VS')
+    True
     >>> is_gibberish('ÿy\x8d\x8aÿt}kÿoiEÿiP')
     True
     >>> is_gibberish('f%^2<')
@@ -235,6 +238,8 @@ def is_gibberish(s: str) -> bool:
     # See https://phabricator.apertis.org/T6677 for details
     if len(s) < 6:
         return True
+    if any(unicodedata.category(c)[0] == 'C' for c in s):
+        return True
     gibberishness = sum(1 if (c >= '\x7b') and (c <= '\xff') else 0 for c in s)
     if len(s) < 16 and s.count(' ') == 0 and gibberishness >= 6:
         return True
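
To try out the checks visible in this hunk in isolation, a minimal standalone sketch could look like the following. The name `is_gibberish_sketch` is hypothetical, and the final `return False` is an assumption: the rest of the real function is elided in the diff and may apply further heuristics.

```python
import unicodedata


def is_gibberish_sketch(s: str) -> bool:
    """Reproduce only the checks shown in the hunk (not the full function)."""
    # Very short strings are treated as gibberish.
    if len(s) < 6:
        return True
    # Any character in a 'C*' category (control, format, ...) is enough.
    if any(unicodedata.category(c)[0] == "C" for c in s):
        return True
    # Count characters in the range U+007B..U+00FF.
    gibberishness = sum(1 for c in s if "\x7b" <= c <= "\xff")
    if len(s) < 16 and s.count(" ") == 0 and gibberishness >= 6:
        return True
    # ASSUMPTION: the remaining checks are not shown in the diff.
    return False


# Inputs taken from the doctests shown in the diff.
print(is_gibberish_sketch("Æf-w¬6f*äIy,2bÓñ.\x982§VS"))  # True (contains U+0098)
print(is_gibberish_sketch("f%^2<"))                      # True (shorter than 6)
print(is_gibberish_sketch("2000 李健秋"))                 # False
```

Placing the category check before the `gibberishness` count means a single control character settles the question regardless of the string's length or how many spaces it contains.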