Draft: scripts/generate_bom.py: Report ambiguous copyright
That's the trickiest case, because if parts of the path can be missing in f_src from dwarf2sources, then any paragraph with a wildcard can match every f_src.
In other words, if you can't trust the path to the file or make any assumptions about it, then any wildcard (even one with a path like src/lib2/*) can match.
In your second example, feature/list.c can match the 3 wildcards. Even if we had a lib1/list.c, we can't trust the path: there could be a src/lib2/lib1/list.c, a src/lib1/lib1/list.c or a src/lib1/list.c.
We could add some heuristics or try to match parts of the path, but that's unreliable and becomes quite heavy when we could just drop wildcards everywhere and be way safer.
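To make the ambiguity concrete, here is a minimal sketch, not the actual generate_bom.py logic. It assumes fnmatch-style wildcards where * also matches "/", and the function name possible_matches and the three patterns are hypothetical. It checks whether a possibly truncated f_src could match a wildcard once some unknown leading directories are put back:

```python
from fnmatch import fnmatchcase
import posixpath


def possible_matches(f_src, patterns):
    """Return the wildcard patterns that f_src could match if an unknown
    leading part of its path were missing (hypothetical helper)."""
    hits = []
    for pattern in patterns:
        # Literal directory part that precedes the first wildcard.
        literal_dir = pattern.split("*", 1)[0].rstrip("/")
        # Pretend the missing prefix was exactly that directory.
        candidate = posixpath.join(literal_dir, f_src)
        if fnmatchcase(f_src, pattern) or fnmatchcase(candidate, pattern):
            hits.append(pattern)
    return hits


# Hypothetical wildcard paragraphs, for illustration only.
patterns = ["src/lib1/*", "src/lib2/*", "src/feature/*"]
print(possible_matches("feature/list.c", patterns))
```

Under these assumptions every wildcard is a candidate for the truncated path, which is why matching on partial paths cannot disambiguate anything.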
Let me clarify a bit, since I just mentioned some potential issues without adding suggestions.
First, I think you are on the right track; you just need to cover some additional cases and have a clear approach based on two facts:
1- Does the copyright report contain wildcards?
2- Were we able to calculate a valid prefix that always matches? If so, that points to the debug info being "nice".
If 1 is true and 2 is true -> we are safe
If 1 is true and 2 is false -> we are in a kind of ambiguity, and we should switch to the full report to have more info
If 1 is false and 2 is true -> we are safe
If 1 is false and 2 is false -> we need to investigate
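The four cases above can be sketched as a small function. This is only an illustration of the decision table; the name classify and the returned labels are hypothetical, not identifiers from generate_bom.py:

```python
def classify(has_wildcards: bool, has_valid_prefix: bool) -> str:
    """Map the two facts (1: wildcards present, 2: valid prefix found)
    to the outcome described in the comment above."""
    if has_valid_prefix:
        # Cases "1 true, 2 true" and "1 false, 2 true".
        return "safe"
    if has_wildcards:
        # Case "1 true, 2 false": switch to the full report.
        return "ambiguous"
    # Case "1 false, 2 false".
    return "investigate"


for wildcards in (True, False):
    for prefix in (True, False):
        print(wildcards, prefix, "->", classify(wildcards, prefix))
```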
Please take into account, as mentioned in !479 (comment 56398), that there may be just a few cases where things are really tricky, so, as I suggested, the first thing is to analyze those.
As mentioned, the idea with these changes is to detect potential ambiguities and report them, and then try to switch to the full report to reduce them.
We would probably like to raise a warning/error in check_bom.py.