Define syntax and format of REUSE.yaml #81
Description
As discussed in spdx/spdx-spec#502, the SPDX project plans to support a "metadata, pre-document file" that contains specific information about files relative to its position. This follows a request to implement something called REUSE.yaml, first discussed here. This issue is to discuss the exact format and syntax of the file.
Proposed YAML options
In the original discussion, we proposed four different syntaxes. One of them (also disliked by the REUSE team) has been turned down in an SPDX call. I removed two others as they are rather unintuitive and clumsy. I also changed the format slightly to comply with YAML syntax (using `*` as a key name is invalid) and added another option.
Option 1: list
Each list item is an SPDX tag as used in file headers. Easy to read thanks to the `-`, but all items must be wrapped in `"` to escape the `:`, which would otherwise separate a key from a value (we cannot have multiple keys!).
    - files: "src/*"
      info:
        - "SPDX-FileCopyrightText: 2020 Me"
        - "SPDX-FileCopyrightText: © 2017 You"
        - "SPDX-License-Identifier: MIT"
Option 2: multi-line string
SPDX tags are simply separated by newlines. Neither `-` nor escaping of `:` is required. However, indentation must be preserved for all lines!
    - files: "src/*"
      info: |
        SPDX-FileCopyrightText: 2020 Me
        SPDX-FileCopyrightText: © 2017 You
        SPDX-License-Identifier: MIT
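To illustrate how little parsing Option 2 would need on the tool side, here is a minimal sketch that splits such an `info` string into its tags. The function name `parse_info` and the returned dictionary layout are hypothetical, chosen only for this example; it assumes the YAML loader has already handed over the block as a plain string.

```python
def parse_info(info: str) -> dict:
    """Split a multi-line `info` string into copyright and license entries.

    Hypothetical helper: neither the name nor the result layout is part
    of any agreed REUSE.yaml design.
    """
    result = {"copyright": [], "license": []}
    for line in info.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only, so values may contain colons.
        tag, _, value = line.partition(":")
        value = value.strip()
        if tag == "SPDX-FileCopyrightText":
            result["copyright"].append(value)
        elif tag == "SPDX-License-Identifier":
            result["license"].append(value)
        else:
            raise ValueError(f"unsupported tag: {tag!r}")
    return result


info = """\
SPDX-FileCopyrightText: 2020 Me
SPDX-FileCopyrightText: © 2017 You
SPDX-License-Identifier: MIT
"""
parsed = parse_info(info)
print(parsed)
```

The same function would also work on the list items of Option 1 if they were joined with newlines first.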
Option 3: license and copyright as separate keys
We could also separate the two information items. Downside: the keys must be wrapped in `"` to escape the `-` in the key name.
    - files: "src/*"
      "SPDX-FileCopyrightText":
        - "2020 Me"
        - "© 2017 You"
      "SPDX-License-Identifier": MIT
Background on the YAML keys
Unlike the SPDX YAML format, we would like to avoid `copyrightText` and `licenseDeclared` as key names. In REUSE, the `SPDX-License-Identifier` and `SPDX-FileCopyrightText` tags (or alternatively traditional, varying copyright statements) are common and understood by users.
This was also accepted in the SPDX call.
Possible targets
REUSE.yaml is intended to target only files relative to its own position, and only those "below" it. Statements like `files: "../../src/*"` should not be possible.
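One way a tool could enforce this restriction is to reject any pattern that is absolute or contains a `..` component. The following is only a sketch; the function name `is_allowed_pattern` is hypothetical, and it deliberately errs on the strict side by also rejecting patterns such as `src/../docs/*` that would end up inside the tree.

```python
from pathlib import PurePosixPath


def is_allowed_pattern(pattern: str) -> bool:
    """Return True if `pattern` can only refer to files at or below REUSE.yaml.

    Hypothetical check: absolute patterns and any pattern containing a
    ".." component are refused.
    """
    path = PurePosixPath(pattern)
    if path.is_absolute():
        return False
    # PurePosixPath does not collapse "..", so it shows up in .parts.
    return ".." not in path.parts


print(is_allowed_pattern("src/*"))        # pattern stays below REUSE.yaml
print(is_allowed_pattern("../../src/*"))  # pattern climbs out of the tree
```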
Supporting traditional copyright statements?
A related question is whether we should only support `SPDX-FileCopyrightText` as the indicator for files' copyright, or also "traditional" statements like "Copyright © 2021 Jane Doe".
REUSE recommends the SPDX tag, but also supports the traditional statements. My suggestion would be to do the same in REUSE.yaml to reduce friction, but in SPDX this could lead to conflicts. Happy to collect opinions here!
Globbing
DEP-5 uses a simple glob syntax. In it, `*/Makefile` would include any Makefile in all paths below. I am not sure whether this globbing is represented in any native Python module. The benefit of sticking with the DEP-5 glob is that we could more easily convert existing DEP-5 files to REUSE.yaml.
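On the question of a native Python module: the standard library's `fnmatch` seems to come close, because its `*` is not stopped by `/`. The sketch below only checks that one property. It is not a claim that `fnmatch` is equivalent to the DEP-5 glob in every detail; for instance, `fnmatch` also interprets `[...]` character sets, which would have to be handled separately.

```python
from fnmatch import fnmatchcase

pattern = "*/Makefile"

# In fnmatch, "*" matches any run of characters, including "/".
results = {
    path: fnmatchcase(path, pattern)
    for path in ("Makefile", "src/Makefile", "src/lib/Makefile")
}
for path, matched in results.items():
    print(f"{path}: {matched}")
```

Note that a `Makefile` sitting directly next to the metadata file is not matched by this pattern, since the pattern requires at least one `/`.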
Another possibility would be using the Python-native `glob`. `*/Makefile` would only match a Makefile exactly one level below, while `**/Makefile` would match all Makefiles.
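The difference can be tried out with a throwaway directory tree. This is a small sketch using only the standard library; the helper `matches` and the example tree are made up for the demonstration. Note that `**` is only treated recursively when `recursive=True` is passed to `glob.glob`.

```python
import glob
import os
import tempfile


def matches(root: str, pattern: str) -> list:
    """Return sorted, root-relative POSIX-style paths matching `pattern`."""
    hits = glob.glob(os.path.join(root, pattern), recursive=True)
    return sorted(os.path.relpath(h, root).replace(os.sep, "/") for h in hits)


with tempfile.TemporaryDirectory() as root:
    # Hypothetical example tree with Makefiles at three depths.
    for rel in ("Makefile", "src/Makefile", "src/lib/Makefile"):
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()

    one_level = matches(root, "*/Makefile")
    all_levels = matches(root, "**/Makefile")

print(one_level)
print(all_levels)
```

With `recursive=True`, `**/` also stands for zero directories, so the top-level `Makefile` is included in the second result as well.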
We could also use pathspec, which supports the same globbing as `gitignore`.
Conflict resolution
As in DEP-5, I would suggest that the last match of a file wins. So if the file `foo.txt` is first matched by `*` and then by `*.txt`, the last statement would count.
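A last-match-wins rule is straightforward to sketch: walk the entries in file order and keep overwriting the result. The example below assumes `fnmatch`-style matching purely for illustration (the globbing question above is still open), and the function name `resolve` as well as the license values are hypothetical.

```python
from fnmatch import fnmatchcase


def resolve(path, entries):
    """Return the info of the LAST entry whose pattern matches `path`.

    `entries` is a list of (pattern, info) pairs in file order.
    Returns None if nothing matches.
    """
    winner = None
    for pattern, info in entries:
        if fnmatchcase(path, pattern):
            winner = info  # later matches override earlier ones
    return winner


# Hypothetical entries, in the order they would appear in REUSE.yaml.
entries = [
    ("*", "SPDX-License-Identifier: MIT"),
    ("*.txt", "SPDX-License-Identifier: CC0-1.0"),
]

print(resolve("foo.txt", entries))
print(resolve("foo.py", entries))
```

Here `foo.txt` is matched by both patterns and ends up with the second entry, while `foo.py` only matches `*`.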
The dependency resolution within REUSE and its different options – including REUSE.yaml – is discussed in #70.