Description
Historically, there was no validation of how docstrings were written. Some conventions were usually followed, but as the project grew, it became harder to ensure that all API documentation pages were consistent and free of mistakes.
Over the last two years, we've been implementing a range of validations to make sure every class, method, function and attribute is correctly documented.
The full list of validations is defined in the script that runs them: https://github.com/pandas-dev/pandas/blob/master/scripts/validate_docstrings.py#L77
Many of these errors have already been fixed across all pages and could be added to the CI so they are not reintroduced. The list of errors currently validated can be seen in the CI script: https://github.com/pandas-dev/pandas/blob/master/ci/code_checks.sh#L267
The pending errors, i.e. those not yet validated in the CI, are:
{'ES01': 'No extended summary found',
 'EX01': 'No examples section found',
 'EX02': 'Examples do not pass tests:\n{doctest_log}',
 'EX03': 'flake8 error: {error_code} {error_message}{times_happening}',
 'GL01': 'Docstring text (summary) should start in the line immediately after '
         'the opening quotes (not in the same line, or leaving a blank line in '
         'between)',
 'GL02': 'Closing quotes should be placed in the line after the last text in '
         'the docstring (do not close the quotes in the same line as the text, '
         'or leave a blank line between the last text and the quotes)',
 'GL08': 'The object does not have a docstring',
 'PR01': 'Parameters {missing_params} not documented',
 'PR02': 'Unknown parameters {unknown_params}',
 'PR06': 'Parameter "{param_name}" type should use "{right_type}" instead of '
         '"{wrong_type}"',
 'PR07': 'Parameter "{param_name}" has no description',
 'PR08': 'Parameter "{param_name}" description should start with a capital '
         'letter',
 'PR09': 'Parameter "{param_name}" description should finish with "."',
 'RT02': 'The first line of the Returns section should contain only the type, '
         'unless multiple values are being returned',
 'RT03': 'Return value has no description',
 'SA01': 'See Also section not found',
 'SA02': 'Missing period at end of description for See Also "{reference_name}" '
         'reference',
 'SA03': 'Description should be capitalized for See Also "{reference_name}" '
         'reference',
 'SA04': 'Missing description for See Also "{reference_name}" reference',
 'SS01': 'No summary found (a short summary in a single line should be present '
         'at the beginning of the docstring)',
 'SS02': 'Summary does not start with a capital letter',
 'SS03': 'Summary does not end with a period',
 'SS06': 'Summary should fit in a single line',
 'YD01': 'No Yields section found'}
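The names in braces are placeholders filled in with details of each failing object. A minimal sketch of how such templates can be expanded, assuming they are plain Python `str.format` strings (the `ERROR_MSGS` name, the `error_message` helper and the subset of codes shown are illustrative, not copied from the script):

```python
# Illustrative subset of the message templates listed above
ERROR_MSGS = {
    "PR01": "Parameters {missing_params} not documented",
    "PR09": 'Parameter "{param_name}" description should finish with "."',
    "SA04": 'Missing description for See Also "{reference_name}" reference',
}


def error_message(code, **kwargs):
    """
    Return the human-readable message for an error code.

    The keyword arguments fill the placeholders in the template.
    """
    return ERROR_MSGS[code].format(**kwargs)


print(error_message("PR09", param_name="axis"))
# Parameter "axis" description should finish with "."
```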
Some of these make more sense to address while working on the content of an object (for example, adding a missing description, or documenting objects that have no docstring at all).
Others are just formatting errors, and those are the ones I'd start with:
- EX03: flake8 error: {error_code} {error_message}{times_happening}
- GL01/GL02: Docstring text (summary) should start in the line immediately after the opening quotes, and the closing quotes should be placed in the line immediately after the last text (not in the same line, or leaving a blank line in between)
- PR02: Unknown parameters {unknown_params}
- PR06: Parameter "{param_name}" type should use "{right_type}" instead of "{wrong_type}"
- PR08: Parameter "{param_name}" description should start with a capital letter
- PR09: Parameter "{param_name}" description should finish with "."
- RT02: The first line of the Returns section should contain only the type, unless multiple values are being returned
- SA02: Missing period at end of description for See Also "{reference_name}" reference
- SA03: Description should be capitalized for See Also "{reference_name}" reference
- SS02: Summary does not start with a capital letter
- SS03: Summary does not end with a period
- SS06: Summary should fit in a single line
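For reference, here is a hypothetical function (`increment` is not part of pandas) whose numpydoc-style docstring avoids the formatting errors above: the summary starts on the line after the opening quotes, fits on one line, is capitalized and ends with a period; parameter types use `int` rather than `integer`; parameter and See Also descriptions are capitalized and end with a period; the first line of Returns holds only the type; and the closing quotes sit on the line right after the last text.

```python
def increment(values, step=1):
    """
    Increment every element of a list.

    Parameters
    ----------
    values : list of int
        The numbers to increment.
    step : int, default 1
        Amount added to each element.

    Returns
    -------
    list of int
        A new list with each element increased by ``step``.

    See Also
    --------
    sum : Add up all the elements of an iterable.

    Examples
    --------
    >>> increment([1, 2, 3])
    [2, 3, 4]
    """
    # Build a new list so the input is left unchanged
    return [value + step for value in values]
```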
To find the errors for a single code, you can run:
./scripts/validate_docstrings.py --errors=EX02
Or, for errors that make sense to address together:
./scripts/validate_docstrings.py --errors=GL01,GL02
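To get a feel for what some of these codes look for, here is a rough, simplified re-implementation of GL01, GL02, SS02, SS03 and SS06 operating on a raw `__doc__` string. It is a hypothetical sketch (`check_summary_format` is a made-up name) and does not reproduce the actual logic of `validate_docstrings.py`, which remains the authoritative check.

```python
def check_summary_format(docstring):
    """
    Return a list of formatting error codes found in a raw docstring.

    Simplified illustration only; not the logic of validate_docstrings.py.
    """
    errors = []
    lines = docstring.split("\n")

    # GL01: text should start on the line right after the opening quotes,
    # so the first raw line is empty and the second one is not.
    if lines[0].strip() or (len(lines) > 1 and not lines[1].strip()):
        errors.append("GL01")

    # GL02: closing quotes should be on the line right after the last text,
    # so the last raw line is empty and the one before it is not.
    if lines[-1].strip() or (len(lines) > 1 and not lines[-2].strip()):
        errors.append("GL02")

    # The summary is the first block of consecutive non-empty lines.
    text = [line.strip() for line in lines]
    start = next((i for i, line in enumerate(text) if line), None)
    if start is None:
        return errors  # empty docstring, nothing else to check
    summary_lines = []
    for line in text[start:]:
        if not line:
            break
        summary_lines.append(line)
    summary = " ".join(summary_lines)

    if not summary[0].isupper():
        errors.append("SS02")
    if not summary.endswith("."):
        errors.append("SS03")
    if len(summary_lines) > 1:
        errors.append("SS06")
    return errors


print(check_summary_format("Sum values."))  # ['GL01', 'GL02']
print(check_summary_format("\n    Sum the values.\n    "))  # []
```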
This prints the list of errors to fix. A list of steps to follow when fixing a docstring, which you may find useful, is available at: https://python-sprints.github.io/pandas/dashboard.html
VERY IMPORTANT
The main challenge will be avoiding duplicated work with other sprinters, which is very frustrating and has happened a lot at every sprint. My recommendation is, BEFORE doing any work, to create an issue for the error code you plan to work on (first check that one hasn't already been created). In the issue, write the list of errors that validate_docstrings.py returns. Then, in a comment, pick 10 of them and say that you're going to fix them. Other people can take a different 10. When opening a PR, reference the issue.
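Splitting the errors returned by `validate_docstrings.py` into batches of 10 to claim in issue comments can be done by hand, or with a small helper like the one below. It is a hypothetical sketch: `batches` is a made-up name, the object names are placeholders, and it assumes you have already copied the affected object names into a Python list.

```python
def batches(items, size=10):
    """
    Split a list into consecutive batches of at most ``size`` items.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


# Placeholder object names standing in for the script's output
objects = [f"pandas.Series.method_{n}" for n in range(23)]
for number, batch in enumerate(batches(objects), start=1):
    print(f"Batch {number}: {len(batch)} objects, starting at {batch[0]}")
```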
I created an issue for reference: #27976
Good luck!