Description
I am not sure if this is a good place or if it would be better to open a topic on https://discuss.scientific-python.org/. Happy to move it somewhere else if this isn't the best venue.
I am under the impression that most projects in the scientific-python and/or pydata ecosystem follow numpydoc without significant deviations. What varies, I think, is the level of success, especially outside the core projects. I think one of the reasons is that the numpydoc style guide is not a comprehensive collection of all the rules and conventions that are actually in use throughout the ecosystem (or even within numpy itself). I think missing conventions are the more common problem, but some of the guidance is also not followed or not realistic. I am also not sure how widespread some of these conventions are, or whether they are a de facto part of numpydoc or of the numpy/scipy codebase, which makes it hard to know if they are indeed conventions that should be in numpydoc or things that look the same by chance or contributor overlap. I'll add some examples below, but the main question is:
Would an effort to check the current style guide and find common ground with other projects be welcome?
Context: I have been trying for a while to improve the documentation of the ArviZ and PyMC libraries while following numpydoc and the rest of the conventions used in the pydata ecosystem. I would love to simply link to numpydoc instead of writing a whole new doc extending/repeating numpydoc (with the exception of 2-3 paragraphs at most about project-specific things like the aliases defined in numpydoc_xref_aliases) and forcing contributors to navigate multiple docs to write docstrings. I also think that is numpydoc's goal, not to be only numpy/scipy specific, but I am not sure about this. If it is, it would be great to review the guide and make sure it is up to date and as comprehensive as possible. I have gathered examples during that process (some of them are below), but as I said in the beginning, the goal is not so much the specific examples as making sure I understand numpydoc's goals and scope correctly. If I do, we can find the best way to go about it.
Some examples
Short summary
Its description is:
A one-line summary that does not use variable names or the function name
which is followed by an example using the function add(a, b), described as "The sum of two numbers.".
The rule about not using variable names is generally followed. The rule about not using the function name, however, is not, and I don't think it is realistic. For example, most of the short summaries in the linear algebra module do use the function name, and virtually all methods in the random generator class have a short summary that is "Draw samples from the distribution.". Not using the function/method name would probably be less clear than using it: these names are technical terms without any synonym available, and they can't be defined in the short summary either.
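To make the contrast concrete, here is a minimal sketch of two numpydoc-style docstrings. The add function mirrors the style guide example, while cholesky is a hypothetical pure-Python stand-in (it is not numpy's implementation or its docstring) whose summary can hardly avoid the technical term that is also the function name.

```python
import math


def add(a, b):
    """The sum of two numbers.

    Parameters
    ----------
    a, b : float
        Numbers to add.

    Returns
    -------
    float
        The sum of both inputs.
    """
    return a + b


def cholesky(a):
    """Cholesky decomposition of a symmetric positive-definite matrix.

    Hypothetical illustration only. The summary repeats the function
    name because "Cholesky" is the technical term and has no synonym.

    Parameters
    ----------
    a : list of list of float
        Square, symmetric, positive-definite matrix.

    Returns
    -------
    list of list of float
        Lower-triangular factor of the input matrix.
    """
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Contribution of the columns already computed.
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(a[i][i] - partial)
            else:
                lower[i][j] = (a[i][j] - partial) / lower[j][j]
    return lower
```

The first summary follows the guideline literally; the second shows the kind of case where following it would make the summary less clear.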
Application/scope of the doc
I was able to find project-specific docstring convention pages for pandas and matplotlib, but not for numpy or scipy, which only link to numpydoc. Am I right to assume that numpydoc alone is the official convention description for all of numpy and scipy?
There are a couple of places where only numpy is mentioned, such as the deprecation warning section, but more importantly, the style guide also seems to contain information about numpy infrastructure (I think), like the last couple of sentences in the examples section. If I understand correctly, omitting the numpy import is only an option in numpy docstrings but not in scipy ones, for example. But I don't really know what to make of the automatic use of the plot directive when matplotlib is imported; is that part of numpydoc?
The parameters section also says "Enclose variables in single backticks", but the behaviour of single-backtick enclosure in sphinx is defined by the value of default_role in conf.py. Is that intended? In general this will default to italics, but it would become code formatting for projects using "code" as the default role and, what might be even worse, could easily be rendered as links to the glossary when using "autolink" as the default role. For example, axis is a common parameter name and also a term in numpy's glossary, which is probably added to intersphinx by most projects and is therefore a valid autolink key. That would mean that, in the descriptions, all parameters would be rendered in italics except axis, which would be a link (I have no background on this, but the fact that all parameters in https://numpy.org/devdocs/reference/generated/numpy.average.html are correctly enclosed in single backticks except for axis might point to this).
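For reference, the setting in question might look like the following conf.py fragment. This is only an illustrative sketch: the extension list and the intersphinx entry are assumptions about a typical project, and whether a name such as axis actually resolves to a link depends on the inventories that project loads, which is not shown here.

```python
# conf.py (Sphinx configuration), illustrative fragment only
extensions = [
    "sphinx.ext.autosummary",  # provides the "autolink" role
    "sphinx.ext.intersphinx",
    "numpydoc",
]

# Role applied to text written in single backticks.
#   None        -> docutils' default role, typically shown in italics
#   "code"      -> rendered as inline code
#   "autolink"  -> behaves like :obj: when the name can be resolved,
#                  otherwise falls back to simple emphasis
default_role = "autolink"

# Assumed entry; many projects load numpy's inventory like this.
intersphinx_mapping = {
    "numpy": ("https://numpy.org/doc/stable/", None),
}
```

With such a setup the same docstring markup can render in three different ways across projects, which is the source of the ambiguity described above.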
Parameters
The array_like alias is used only in passing in the parameters section and explained at the very bottom of the page, in "Other points to keep in mind". However, this "type" is key to many docstrings and it seems to have multiple extra conventions associated with it (shape, dtype) which are documented neither in the parameters section nor in the other points section. The page in matplotlib's docs explains some of these conventions, which seem to be the same ones used by numpy (in the linalg module, for example, the input of the cholesky function is described as a : (…, M, M) array_like). Is this something that should be part of numpydoc, or something that is technically matplotlib specific and spread due to maintainer overlap or something like that? Other modules, like polynomial, use a slightly different convention: array_like, shape (M,). Are both, or should both be, part of numpydoc, a bit like the multiple default options allowed?
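The two shape conventions mentioned above can be put side by side in a single docstring. The function below, solve_lower, is a hypothetical example written for this comparison (it is not taken from numpy or matplotlib); it uses plain nested lists so that it runs without any dependency.

```python
def solve_lower(a, b):
    """Solve a lower-triangular linear system by forward substitution.

    Parameters
    ----------
    a : (M, M) array_like
        Lower-triangular coefficient matrix. The shape comes first,
        as in the linalg-style convention.
    b : array_like, shape (M,)
        Right-hand side. The shape comes after the type, as in the
        polynomial-style convention.

    Returns
    -------
    list of float
        Solution of the system, with M entries.
    """
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        # Subtract the part of row i explained by already-solved unknowns.
        known = sum(a[i][k] * x[k] for k in range(i))
        x[i] = (b[i] - known) / a[i][i]
    return x
```

Both spellings carry the same information, so the open question is only which one (or both) the style guide should document.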
It might also help to add further type examples inside numpydoc itself (maybe they can go inside a dropdown to not take too much space). numpydoc says:
For the parameter types, be as precise as possible.
but there are cases where I am not sure what the way to go is when writing a docstring, or don't know what to answer when someone else asks me about it. pandas, for example, has some extra examples that have been helpful in such situations:
list of int
dict of {str : int}
tuple of (str, int, int)
tuple of (str,)
set of str
and so does matplotlib, with some extra conventions too, such as "Use (float, float) to [...] the parentheses should be included to make the tuple-ness more obvious.". This is also used, for example, in numpy.histogram to describe the range parameter. Do you know if this parentheses -> tuple-ness link is more widespread in numpy/scipy docs? Should it be part of numpydoc, or is it a point of divergence by matplotlib? Is the tuple of (float, float) form more common?
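As a sketch of how these type descriptions read in practice, the hypothetical function below (count_in_range, with parameter names invented for this illustration) combines the pandas-style "list of" and "dict of" forms with the parenthesised tuple form.

```python
def count_in_range(values, labels, bounds=(0.0, 1.0)):
    """Count the values that fall inside a closed interval, per label.

    Parameters
    ----------
    values : list of float
        Data points to check.
    labels : list of str
        Group label for each entry in `values`.
    bounds : (float, float), default: (0.0, 1.0)
        Lower and upper limits of the interval. The alternative
        spelling would be ``tuple of (float, float)``.

    Returns
    -------
    dict of {str : int}
        Number of in-range values for each label.
    """
    low, high = bounds
    counts = {}
    for value, label in zip(values, labels):
        # Make sure every label appears, even with a zero count.
        counts.setdefault(label, 0)
        if low <= value <= high:
            counts[label] += 1
    return counts
```

The parameter is called bounds rather than range only to avoid shadowing the Python builtin in this sketch.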