Documentation
Documentation is an essential part of any software project. It is the way to communicate with potential users and contributors, and to ensure that the project is sustainable in the long term.
R users
DESCRIPTION file
For your entire project, you will need a DESCRIPTION file, which gathers the project metadata, for instance:
> Package: mypackage \
> Title: What the Package Does (One Line, Title Case) \
> Version: 0.0.0.1000 \
> Authors@R: person("First", "Last", , "first.last@example.com", role = c("aut", "cre"), comment = c(ORCID = "YOUR-ORCID-ID")) \
> Description: What the package does (one paragraph). \
> Imports: Rpackage1, Rpackage2 (the list of R packages that are needed to run your analysis)
Some of these fields may be edited by hand, but others are generated automatically by the devtools or usethis packages.
Function documentation: basics
- What is needed in the function documentation?
  - what your function does
  - which arguments it takes
  - what it returns
  - (maybe) some examples of how to use it
- Here is an example of a header for the custom ‘add’ function:
#' Add together two numbers
#'
#' @param x A number.
#' @param y A number.
#' @returns A numeric vector.
#' @examples
#' add(1, 1)
#' add(10, 1)
add <- function(x, y) {
  x + y
}
You can add many tags to your documentation, such as:
- @export to make the function available to the user
- @importFrom to import a function from a package
- @seealso to refer to other functions
Write both the function and its documentation at the same time in a my-function.R file, stored in the R subdirectory. Use roxygen2 to generate man/my-function.Rd from the header: the devtools function document(), called as devtools::document(), will generate (or update) your package’s .Rd files.
Package documentation
For a more “integrated” documentation of your package, that details the functions, datasets, and other objects in your package, you can use vignettes that can generate webpages with interactive code, results, plots and comments, and pkgdown to create a website for your package.
Also see the CI/CD page to automate vignette and website publishing.
Python users
README.md file
This is the main documentation file for your project. It is located at the root of the project and should contain a general description of the project, its purpose, and how to use it. This is the first thing that users will see when they visit your project on GitHub or GitLab (or wherever you host your code).
Here is a list of things that you should include in your README.md file:
- Name of the project / package. Ideally, it should match the name of the repository.
- Badges: These are small images that show the status and the quality of your project. They are especially useful if you want to distribute your project / package to users. For example, you can add badges that show:
  - the build status of the project
  - the build status of the documentation
  - the version of the package on PyPI or on Conda
  - and many more…
- Description: A short description of the project / package. 1-3 sentences is generally enough: just enough to give an idea of what the project is about, and generally not too technical.
- Installation: How to install the package. This should include the command to install the package using pip or conda, and any other dependencies that need to be installed.
- Usage: How to use the package. This should include an example of the most basic use case of the package.
- Links: Links to the documentation, tutorials, the issue tracker, the source code, the license, etc.
- Contributing: How to contribute to the project. This should include information on how to report bugs, how to request new features, how to submit code changes, and how to set up the development environment.
- Citation: How to cite the project.
Documentation of API
API stands for Application Programming Interface. It refers to the functions, classes, and modules of your package that users call directly, i.e. the interface between users and your code. The documentation of the API is essential for users to understand how to use your package.
Docstrings
What is it and how to write it?
In Python, the documentation is written in a docstring: a string literal that is the first statement in a module, function, class, or method, enclosed in """ (triple double-quotes). The docstring should describe what the function does, what arguments it takes and their types (e.g. str, bool, etc.), and what it returns. This docstring is then used by the help() function and by the pydoc module to generate documentation.
You need to consistently write docstrings for all the functions, classes, and modules in your package.
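As a minimal sketch of how a docstring is picked up by the tooling mentioned above, the snippet below defines a hypothetical `greet` function (not part of any package discussed here) and reads its docstring back in three ways: the raw `__doc__` attribute, `inspect.getdoc`, and the text that `pydoc` (and therefore `help()`) would render.

```python
import inspect
import pydoc


def greet(name: str) -> str:
    """Return a greeting for ``name``."""
    return f"Hello, {name}!"


# The docstring is stored on the object itself.
raw = greet.__doc__

# inspect.getdoc() returns the same text with indentation cleaned up.
clean = inspect.getdoc(greet)

# pydoc builds the page that help(greet) would display;
# the plaintext renderer avoids terminal bold formatting.
rendered = pydoc.render_doc(greet, renderer=pydoc.plaintext)

print(clean)
print(rendered)
```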
There are several conventions for writing docstrings in Python. The most common ones are:
- Google style
- Numpy style
- reStructuredText
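To give an idea of how these conventions differ, here is a sketch of a small `add` function documented in Google style. In Google style, sections are introduced by a keyword followed by a colon (`Args:`, `Returns:`), whereas Numpy style underlines section titles with dashes. The wording of the descriptions is illustrative only.

```python
def add(x: int, y: int) -> int:
    """Add together two numbers.

    Args:
        x: A number.
        y: A number.

    Returns:
        The sum of ``x`` and ``y``.
    """
    return x + y


print(add(1, 1))
```

Whichever convention you choose, the key point is to use the same one across the whole package.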
Example
Here is an example of a function with a docstring:
def add(x: int, y: int) -> int:
    """
    Add together two numbers.

    Parameters
    ----------
    x : int
        A number.
    y : int
        A number.

    Returns
    -------
    int
        The sum of x and y.

    Examples
    --------
    >>> add(1, 1)
    2
    >>> add(10, 1)
    11
    """
    return x + y
This function simply adds two numbers together. The docstring provides:
- a description of what the function does
- the inputs / parameters of the function and their types
- the output of the function and its type
- a simple example of how to use the function

Note that these examples can be executed using the doctest module, which provides another convenient way to test the function. The lines to be executed are preceded by >>>.
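The following sketch shows one way to run docstring examples programmatically with the standard `doctest` module. The helper `check_examples` is a hypothetical name introduced for illustration; it collects the `>>>` examples of a single function and reports how many were run and how many failed.

```python
import doctest


def add(x: int, y: int) -> int:
    """
    Add together two numbers.

    Examples
    --------
    >>> add(1, 1)
    2
    >>> add(10, 1)
    11
    """
    return x + y


def check_examples(func) -> doctest.TestResults:
    """Run the >>> examples found in the docstring of ``func``."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # Expose the function under its own name so the examples can call it.
    for test in finder.find(func, func.__name__, globs={func.__name__: func}):
        runner.run(test)
    return runner.summarize(verbose=False)


results = check_examples(add)
print(f"{results.attempted} examples run, {results.failed} failed")
```

For a whole file, the same checks can be run from the command line with `python -m doctest my_module.py`, where `my_module.py` is a placeholder for your own module.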
Note:
You can see in the definition of the function that the arguments have “type hints” (e.g. x: int). This is not mandatory, but adding type hints to your functions is good practice: it adds another layer of documentation, makes the code more readable, and helps catch bugs early. You can also annotate the return type of the function using the -> syntax (e.g. -> int). Type hints are not enforced by Python at runtime, but they can be checked with a static type checker like mypy, which goes through your code and makes sure that the types are consistent.
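A short sketch can make the "not enforced" point concrete. The snippet below reads the type hints of `add` with `typing.get_type_hints` and then calls the function with strings: Python runs the call without complaint, while a static type checker such as mypy would be expected to report it as an error.

```python
from typing import get_type_hints


def add(x: int, y: int) -> int:
    return x + y


# Type hints are stored as metadata on the function.
hints = get_type_hints(add)
print(hints)

# Nothing checks the hints at runtime: this call concatenates two strings
# even though the signature asks for integers.
unchecked = add("a", "b")  # a static type checker would flag this line
print(unchecked)
```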
For more complex and extensive examples, you can check the xarray package, which has very good documentation of its API: see the Dataset class documentation and the associated docstring.
Tutorials
Tutorials are a great way to show users how to use your package. They can be written in Jupyter notebooks (.ipynb files). You can see great examples of tutorial galleries in:
- xarray
- geopandas
- scikit-learn
Sphinx documentation
To organize your documentation and automatically build a table of contents, the API reference, and the tutorials, you can use Sphinx. This is not the only tool to generate documentation, but it is one of the most popular. Another popular framework is MkDocs.
Sphinx generates static websites (i.e. they are not interactive) from templates. It is highly customizable with extensions and themes and can generate documentation in many formats (HTML, PDF, ePub, etc.). It can also be used to generate documentation for languages other than Python.
You can have a look at the Sphinx themes gallery. The most popular ones are PyData, Furo, and Read the Docs.
Syntax
Sphinx (and Sphinx extensions) can handle three types of syntax for the documentation:
- reStructuredText (.rst files): this is the native syntax of Sphinx. It has been used for many years, but has lost some popularity to Markdown and MyST.
- Markdown (.md files): this is a very popular syntax for writing documentation (also used by Jupyter notebooks) as it is simple and easy to read. However, some documentation features are not handled by plain Markdown (like cross-references, custom elements, colored callout blocks). You need the extension myst-parser to use Markdown syntax in Sphinx.
- MyST (also written in .md files): this is a newer syntax that extends Markdown with roles and directives equivalent to those of reStructuredText. It is more powerful than plain Markdown and more readable than reStructuredText. You need the extension myst-parser to use MyST syntax in Sphinx.
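As an illustration, here is a sketch of a minimal Sphinx `conf.py` that accepts both reStructuredText and Markdown / MyST sources. The project name and the theme are placeholders, and enabling `sphinx.ext.autodoc` and `sphinx.ext.napoleon` (which reads Numpy and Google style docstrings) is an assumption about a typical setup rather than a requirement.

```python
# conf.py: a hypothetical minimal Sphinx configuration

project = "mypackage"  # placeholder project name

extensions = [
    "sphinx.ext.autodoc",   # pull API documentation from docstrings
    "sphinx.ext.napoleon",  # parse Numpy and Google style docstrings
    "myst_parser",          # allow Markdown / MyST source files
]

# Map file extensions to the parser that should handle them.
source_suffix = {
    ".rst": "restructuredtext",
    ".md": "markdown",
}

# Placeholder theme; it must be installed separately (e.g. with pip).
html_theme = "furo"
```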
Building a gallery of tutorials
To build a gallery of tutorials, you can use either the Sphinx Gallery or the nbsphinx extension. Sphinx Gallery is more powerful and generates the gallery from .py files, while nbsphinx is simpler and generates the gallery from .ipynb files.
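The two options can be sketched as alternative fragments of a Sphinx `conf.py`. The switch `USE_NOTEBOOKS` and the directory names are hypothetical and only serve to show both configurations side by side; a real project would normally keep just one of the two branches.

```python
# conf.py fragment: choosing a gallery extension (illustrative sketch)

USE_NOTEBOOKS = False  # hypothetical switch, for illustration only

if USE_NOTEBOOKS:
    # nbsphinx renders .ipynb notebooks placed in the documentation sources.
    extensions = ["nbsphinx"]
else:
    # Sphinx Gallery builds the gallery from .py example scripts.
    extensions = ["sphinx_gallery.gen_gallery"]
    sphinx_gallery_conf = {
        "examples_dirs": "../examples",   # placeholder: where the .py scripts live
        "gallery_dirs": "auto_examples",  # placeholder: where the gallery is generated
    }
```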