Documentation

Documentation is an essential part of any software project. It is the way to communicate with potential users and contributors, and to ensure that the project is sustainable in the long term.

R users

DESCRIPTION file

For your entire project, you will need a DESCRIPTION file that gathers the project metadata, for instance:

> Package: mypackage
> Title: What the Package Does (One Line, Title Case) \
> Version: 0.0.0.1000 \
> Authors@R:
    person("First", "Last", , "first.last@example.com", role = c("aut", "cre"),
           comment = c(ORCID = "YOUR-ORCID-ID")) \
> Description: What the package does (one paragraph). \
> Imports: Rpackage1, Rpackage2 (the list of R packages that are needed to run your analysis)

Some of these sections may be edited by hand, but others are automatically generated by the devtools or usethis packages.

Function documentation: basics

What is needed in the function documentation?

what does your function do
with which arguments
what does it return
(maybe) some examples of how to use it

Here is an example of a header for the custom ‘add’ function:

#' Add together two numbers
#'
#' @param x A number.
#' @param y A number.
#' @returns A numeric vector.
#' @examples
#' add(1, 1)
#' add(10, 1)
add <- function(x, y) {
  x + y
}

You can add many options to your documentation, such as:

@export to make the function available to the user
@importFrom to import a function from a package
@seealso to refer to other functions

Write both the function and its documentation at the same time in a my-function.R file, stored in the R/ subdirectory.
Use roxygen2 to generate man/my-function.Rd from the header. The devtools function document()

devtools::document()

will generate (or update) your package’s .Rd files.

Package documentation

For more “integrated” documentation of your package that details the functions, datasets, and other objects it contains, you can use vignettes, which can generate webpages with interactive code, results, plots and comments, and pkgdown, which creates a website for your package.

Also see the CI/CD page to automate vignette and website publishing.

Python users

`README.md` file

This is the main documentation file for your project. It is located at the root of the project and should contain a general description of the project, its purpose, and how to use it. This is the first thing that users will see when they visit your project on GitHub or GitLab (or wherever you host your code).

Here is a list of things that you should include in your README.md file:

Name of the project / package. Ideally, it should match the name of the repository.
Badges: These are small images that show the status and the quality of your project. They are especially useful if you want to distribute your project / package to users. For example, you can add a badge that shows:
- the build status of the project
- the build status of the documentation
- the version of the package on PyPI or on Conda
- and many more…
Description: A short description of the project / package. 1-3 sentences is generally enough: just enough to give an idea of what the project is about, and generally not too technical.
Installation: How to install the package. This should include the command to install the package using pip or conda, and any other dependencies that need to be installed.
Usage: How to use the package. This should include an example of the most basic use case of the package.
Links: Links to the documentation, tutorials, the issue tracker, the source code, the license, etc.
Contributing: How to contribute to the project. This should include information on how to report bugs, how to request new features, how to submit code changes, and how to set up the development environment.
Citation: How to cite the project.

Documentation of API

API stands for Application Programming Interface. It can refer to functions, classes, or modules in your package, that create a user interface to your code. The documentation of the API is essential for users to understand how to use your package.

Docstrings

What is it and how to write it?

In Python, the documentation is written in a docstring: a string that is the first statement in a module, function, class, or method, enclosed in """ (triple double-quotes). The docstring should describe what the function does, what arguments it takes and their types (e.g. str, bool, etc.), and what it returns. This docstring is then used by the help() function, and by the pydoc module to generate documentation.

You need to consistently write docstrings for all the functions, classes, and modules in your package.
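
As a minimal sketch of how a docstring is exposed at runtime, the snippet below reads it back through the `__doc__` attribute and `inspect.getdoc`. The `greet` function is a hypothetical example and is not part of any package mentioned here.

```python
import inspect


def greet(name: str) -> str:
    """Return a greeting for the given name."""
    return f"Hello, {name}!"


# The raw docstring is stored on the function object itself.
raw_doc = greet.__doc__

# inspect.getdoc returns the same text with indentation cleaned up;
# help(greet) builds its output from this docstring as well.
clean_doc = inspect.getdoc(greet)

print(clean_doc)
```

Calling `help(greet)` in an interactive session should display the same text together with the function signature.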

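There are several conventions for writing docstrings in Python. The most common ones are the NumPy style, the Google style, and the reStructuredText (Sphinx) style; the example below follows the NumPy style.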

Example

Here is an example of a function with a docstring:

def add(x: int, y: int) -> int:
    """
    Add together two numbers.

    Parameters
    ----------
    x : int
        A number.
    y : int
        A number.

    Returns
    -------
    int
        The sum of x and y.

    Examples
    --------
    >>> add(1, 1)
    2
    >>> add(10, 1)
    11
    """
    return x + y

This function simply adds two numbers together. The docstring provides:
- a description of what the function does
- the inputs / parameters of the function and their types
- the output of the function and its type
- a simple example of how to use the function

Note that these examples can be executed using the doctest module, which provides another way to test the function. The lines that need to be executed are preceded by >>>.
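
One possible way to run such docstring examples programmatically is sketched below, using only the standard-library doctest module. The shortened docstring and the variable names are choices made for this illustration; in a real project you would more likely run `python -m doctest` on the file or let your test runner collect the doctests.

```python
import doctest


def add(x: int, y: int) -> int:
    """
    Add together two numbers.

    Examples
    --------
    >>> add(1, 1)
    2
    >>> add(10, 1)
    11
    """
    return x + y


# Extract the ">>>" examples from the docstring into a DocTest object.
parser = doctest.DocTestParser()
test = parser.get_doctest(add.__doc__, {"add": add}, "add", None, 0)

# Run the examples and compare the real output with the expected output.
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

print(f"attempted={results.attempted}, failed={results.failed}")
```

If an expected output in the docstring no longer matches what the function returns, `results.failed` becomes non-zero and the runner prints a report of the mismatch.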

Note

You can see in the definition of the function that the arguments have “type hints” (e.g. x: int). This is not mandatory, but it is good practice to add type hints to your functions: they add another layer of documentation, make the code more readable, and help catch bugs early. You can also describe the return type of the function using the -> annotation (e.g. -> int). Type hints are not enforced by Python at runtime, but they can be checked using a static type checker like mypy, which goes through your code and makes sure that the types are consistent.
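
The following sketch illustrates the point that type hints are metadata rather than runtime checks. It reuses the `add` function from the example above with a shortened docstring; the variable names are only illustrative.

```python
from typing import get_type_hints


def add(x: int, y: int) -> int:
    """Add together two numbers."""
    return x + y


# The annotations are stored as metadata and can be inspected.
hints = get_type_hints(add)

# Python does not check them at call time: passing strings still "works",
# here by concatenating them, which a static checker such as mypy
# would be expected to flag.
unchecked = add("1", "2")

print(hints)
print(unchecked)
```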

For more complex and extensive examples, you can check the xarray package, which has very good documentation of its API. See, for instance, the Dataset class documentation and the associated docstring.

Tutorials

Tutorials are a great way to show users how to use your package. They can be written in a Jupyter notebook (.ipynb files). You can see great examples of galleries of tutorials:
- xarray
- geopandas
- scikit-learn

`Sphinx` documentation

To organize your documentation and automatically build a table of contents, the API reference, and the tutorials, you can use Sphinx. This is not the only tool to generate documentation, but it is one of the most popular. Another popular framework is MkDocs.

Sphinx generates static websites (i.e. they are not interactive) from templates. It is highly customizable with extensions and themes and can generate documentation in many formats (HTML, PDF, ePub, etc.). It can also be used to generate documentation for languages other than Python.

You can have a look at the Sphinx themes gallery. The most popular ones are PyData, Furo or Read the Docs.

Syntax

Sphinx (and Sphinx extensions) can handle three types of syntax for the documentation:
- reStructuredText (.rst files): this is the native syntax of Sphinx. It has been used for many years, but has lost some popularity to Markdown and MyST.
- Markdown (.md files): this is a very popular syntax for writing documentation (it is also used by Jupyter notebooks), as it is simple and easy to read. However, some documentation features are not handled by Markdown (like cross-references, custom elements, colored call-out blocks). You need the myst-parser extension to use Markdown syntax in Sphinx.
- MyST (.myst files): this is a newer syntax that extends Markdown with the features of reStructuredText. It is more powerful than Markdown and more readable than reStructuredText. You need the myst-parser extension to use MyST syntax in Sphinx.
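
To make this more concrete, here is a minimal sketch of what a Sphinx configuration file (conf.py) could look like when combining these pieces. The project name is a placeholder, the choice of theme is arbitrary, and the sketch assumes that the myst-parser and furo packages are installed alongside Sphinx.

```python
# conf.py: hypothetical minimal Sphinx configuration

project = "mypackage"  # placeholder project name

extensions = [
    "sphinx.ext.autodoc",   # pull API documentation from docstrings
    "sphinx.ext.napoleon",  # understand NumPy- and Google-style docstrings
    "myst_parser",          # allow Markdown / MyST source files
]

# Map file extensions to the parser that should read them.
source_suffix = {
    ".rst": "restructuredtext",
    ".md": "markdown",
}

html_theme = "furo"  # assumes the furo theme is installed
```

Since conf.py is an ordinary Python file, its settings are plain variable assignments that Sphinx reads when building the documentation.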

Building gallery of tutorials

To build a gallery of tutorials, you can use either the Sphinx Gallery or the nbsphinx extension. Sphinx Gallery is more powerful and generates the gallery from .py files, while nbsphinx is simpler and generates the gallery from .ipynb files.
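
As a rough sketch of what a Sphinx Gallery source file can look like, the hypothetical script below (it could be saved as, say, `plot_add.py` in an examples folder) starts with a docstring that provides the page title and introduction, followed by code cells separated by `# %%` comment blocks. The content is invented for illustration only.

```python
"""
Adding two numbers
==================

A hypothetical tutorial script: this docstring provides the page title and
the introduction shown in the generated gallery.
"""

# %%
# Comment blocks that start with ``# %%`` are rendered as text between
# the code cells.

first, second = 1, 1
total = first + second
print(total)
```

With nbsphinx, the equivalent tutorial would instead be written as a notebook, with the explanatory text in Markdown cells.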