Data registry

Friendly data adds functionality specific to the energy modelling community through the registry, a collection of metadata (column schemas) for commonly used data columns. Each column must have a generic name and a data type (a complete list of all supported data types can be found in the frictionless documentation). The registry also records further metadata, such as constraints. Some constraints depend on the dataset, for example enum, which limits the values allowed in a column to a fixed set. Such constraints are mentioned in the registry, but their value is left blank and is determined from the dataset at runtime.
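To illustrate how a blank enum constraint could be filled in at runtime, here is a minimal sketch in plain Python. It does not use the friendly data API: the registry entry, the `fill_enum` helper, and the sample column values are all hypothetical and serve only to show the idea.

```python
from copy import deepcopy

# Hypothetical registry entry: the enum constraint is named, but left empty.
REGISTRY_ENTRY = {
    "name": "technology",
    "type": "string",
    "constraints": {"enum": []},
}


def fill_enum(entry, values):
    """Return a copy of `entry` with an empty enum filled from `values`."""
    schema = deepcopy(entry)  # leave the registry entry itself untouched
    constraints = schema.get("constraints", {})
    if "enum" in constraints and not constraints["enum"]:
        # Drop duplicates while keeping the order of first appearance.
        constraints["enum"] = list(dict.fromkeys(values))
    return schema


# Hypothetical column values read from a dataset.
column = ["wind", "solar", "wind", "hydro"]
schema = fill_enum(REGISTRY_ENTRY, column)
print(schema["constraints"]["enum"])
```

The registry entry stays generic, while the schema attached to a particular dataset carries the values actually found in it.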

Contributing to the registry

The column registry is designed to evolve with the needs of the community. If you feel it needs new columns to express your dataset or model outputs better, please suggest additions by opening an issue on GitHub. Here we go over some concepts that make contributing to the registry easier.

Columns in tabular datasets can be classified as index columns and value columns, and the registry follows the same distinction. Index columns hold values that identify a row uniquely in a dataset (like a unique ID), whereas value columns simply contain the data in question. A dataset may contain multiple index and value columns; with several index columns, it is their combination that identifies a row.
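The distinction can be checked with a few lines of plain Python. This is only a sketch: the `is_unique_index` helper, the column names, and the sample rows are hypothetical, and the check is not part of the friendly data API.

```python
def is_unique_index(rows, idxcols):
    """Check that the combination of `idxcols` identifies every row uniquely."""
    keys = [tuple(row[col] for col in idxcols) for row in rows]
    return len(keys) == len(set(keys))


# Hypothetical dataset: two index columns and one value column.
rows = [
    {"region": "north", "technology": "wind", "capacity": 12.5},
    {"region": "north", "technology": "solar", "capacity": 4.0},
    {"region": "south", "technology": "wind", "capacity": 7.2},
]

print(is_unique_index(rows, ["region", "technology"]))
print(is_unique_index(rows, ["region"]))
```

Here "region" alone repeats across rows, so it cannot serve as the only index column, while "region" and "technology" together identify each row.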

Any new column suggestion should be written in YAML and match the following structure:

name: <column_name>
type: <type>
constraints:  # optional
  ...

description: >-
  Free text description of the column.  This can include
  reStructuredText syntax for simple formatting.  This text
  will be included in the online documentation.

The name and type properties are mandatory; the others are optional. However, it is highly recommended that you also include a concise but complete description.

If you want to specify constraints, you can find a complete list of all supported properties on the frictionless documentation page.
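Before opening an issue, a proposed column definition can be checked against the rules above. The sketch below works on a plain dictionary, as you would get after loading the YAML file with a parser of your choice. The `missing_properties` helper and the example definitions are hypothetical and not part of the friendly data API.

```python
MANDATORY = ("name", "type")


def missing_properties(column):
    """Return the mandatory properties that are absent or empty in `column`."""
    return [key for key in MANDATORY if not column.get(key)]


# Hypothetical proposals, as dictionaries loaded from YAML.
complete = {
    "name": "enduse",
    "type": "string",
    "description": "End use of the energy carrier.",
}
incomplete = {"name": "capacity_factor"}

print(missing_properties(complete))
print(missing_properties(incomplete))
```

The check only looks for the mandatory properties; whether the type and the constraints are valid is best verified against the frictionless documentation.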

Where to add the new column?

As discussed above, there are two kinds of columns, and in the repository they are separated into two folders. Index column definitions should be included in the idxcols folder, and value column definitions should be in the cols folder.
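The folder choice can be written down as a small lookup. The sketch below is illustrative only: the `target_path` helper is hypothetical, and naming the file after the column with a `.yaml` extension is an assumption rather than a rule stated here.

```python
from pathlib import PurePosixPath

# Folder for each kind of column, as described above.
FOLDERS = {"index": "idxcols", "value": "cols"}


def target_path(name, kind):
    """Return the repository path for a column definition.

    Assumption: the file is named after the column, with a .yaml extension.
    """
    return PurePosixPath(FOLDERS[kind]) / f"{name}.yaml"


print(target_path("enduse", "index"))
print(target_path("capacity_factor", "value"))
```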

Go to GitHub and open an issue. Notice that there is an issue template with a summary of the above information. Once an issue has been filed, open a pull request that adds the YAML file to the appropriate folder in the repository. This can be done easily by navigating to the desired directory, creating a new file, and typing in the contents. Please note the issue number in the pull request comment field, and link it from the sidebar. Once an issue or pull request has been opened, the community can review the proposal and suggest edits.

Once there is agreement, the change can be accepted (merged), and the maintainers of the friendly data registry can release an update with your contribution!

Defining a custom registry

You can define a custom registry in your config file by adding a registry section. It can have custom registry definitions under the cols and idxcols keys. When using the CLI, you can augment the default registry with your custom registry by passing the config file as an option. An example section could look like this:

registry:
  idxcols:
    - name: enduse
      type: string
      constraints:
        enum:
          - cooling
          - heating
          - hot_water
  cols:
    - name: capacity_factor
      type: number
      constraints:
        maximum: 100

When using the Python API, you can temporarily update the registry by using a context manager. There are three ways to do this:

reading a config file,
passing the registry updates as a dictionary, or
passing a list of columns as index columns or value columns.

For all these options, the modifications are merged with the default registry, so they act as an update rather than a replacement.

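from friendly_data.registry import config_ctx, get, getall

# the registry is updated from the config file only inside the with block
with config_ctx(conffile="config.yaml") as _:
    # look up the schema of the custom index column "enduse"
    print(get("enduse", "idxcols"))
    # list the whole registry, including the custom columns
    print(getall())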
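The "update rather than replace" behaviour can be pictured with a small stand-alone sketch. It does not call friendly data: the `merge_columns` helper and the column definitions are hypothetical, and letting a custom definition take precedence over a default one of the same name is an assumption about how conflicts are resolved, not documented behaviour.

```python
def merge_columns(default, custom):
    """Merge two lists of column definitions, keyed by column name."""
    merged = {col["name"]: col for col in default}
    for col in custom:
        # Assumption: a custom definition wins over a default of the same name.
        merged[col["name"]] = col
    return list(merged.values())


# Hypothetical default and custom value columns.
default_cols = [
    {"name": "capacity_factor", "type": "number"},
    {"name": "energy", "type": "number"},
]
custom_cols = [
    {"name": "capacity_factor", "type": "number", "constraints": {"maximum": 100}},
    {"name": "emissions", "type": "number"},
]

for col in merge_columns(default_cols, custom_cols):
    print(col["name"], col.get("constraints", {}))
```

Default columns that the custom registry does not mention, such as "energy" here, remain available after the merge.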