Data registry
Friendly data adds functionality specific to the energy modelling
community via the registry. The registry is a collection of metadata
(column schemas) for commonly used data columns. Each column must have
a generic name and a data type (a complete list of all supported data
types can be found in the frictionless documentation). The registry
also records metadata like constraints. Some constraints, like enum,
where you limit the allowed values in a column to a valid set, depend
on the dataset. These are mentioned in the registry, but the value is
left blank and is determined from the dataset at runtime.
Contributing to the registry
The column registry is designed to evolve as per the needs of the community. If you feel it needs to include new columns to express your dataset/model outputs better, please suggest additions by opening an issue on GitHub. Here we will go over some concepts to make contributing to the registry easier.
Columns in tabular datasets can be classified as index columns or value columns, and the registry respects this distinction. Index columns have values that, taken together, identify a row uniquely in a dataset (like a unique ID), whereas value columns contain the data in question. A dataset may contain multiple index and value columns.
Any new column suggestion should be written in YAML and match the following structure:
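To make the index/value distinction concrete, here is a minimal sketch, assuming a dataset represented as a list of row dicts, that checks whether a chosen set of index columns identifies every row uniquely. The function `index_is_unique` and the columns `region`, `year` and `demand` are hypothetical and not part of the registry.

```python
def index_is_unique(rows: list[dict], idxcols: list[str]) -> bool:
    """Check that the combination of index columns identifies each row uniquely."""
    seen = set()
    for row in rows:
        # The key is the tuple of index values for this row
        key = tuple(row[col] for col in idxcols)
        if key in seen:
            return False
        seen.add(key)
    return True


# Hypothetical dataset: "region" and "year" are index columns, "demand" is a value column
rows = [
    {"region": "DE", "year": 2030, "demand": 512.0},
    {"region": "DE", "year": 2040, "demand": 498.5},
    {"region": "FR", "year": 2030, "demand": 460.2},
]
print(index_is_unique(rows, ["region", "year"]))  # True
print(index_is_unique(rows, ["region"]))  # False: "DE" appears twice
```

With several index columns, it is their combination, not any single column, that must be unique.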
name: <column_name>
type: <type>
constraints:  # optional
  ...
description: >-
  Free text description of the column. This can include
  reStructuredText syntax for simple formatting. This text
  will be included in the online documentation.
The name and type properties are mandatory; the others are
optional. However, we highly recommend that you also include a
concise but complete description.
If you want to specify constraints, you can find a complete list
of all supported properties on the frictionless documentation page.
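Before opening an issue, you might sanity-check a proposal with a small script. The sketch below assumes the proposal has already been loaded into a Python dict, and checks it against the field types and constraint names of the Frictionless Table Schema specification as we understand them; these sets may be incomplete or version-dependent, so treat the frictionless documentation as authoritative. The function `check_column` is hypothetical.

```python
from typing import Any

# Field types and constraint names from the Frictionless Table Schema
# specification (assumed list; check the frictionless docs for your version)
TABLE_SCHEMA_TYPES = {
    "string", "number", "integer", "boolean", "object", "array", "date",
    "time", "datetime", "year", "yearmonth", "duration", "geopoint",
    "geojson", "any",
}
TABLE_SCHEMA_CONSTRAINTS = {
    "required", "unique", "minLength", "maxLength",
    "minimum", "maximum", "pattern", "enum",
}


def check_column(col: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a proposed column definition."""
    problems = []
    # "name" and "type" are mandatory
    for key in ("name", "type"):
        if key not in col:
            problems.append(f"missing mandatory property: {key}")
    if "type" in col and col["type"] not in TABLE_SCHEMA_TYPES:
        problems.append(f"unknown type: {col['type']}")
    # Constraints are optional, but each key should be a known constraint
    for key in col.get("constraints") or {}:
        if key not in TABLE_SCHEMA_CONSTRAINTS:
            problems.append(f"unknown constraint: {key}")
    if "description" not in col:
        problems.append("no description (optional, but highly recommended)")
    return problems


proposal = {"name": "capacity_factor", "type": "number", "constraints": {"maximum": 100}}
print(check_column(proposal))
```

An empty list means the proposal has the mandatory properties and uses only the known types and constraints.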
Where to add the new column?
As discussed above, there are two kinds of columns, and in the repository they are separated into two folders. Index column definitions should be included in the idxcols folder, and value column definitions should be in the cols folder.
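The choice of folder can be summarised in a few lines. This sketch only encodes the two folder names given above; the convention of one file per column named `<name>.yaml` is an assumption, and the paths are relative to wherever those folders live in the repository. The helper `registry_path` is hypothetical.

```python
from pathlib import PurePosixPath

# Folder names from the text; the "<name>.yaml" file naming is an assumption
FOLDERS = {"index": "idxcols", "value": "cols"}


def registry_path(name: str, kind: str) -> PurePosixPath:
    """Return the (assumed) relative path for a new column definition."""
    if kind not in FOLDERS:
        raise ValueError(f"kind must be one of {sorted(FOLDERS)}, got {kind!r}")
    return PurePosixPath(FOLDERS[kind]) / f"{name}.yaml"


print(registry_path("enduse", "index"))  # idxcols/enduse.yaml
print(registry_path("capacity_factor", "value"))  # cols/capacity_factor.yaml
```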
Go to GitHub and open an issue; the issue template summarises the information above. Once the issue has been filed, open a pull request that adds the YAML file to the appropriate folder in the repository. You can do this directly on GitHub by navigating to the desired directory, creating a new file, and typing in its contents. Please note the issue number in the pull request comment field, and link the issue from the sidebar. Once the issue and pull request are open, the community can review the proposal and suggest edits.
Once there is agreement, the change can be accepted (merged), and the maintainers of the friendly data registry can release an update with your contribution!
Defining a custom registry
You can define a custom registry in your config file by adding a
registry section. It can have custom registry definitions under
the cols and idxcols keys. When using the CLI, you can augment the
default registry with your custom registry by passing the config
file as an option. An example section could look like this:
registry:
  idxcols:
    - name: enduse
      type: string
      constraints:
        enum:
          - cooling
          - heating
          - hot_water
  cols:
    - name: capacity_factor
      type: number
      constraints:
        maximum: 100
When using the Python API, you can temporarily update the registry by using a context manager. There are several ways to do this:
reading a config file,
passing the registry updates as a dictionary, or
passing lists of columns to be treated as index columns or value columns.
For all these options, the modifications are merged with the default registry, so the result is an update of the default registry rather than a replacement.
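from friendly_data.registry import config_ctx, get, getall

# Merge the custom registry in config.yaml with the default registry,
# only for the duration of the with block
with config_ctx(conffile="config.yaml"):
    print(get("enduse", "idxcols"))  # look up the custom index column
    print(getall())  # all columns, default and custom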
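The "update rather than replace" behaviour can be pictured with plain dicts. The sketch below is a hypothetical model of the merge, not friendly_data's code: every default column is kept and new custom columns are added. It also assumes that a custom column with the same name as a default one overrides it, which the text above does not state. The function `merge_registry` and the stand-in default registry are made up for illustration.

```python
from typing import Any

Registry = dict[str, list[dict[str, Any]]]


def merge_registry(default: Registry, custom: Registry) -> Registry:
    """Merge custom column definitions into a default registry.

    Hypothetical sketch of "update rather than replace": every default
    column is kept, new custom columns are added, and (as an assumption)
    a custom column with the same name overrides the default one.
    """
    merged: Registry = {}
    for kind in ("idxcols", "cols"):
        # Key columns by name so custom entries can override defaults
        by_name = {col["name"]: col for col in default.get(kind, [])}
        by_name.update({col["name"]: col for col in custom.get(kind, [])})
        merged[kind] = list(by_name.values())
    return merged


# Made-up stand-in for the default registry
default = {
    "idxcols": [{"name": "region", "type": "string"}],
    "cols": [{"name": "capacity_factor", "type": "number"}],
}
custom = {
    "idxcols": [{"name": "enduse", "type": "string"}],
    "cols": [{"name": "capacity_factor", "type": "number", "constraints": {"maximum": 100}}],
}
merged = merge_registry(default, custom)
print([col["name"] for col in merged["idxcols"]])  # ['region', 'enduse']
print(merged["cols"])
```

Keying columns by name keeps the merge order-preserving: default columns stay in place and new custom columns are appended after them.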