The Package Index file

Column values that are unique across a dataset (taken together, when there are several such columns) can be used to identify a specific row. These columns are referred to as index columns (see note 1), or alternatively as the primary key. A package index file lets us combine column metadata and specify these index columns, column aliases, etc. The keys that can be used in a package index file are documented below.

path (string)

Relative path to a dataset

idxcols (list of strings)

Column names that should be considered part of the index of a dataset (or primary key)
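
To illustrate what it means for idxcols to act as a primary key, here is a minimal sketch in plain Python. It is not the library's implementation; the helper name build_index and the sample rows are made up for this example. It builds a lookup keyed by the tuple of index-column values and rejects duplicate keys.

```python
def build_index(rows, idxcols):
    """Map each tuple of index-column values to its row; fail on duplicates."""
    lookup = {}
    for row in rows:
        key = tuple(row[col] for col in idxcols)
        if key in lookup:
            # Index columns must identify a row uniquely.
            raise ValueError(f"duplicate index key: {key}")
        lookup[key] = row
    return lookup


# Hypothetical data: the pair (node, timestep) identifies each row.
rows = [
    {"node": "north", "timestep": 1, "demand": 10.0},
    {"node": "north", "timestep": 2, "demand": 12.5},
    {"node": "south", "timestep": 1, "demand": 7.0},
]
lookup = build_index(rows, ["node", "timestep"])
```

In this sketch neither node nor timestep is unique on its own; only the combination is, which is why both are listed as index columns.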

skip (positive integer)

Number of lines to skip when reading the dataset
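
The effect of skip can be pictured with a small standard-library sketch. This only illustrates the idea, under the assumption that the skipped lines precede the header row; read_rows and the sample file contents are hypothetical, not part of the package.

```python
import csv
import io
from itertools import islice


def read_rows(stream, skip=0):
    """Drop the first `skip` lines, then parse the rest as CSV with a header."""
    return list(csv.DictReader(islice(stream, skip, None)))


# Hypothetical file with two leading comment lines before the header.
raw = io.StringIO(
    "# exported by a hypothetical tool\n"
    "# units: MW\n"
    "node,demand\n"
    "north,10\n"
    "south,7\n"
)
rows = read_rows(raw, skip=2)
```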

name (string)

Typically the name of a dataset is derived from its file name. When working with the Python API, however, this key is used to map a dataframe to an entry in the index file. It can also be used to map a table in a database to an entry (where the path key points to the database, e.g. the path to an SQLite file).

alias (mapping or dictionary)

A mapping from column names in the dataset to the corresponding column names in the registry. For example, if you use node for locations and want that column to be mapped to region in the registry, this can be specified with an index entry like this:
- path: demand.csv
  idxcols: [node, timestep]
  alias: {node: region}
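
Conceptually, the alias mapping above amounts to a column rename. The following is a minimal sketch of that idea in plain Python; apply_alias and the sample row are hypothetical and do not reflect how the package applies aliases internally.

```python
def apply_alias(row, alias):
    """Rename the keys of a row using the alias mapping; other keys are kept."""
    return {alias.get(col, col): value for col, value in row.items()}


alias = {"node": "region"}  # same mapping as in the index entry above
row = {"node": "north", "timestep": 1, "demand": 10.0}  # hypothetical data
renamed = apply_alias(row, alias)
```

Columns without an alias, such as timestep here, keep their original names.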

iamc (string)

A format string to construct the IAMC variable for a file entry. It can reference index columns by enclosing them in braces (like a Python format string):
Installed Capacity|{carrier}|{technology}
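
Since the text compares this to a Python format string, the substitution can be sketched with str.format. The row values below are made up, and the package may process the values further before substitution; this only shows the brace replacement itself.

```python
iamc_template = "Installed Capacity|{carrier}|{technology}"

# Hypothetical index-column values for a single row.
row = {"carrier": "electricity", "technology": "wind_onshore", "year": 2030}

# Keys not referenced in the template (here: year) are simply ignored.
variable = iamc_template.format(**row)
```

With these values, variable is "Installed Capacity|electricity|wind_onshore".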

agg (mapping or dictionary)

A mapping from an index column name to a list of aggregation rules (for IAMC conversion). Each rule is itself a mapping of the form:
values:
- open_field_pv
- roof_mounted_pv
variable: Primary Energy|Solar
As there can be multiple rules for a column, they are included as a list. A complete index entry with aggregation rules looks like:
- agg:
    technology:
    - values:
      - dac
      variable: Carbon Sequestration|Direct Air Capture
    - values:
      - hydro_reservoir
      - hydro_run_of_river
      variable: Primary Energy|Hydro
    - values:
      - open_field_pv
      - roof_mounted_pv
      variable: Primary Energy|Solar
  iamc: Primary Energy|{technology}
  idxcols:
  - carrier
  - technology
  - year
  path: flow_out_sum.csv
With the above entry, when converting to IAMC format, all data points whose technology is open_field_pv or roof_mounted_pv will be added together under the IAMC variable name Primary Energy|Solar. Note that multiple index columns cannot be combined in this manner; only one is possible.
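
The summation described above can be sketched in plain Python. This is an illustration under stated assumptions, not the package's conversion code: the function aggregate, the value column name, and the sample rows are hypothetical, and values without a matching rule are simply ignored here.

```python
from collections import defaultdict

# Aggregation rules for the technology column, as in the entry above.
rules = [
    {"values": ["dac"], "variable": "Carbon Sequestration|Direct Air Capture"},
    {"values": ["hydro_reservoir", "hydro_run_of_river"], "variable": "Primary Energy|Hydro"},
    {"values": ["open_field_pv", "roof_mounted_pv"], "variable": "Primary Energy|Solar"},
]


def aggregate(rows, column, rules, other_idxcols):
    """Sum the value of rows whose `column` entry falls under the same rule."""
    value_to_variable = {v: rule["variable"] for rule in rules for v in rule["values"]}
    totals = defaultdict(float)
    for row in rows:
        variable = value_to_variable.get(row[column])
        if variable is None:
            continue  # no rule for this value; this sketch ignores it
        key = (variable,) + tuple(row[c] for c in other_idxcols)
        totals[key] += row["value"]
    return dict(totals)


# Hypothetical data points.
rows = [
    {"carrier": "electricity", "technology": "open_field_pv", "year": 2030, "value": 3.0},
    {"carrier": "electricity", "technology": "roof_mounted_pv", "year": 2030, "value": 1.5},
    {"carrier": "electricity", "technology": "hydro_reservoir", "year": 2030, "value": 2.0},
]
totals = aggregate(rows, "technology", rules, ["carrier", "year"])
```

Here the two PV rows end up under a single key, ("Primary Energy|Solar", "electricity", 2030), with a total of 4.5.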

1: It is similar to the index of a book, which allows you to jump to a specific page by looking up a keyword.