The Package Index file
Column values that are unique across a dataset can be used to identify a specific row. These columns are referred to as index columns [1]; alternatively, they are also referred to as the primary key. Using a package index file, we can combine column metadata and specify these index columns, column aliases, etc. The set of keys that can be used in a package index file is documented below.
path (string)
Relative path to a dataset
idxcols (list of strings)
Column names that should be considered part of the index of a dataset (or primary key)
skip (positive integer)
Number of lines to skip when reading the dataset
name (string)
Typically the name of a dataset is derived from its file name, but when working with the Python API, this key is used to map a dataframe to an entry in the index file. This can also be used to map a table in a database to an entry (where the path key points to the database, e.g. the path to an SQLite file).
alias (mapping or dictionary)
A mapping of column names in the dataset that should be mapped to another column in the registry. Say you use node for locations, and you want the corresponding column to be mapped to region in the registry. This can be specified with an index entry like this:

    - path: demand.csv
      idxcols: [node, timestep]
      alias: {node: region}
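To illustrate what such an entry expresses, here is a minimal sketch in plain Python. It is not the library's own implementation: the helper apply_alias and the sample rows are hypothetical, and the entry is simply written out as a dictionary. The sketch renames dataset columns according to the alias mapping and derives the index column names as the registry would know them.

```python
def apply_alias(rows, entry):
    """Rename the columns of each row according to the entry's alias mapping."""
    alias = entry.get("alias", {})
    return [{alias.get(col, col): val for col, val in row.items()} for row in rows]


# The index entry from the text, written as a Python dictionary.
entry = {
    "path": "demand.csv",
    "idxcols": ["node", "timestep"],
    "alias": {"node": "region"},
}

# Made-up rows standing in for the contents of demand.csv.
rows = [
    {"node": "north", "timestep": 1, "demand": 10.5},
    {"node": "south", "timestep": 1, "demand": 7.25},
]

renamed = apply_alias(rows, entry)

# Index column names after applying the alias.
registry_idxcols = [entry["alias"].get(col, col) for col in entry["idxcols"]]

print(registry_idxcols)
print(renamed[0])
```

Columns without an alias (timestep and demand here) keep their original names.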
iamc (string)
A format string to construct the IAMC variable for a file entry. It can reference index columns by enclosing them in braces (like a Python format string):
Installed Capacity|{carrier}|{technology}
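Since the text compares this to a Python format string, the substitution can be sketched with str.format. The row values below (electricity, wind_onshore, 2030) are made up for illustration, and whether the library performs the substitution in exactly this way is an assumption.

```python
# The format string from the index entry.
template = "Installed Capacity|{carrier}|{technology}"

# One hypothetical data point, keyed by its index columns.
row = {"carrier": "electricity", "technology": "wind_onshore", "year": 2030}

# str.format ignores keyword arguments the template does not reference (year).
variable = template.format(**row)
print(variable)  # Installed Capacity|electricity|wind_onshore
```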
agg (mapping or dictionary)
A mapping of an index column name to a list of aggregation rules (for IAMC conversion), where each rule is another mapping of the form:

    values:
      - open_field_pv
      - roof_mounted_pv
    variable: Primary Energy|Solar

As there can be multiple rules for a column, they are included as a list. A complete index entry with aggregation rules looks like:

    - agg:
        technology:
          - values:
              - dac
            variable: Carbon Sequestration|Direct Air Capture
          - values:
              - hydro_reservoir
              - hydro_run_of_river
            variable: Primary Energy|Hydro
          - values:
              - open_field_pv
              - roof_mounted_pv
            variable: Primary Energy|Solar
      iamc: Primary Energy|{technology}
      idxcols:
        - carrier
        - technology
        - year
      path: flow_out_sum.csv

With the above entry, when converting to IAMC format, all data points with technology open_field_pv and roof_mounted_pv will be added together under the IAMC variable name Primary Energy|Solar. Note that multiple index columns cannot be combined in this manner; only one is possible.
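The effect of such rules can be sketched in plain Python. This is an illustration rather than the library's actual conversion code: the function to_iamc_variables, the value column, and the sample numbers are all hypothetical, and the fallback to the iamc format string for technologies not covered by any rule is an assumption.

```python
from collections import defaultdict


def to_iamc_variables(rows, column, rules, template):
    """Sum values per (IAMC variable, year), applying aggregation rules on one column."""
    # Map each listed column value to the variable name of its rule.
    lookup = {value: rule["variable"] for rule in rules for value in rule["values"]}
    totals = defaultdict(float)
    for row in rows:
        variable = lookup.get(row[column])
        if variable is None:
            # Assumption: values without a rule use the iamc format string.
            variable = template.format(**row)
        totals[(variable, row["year"])] += row["value"]
    return dict(totals)


# Two of the rules from the entry in the text.
rules = [
    {"values": ["hydro_reservoir", "hydro_run_of_river"], "variable": "Primary Energy|Hydro"},
    {"values": ["open_field_pv", "roof_mounted_pv"], "variable": "Primary Energy|Solar"},
]

# Made-up data points standing in for flow_out_sum.csv.
rows = [
    {"technology": "open_field_pv", "year": 2030, "value": 3.0},
    {"technology": "roof_mounted_pv", "year": 2030, "value": 1.5},
    {"technology": "hydro_reservoir", "year": 2030, "value": 2.0},
    {"technology": "wind", "year": 2030, "value": 4.0},
]

totals = to_iamc_variables(rows, "technology", rules, "Primary Energy|{technology}")
print(totals[("Primary Energy|Solar", 2030)])  # 4.5
```

Only the single column named in the rules (technology here) is used for grouping, which mirrors the restriction that multiple index columns cannot be combined.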
[1] It is similar to the index of a book, which allows you to jump to a specific page in the book by looking up a keyword.