Converting to IAMC format

The IAMC format from the IAM consortium is popular in the energy modelling community. So Friendly data provides workflows to convert a data package to IAMC output with some configuration.

The IAMC format allows the user to define their own hierarchy of variables. So when using Friendly data, you can associate specific files to different branches of the hierarchy. There are currently three ways of specifying this: 1. use a fixed string, 2. use a format string with one or more user defined index columns, and 3. define a set of values that are combined and mapped to an IAMC variable.

A format string is specified in the index file by adding an iamc key. If the string contains a column name enclosed in braces, when creating the IAMC file, corresponding values from the index column will be substituted in that position.

Let us consider the example data package:

$ tree
.
├── annual_cost_per_nameplate_capacity.csv
├── carrier.csv
├── conf.yaml
├── datapackage.json
├── emissions_per_flow_in.csv
├── flow_out_sum.csv
├── index.yaml
├── LICENSE
├── nameplate_capacity.csv
├── README.md
└── technology.csv

If we consider the dataset flow_out_sum.csv, which looks like:

Energy flow out

scenario

techs

locs

carriers

unit

year

flow_out_sum

diag-npi

wind_offshore

DEU

electricity

twh

2030

0.0026985550026472

diag-npi

wind_offshore

DNK

electricity

twh

2030

0.0014073819977408

diag-npi

wind_onshore

CHE

electricity

twh

2030

0.0007493784045182

diag-npi

wind_onshore

DEU

electricity

twh

2030

0.0258391578821039

diag-npi

nuclear

CHE

electricity

twh

2030

62.78803794129926

diag-npi

nuclear

DEU

electricity

twh

2030

224.96177256922013

The IAMC format requires that the data have the columns: model, scenario, region, variable, unit, and value. If the data is in “long format”, then it should also have a column year. In the above dataset, locs is an alias for region, but there are no columns for model, variable, or value, and there is an additional column called techs.

The corresponding entry in the index file looks something like this:

- agg:
    technology:
    - values:
      - wind_onshore
      - wind_offshore
      variable: Primary Energy|Wind
  alias:
    locs: region
    techs: technology
    carriers: carrier
  iamc: Primary Energy|{technology}
  idxcols:
  - scenario
  - carriers
  - techs
  - locs
  - unit
  - year
  path: flow_out_sum.csv

The alias key declares that, techs is to be treated as technology, and locs as region - that satisfies one of the missing columns required by the IAMC specification. You will also note, there is a iamc key. This mentions technology in {...}. This is a format string, which means all occurences of technology are to be replaced by the corresponding values in data. The agg key also specifies a rule that combines two technologies under a single name. The dataset has wind_onshore, wind_offshore, and nuclear. While wind_* technologies are summed together, nuclear is replaced in the format string to form the IAMC variable. The resulting strings are then available under the variable column. However you will note, the technology names are not particularly descriptive, so you probably want to replace them with something more commonly used in an IAMC dataset. These alternate names can be specified in a separate CSV file, and provided in the configuration file. If we refer to the example data package, we will find a conf.yaml file, which has a section like this:

indices:
  technology: technology.csv
  carrier: carrier.csv
  model: calliope

The above configures technology names to be resolved as per technology.csv, which looks like this:

Technology definitions

name

iamc

nuclear

Nuclear

wind_offshore

Wind|Offshore

wind_onshore

Wind|Onshore

In the same configuration snippet, you can see there’s a key for model, but instead of pointing to a file like technology, it specifies a string. If a model column does not exist in your dataset, this string will be taken as the default value for such a column. This leaves only the value column, which is nothing but the data column, in our example that is flow_out_sum. And we have our data in IAMC format!

Data in IAMC format

model

scenario

region

variable

unit

year

value

calliope

diag-npi

CHE

Fixed Cost|Electricity|Wind

billion_2015eur_per_tw_per_year

2030

47.7515

calliope

diag-npi

DEU

Fixed Cost|Electricity|Wind

billion_2015eur_per_tw_per_year

2030

47.75149999999999

calliope

diag-npi

CHE

Fixed Cost|Electricity|Nuclear

billion_2015eur_per_tw_per_year

2030

76.116

calliope

diag-npi

DEU

Fixed Cost|Electricity|Nuclear

billion_2015eur_per_tw_per_year

2030

76.116

This kind of replacement from values in the dataset can de done with multiple columns, e.g. the index entry for nameplate_capacity.csv looks like this:

- agg:
    technology:
    - values:
      - wind_onshore
      - wind_offshore
      variable: Capacity|Electricity|Wind
  alias:
    locs: region
    techs: technology
    carriers: carrier
  iamc: Capacity|{carrier}|{technology}
  idxcols:
  - scenario
  - carriers
  - techs
  - locs
  - unit
  - year
  path: nameplate_capacity.csv

Here, all possible combinations of technology and carrier will be tried, and only the ones present in the data will be included in the final output. If you do not need replacement from data, you can always use a regular string (without any {...}) to denote what should be in the variable column (see the example data package for other examples).