Chapter 10 Aggregation

Aggregation is the operation of combining multiple indicators into one value. Many composite indicators have a hierarchical structure, so in practice this often involves multiple aggregations, for example aggregating groups of indicators into aggregate values, then aggregating those values into higher-level aggregates, and so on, until the final index value.

Aggregating should almost always be done on normalised data, unless the indicators are already on very similar scales. Otherwise the relative influence of indicators will be very uneven.

Of course you don’t have to aggregate indicators at all, and you might be content with a scoreboard, or perhaps aggregating into several aggregate values rather than a single index. However, consider that aggregation should not substitute the underlying indicator data, but complement it.

Overall, aggregating indicators is a form of information compression - you are trying to combine many indicator values into one, and inevitably information will be lost³. As long as this is kept in mind, and indicator data is presented and made available along side aggregate values, then aggregate (index) values can complement indicators and be used as a useful tool for summarising the underlying data, and identifying overall trends and patterns.

10.1 Weighting

Many aggregation methods involve some kind of weighting, i.e. coefficients that define the relative weight of the indicators/aggregates in the aggregation. In order to aggregate, weights need to first be specified, but to effectively adjust weights it is necessary to aggregate.

This chicken and egg conundrum is best solved by aggregating initially with a trial set of weights, perhaps equal weights, then seeing the effects of the weighting, and making any weight adjustments necessary. For this reason, weighting is dealt with in the following chapter on Weighting.

10.2 Approaches

10.2.1 Means

The most straightforward and widely-used approach to aggregation is the weighted arithmetic mean. Denoting the indicators as $x_i \in \{x_1, x_2, ... , x_d \}$, a weighted arithmetic mean is calculated as:

\[ y = \frac{1}{\sum_{i=1}^d w_i} \sum_{i=1}^d x_iw_i \]

where the $w_i$ are the weights corresponding to each $x_i$. Here, if the weights are chosen to sum to 1, it will simplify to the weighted sum of the indicators. In any case, the weighted mean is scaled by the sum of the weights, so weights operate relative to each other.

Clearly, if the index has more than two levels, then there will be multiple aggregations. For example, there may be three groups of indicators which give three separate aggregate scores. These aggregate scores would then be fed back into the weighted arithmetic mean above to calculate the overall index.

The arithmetic mean has “perfect compensability”, which means that a high score in one indicator will perfectly compensate a low score in another. In a simple example with two indicators scaled between 0 and 10 and equal weighting, a unit with scores (0, 10) would be given the same score as a unit with scores (5, 5) – both have a score of 5.

An alternative is the weighted geometric mean, which uses the product of the indicators rather than the sum.

\[ y = \left( \prod_{i=1}^d x_i^{w_i} \right)^{1 / \sum_{i=1}^d w_i} \]

This is simply the product of each indicator to the power of its weight, all raised the the power of the inverse of the sum of the weights.

The geometric mean is less compensatory than the arithmetic mean – low values in one indicator only partially substitute high values in others. For this reason, the geometric mean may sometimes be preferred when indicators represent “essentials”. An example might be quality of life: a longer life expectancy perhaps should not compensate severe restrictions on personal freedoms.

A third type of mean, in fact the third of the so-called Pythagorean means is the weighted harmonic mean. This uses the mean of the reciprocals of the indicators:

\[ y = \frac{\sum_{i=1}^d w_i}{\sum_{i=1}^d w_i/x_i} \]

The harmonic mean is the the least compensatory of the the three means, even less so than the geometric mean. It is often used for taking the mean of rates and ratios.

10.2.2 Other methods

The weighted median is also a simple alternative candidate. It is defined by ordering indicator values, then picking the value which has half of the assigned weight above it, and half below it. For ordered indicators $x_1, x_2, ..., x_d$ and corresponding weights $w_1, w_2, ..., w_d$ the weighted median is the indicator value $x_m$ that satisfies:

\[ \sum_{i=1}^{m-1} w_i \leq \frac{1}{2}, \: \: \text{and} \sum_{i=m+1}^{d} w_i \leq \frac{1}{2} \]

The median is known to be robust to outliers, and this may be of interest if the distribution of scores across indicators is skewed.

Another somewhat different approach to aggregation is to use the Copeland method. This approach is based pairwise comparisons between units and proceeds as follows. First, an outranking matrix is constructed, which is a square matrix with $N$ columns and $N$ rows, where $N$ is the number of units.

The element in the $p$th row and $q$th column of the matrix is calculated by summing all the indicator weights where unit $p$ has a higher value in those indicators than unit $q$. Similarly, the cell in the $q$th row and $p$th column (which is the cell opposite on the other side of the diagonal), is calculated as the sum of the weights unit where $q$ has a higher value than unit $p$. If the indicator weights sum to one over all indicators, then these two scores will also sum to 1 by definition. The outranking matrix effectively summarises to what extent each unit scores better or worse than all other units, for all unit pairs.

The Copeland score for each unit is calculated by taking the sum of the row values in the outranking matrix. This can be seen as an average measure of to what extent that unit performs above other units.

Clearly, this can be applied at any level of aggregation and used hierarchically like the other aggregation methods presented here.

In some cases, one unit may score higher than the other in all indicators. This is called a dominance pair, and corresponds to any pair scores equal to one (equivalent to any pair scores equal to zero).

The percentage of dominance pairs is an indication of robustness. Under dominance, there is no way methodological choices (weighting, normalisation, etc.) can affect the relative standing of the pair in the ranking. One will always be ranked higher than the other. The greater the number of dominance (or robust) pairs in a classification, the less sensitive country ranks will be to methodological assumptions. COINr allows to calculate the percentage of dominance pairs with an inbuilt function.

10.3 Aggregation in COINr

Many will be amazed to learn that the function to aggregate in COINr is called aggregate(). First, let’s build the ASEM data set up to the point of normalisation, then aggregate it using the default approaches.

library(COINr6)

# Build ASEM data up to the normalisation step
# Assemble data and metadata into a COIN
ASEM <- assemble(IndData = COINr6::ASEMIndData, IndMeta = COINr6::ASEMIndMeta, AggMeta = COINr6::ASEMAggMeta)
## -----------------
## Denominators detected - stored in .$Input$Denominators
## -----------------
## -----------------
## Indicator codes cross-checked and OK.
## -----------------
## Number of indicators = 49
## Number of units = 51
## Number of aggregation levels = 3 above indicator level.
## -----------------
## Aggregation level 1 with 8 aggregate groups: Physical, ConEcFin, Political, Instit, P2P, Environ, Social, SusEcFin
## Cross-check between metadata and framework = OK.
## Aggregation level 2 with 2 aggregate groups: Conn, Sust
## Cross-check between metadata and framework = OK.
## Aggregation level 3 with 1 aggregate groups: Index
## Cross-check between metadata and framework = OK.
## -----------------
# Denominate as specified in metadata
ASEM <- denominate(ASEM, dset = "Raw")
# Normalise using default method: min-max scaled between 0 and 100.
ASEM <- normalise(ASEM, dset = "Denominated")

# Now aggregate using default method: arithmetic mean, using weights found in the COIN
ASEM <- aggregate(ASEM, dset = "Normalised")

# show last few columns of aggregated data set
ASEM$Data$Aggregated[(ncol(ASEM$Data$Aggregated)-5): ncol(ASEM$Data$Aggregated)]
## # A tibble: 51 x 6
##    Environ Social SusEcFin  Conn  Sust Index
##      <dbl>  <dbl>    <dbl> <dbl> <dbl> <dbl>
##  1    71.0   67.0     65.7  50.4  67.9  59.1
##  2    55.9   86.3     53.5  54.8  65.3  60.0
##  3    56.4   54.0     62.5  42.4  57.6  50.0
##  4    76.0   54.8     55.5  45.9  62.1  54.0
##  5    61.9   59.5     31.1  45.5  50.8  48.2
##  6    53.7   61.1     72.0  48.9  62.3  55.6
##  7    72.5   79.8     62.7  51.0  71.7  61.4
##  8    35.7   68.3     70.1  49.5  58.0  53.8
##  9    52.9   82.1     63.8  48.2  66.3  57.2
## 10    66.5   60.8     54.2  47.0  60.5  53.7
## # ... with 41 more rows

By default, the aggregation function performs the following steps:

Uses the weights that were attached to IndMeta and AggMeta in assemble()
Aggregates hierarchically (with default method of weighted arithmetic mean), following the index structure specified in IndMeta and using the data specified in dset
Creates a new data set .$Data$Aggregated, which consists of the data in dset, plus extra columns with scores for each aggregation group, at each aggregation level.

Like other COINr functions aggregate() has arguments dset which specifies which data set to aggregate, and out2 which specifies whether to output an updated COIN, or a data frame. Unlike other COINr functions however, aggregate() only works on COINs, because it requires a number of inputs, including data, weights, and the index structure.

The data set outputted by aggregate() simply adds the aggregate columns onto the end of the data set that was input. This means that aggregate columns will always appear at the end. To more easily see the results of the index, COINr’s getResults() function rearranges the aggregated data set into a better format.

# generate simple results table, attach to COIN
ASEM <- getResults(ASEM, tab_type = "Summary", out2 = "COIN")
# view the table
ASEM$Results$SummaryScores |>
  head(10)
## NULL

This function, which is explained more in Visualising results, sorts the aggregated data and extracts the most relevant columns. It has options to control sorting and which columns to display.

The types of aggregation supported by aggregate() are specified by agtype from the following options:

agtype = arith_mean uses the arithmetic mean, and is the default.
agtype = geom_mean uses the geometric mean. This only works if all data values are positive, otherwise it will throw an error.
agtype = harm_mean uses the harmonic mean. This only works if all data values are non-zero, otherwise it will throw an error.
agtype = median uses the weighted median
agtype = copeland uses the Copeland method. This may take a few seconds to process depending on the number of units, because it involves pairwise comparisons across units.
agtype = mixed a different aggregation method for each level. In this case, aggregation methods are specified as any of the previous options using the agtype_bylevel argument.
agtype = custom allows you to pass a custom aggregation function.

For the last option here, if agtype = custom then you need to also specify the custom function via the agfunc argument. As an example:

# define an aggregation function which is the weighted minimum
weightedMin <- function(x,w) min(x*w, na.rm = TRUE)

# pass to aggregate() and aggregate
ASEM <- aggregate(ASEM, dset = "Normalised", agtype = "custom",
                  agfunc = weightedMin, out2 = "df")

Any custom function should have the form functionName <- function(x,w), i.e. it accepts x and w as vectors of indicator values and weights respectively, and returns a scalar aggregated value. Ensure that NAs are handled (e.g. set na.rm = T) if your data has missing values, otherwise NAs will be passed through to higher levels.

A number of sophisticated aggregation approaches and linked weighting methods are available in the compind package. These can nicely complement the features in COINr and may be of interest to those looking for more technical approaches to aggregation, such as those based on benefit of the doubt methods.

A further argument, agtype_bylevel, allows specification of different normalisation types at different aggregation levels. For example, agtype_bylevel = c("arith_mean", "geom_mean", "median") would use the arithmetic mean at the indicator level, the geometric mean at the pillar level, and the median at the sub-index level (for the ASEM data structure).

Alternative weights can also be used for aggregation by specifying the agweights argument. This can either be:

NULL, in which case it will use the weights that were attached to IndMeta and AggMeta in GII_assemble (if they exist), or
A character string which corresponds to a named list of weights stored in .$Parameters$Weights. You can either add these manually or through rew8r (see chapter on Weighting). E.g. entering agweights = "Original" will use the original weights read in on assembly. This is equivalent to agweights = NULL.
A data frame of weights to use in the aggregation. See Weighting for the format of weight data frames.

A final option of aggregate() is the possibility to set missing data limits for aggregation. By default, aggregate() will produce an aggregate score as long as at least one value is available within the aggregation group. For example, for an aggregation group we might have five indicators. A unit might have the following:

##   Ind1 Ind2 Ind3 Ind4 Ind5
## 1   NA    3   NA   NA   NA

The arithmetic mean of this would still be 3. But the unit has 80% missing data for this aggregation group, so calculating an aggregate score may seem a bit disingenuous. The avail_limit argument helps here: it gives the possibility to set a minimum data availability threshold for each level of the index, such that if data availability is below the threshold (for a particular unit and aggregation group), it returns NA (for that unit and group) instead of an aggregated score. This is probably good practice in any composite indicator where data availability is an issue: some units/countries may have fairly high data availability overall, but have low data availability for particular groups.

Now that the data has been aggregated, a natural next step is to explore the results. This is dealt with in the chapter on [Results visualisation].

This recent paper may be of interest: https://doi.org/10.1016/j.envsoft.2021.105208 ↩︎