Chapter 18 Appendix: Building a Composite Indicator Example

Here we go through the main steps to building a composite indicator in COINr. This is effectively a condensed version of some earlier chapters in this book.

18.1 Before COINr (BC)

Before even getting to COINr you will have to take a number of steps. The steps can be summarised by e.g. the European Commission’s Ten Step Guide. In short, before getting to COINr you should have:

  1. Tried to summarise all elements of the concept you are trying to capture, including by e.g.
    • A literature review
    • Talking to experts and stakeholders if possible
    • Reviewing any other indicator frameworks on the topic
  2. Constructed a “conceptual framework” using the information in 1. For a composite indicator, this should be a hierarchical structure.
  3. Gathered indicator data (these need not be the final set of indicators)

Depending on how thorough you are, and how accessible your data is, this process can take a surprisingly long time. On these topics, there is a lot of good training material available through the European Commission’s Competence Centre for Composite Indicators and Scoreboards here.

18.2 Load data and assemble

Having got a set of preliminary indicator data, the next step is to load data into R in some way. Remember that to build a composite indicator with COINr you need to first build a COIN. To build a COIN you need three data frames, as explained in detail in COINs: the currency of COINr:

  • Indicator data
  • Indicator metadata
  • Aggregation metadata

Where these data frames come from is up to you. You might want to load data into R and then assemble them using your own script. Alternatively, you may wish to assemble these in Excel, for example, then read the sheets of the spreadsheet into R.

Let’s assume now that you have got these three data frames in the correct formats. Now you simply put them into assemble() to build your COIN. Here I will use the built-in data frames in COINr:

library(COINr6)

ASEM <- assemble(IndData = ASEMIndData,
                 IndMeta = ASEMIndMeta,
                 AggMeta = ASEMAggMeta)
## -----------------
## Denominators detected - stored in .$Input$Denominators
## -----------------
## -----------------
## Indicator codes cross-checked and OK.
## -----------------
## Number of indicators = 49
## Number of units = 51
## Number of aggregation levels = 3 above indicator level.
## -----------------
## Aggregation level 1 with 8 aggregate groups: Physical, ConEcFin, Political, Instit, P2P, Environ, Social, SusEcFin
## Cross-check between metadata and framework = OK.
## Aggregation level 2 with 2 aggregate groups: Conn, Sust
## Cross-check between metadata and framework = OK.
## Aggregation level 3 with 1 aggregate groups: Index
## Cross-check between metadata and framework = OK.
## -----------------

18.3 Check the data

At this point, it’s worth checking the COIN to make sure that everything is as you would expect. One thing is to check the framework:

plotframework(ASEM)

You can also explore the indicator data using COINr’s indDash() app:

# not run here because requires an active R session
indDash(ASEM)

I would also recommend to check the statistics of each indicator using getStats(). Look for any indicators that are highly skewed, have few unique values, have low data availability, or strong correlations with other indicators or with denominators.

# load reactable for viewing table (in R you can do this instead with R Studio)
library(reactable)
library(magrittr)

# get stats
ASEM <- getStats(ASEM, dset = "Raw")
## Number of collinear indicators =  3
## Number of signficant negative indicator correlations =  322
## Number of indicators with high denominator correlations =  7

# view stats table
ASEM$Analysis$Raw$StatTable %>%
  roundDF() %>%
  reactable()

At this point you may decide to check individual indicators, and some may be added or excluded.

18.4 Denomination

If you want to divide indicators by e.g. GDP or population to be able to compare small countries with larger ones, you can use denominate(). The specifications are either made initially in IndMeta, or as arguments to denominate(). In the case of the ASEM data set, these are included in IndMeta so the command is very simple (run View(ASEMIndMeta) to see). We will afterwards check the new stats to see what has changed.

# create denominated data set
ASEM <- denominate(ASEM, dset = "Raw")

# get stats of denominated data
ASEM <- getStats(ASEM, dset = "Denominated")
## Number of collinear indicators =  0
## Number of signficant negative indicator correlations =  440
## Number of indicators with high denominator correlations =  0

# view stats table
ASEM$Analysis$Raw$StatTable %>%
  roundDF() %>%
  reactable()

According to the new table, there are now no high correlations with denominators, which indicates some kind of success.

18.5 Imputation

If you have missing data (and if you don’t, you are fortunate), you can choose to impute the data. If you don’t impute missing data, you can still build a composite indicator though.

We can use one of COINr’s options for imputing data. Like other commands, we operate on the COIN and the output is an updated COIN. Actually at this point I will make two alternatives - one which imputes missing data using a group median, and another which performs no imputation.

# Make a copy
ASEM_noImpute <- ASEM

# impute one
ASEM <- impute(ASEM, dset = "Denominated", imtype = "indgroup_mean", groupvar = "Group_GDP")
## Missing data points detected = 65
## Missing data points imputed = 65, using method = indgroup_mean

18.6 Data treatment

Next we may wish to treat the data for outliers. Here we apply a standard approach which Winsorises each indicator up to a specified limit of points, in order to bring skew and kurtosis below specified thresholds. If Winsorisation fails, it applies a log transformation or similar.

ASEM <- treat(ASEM, dset = "Imputed", winmax = 5)
ASEM_noImpute <- treat(ASEM_noImpute, dset = "Denominated", winmax = 5)

Following treatment, it is a good idea to check which indicators were treated and how:

library(dplyr)

ASEM$Analysis$Treated$TreatSummary %>%
  filter(Treatment != "None")
##       IndCode Low High           TreatSpec           Treatment
## V2   Services   0    4 Default, winmax = 5 Winsorised 4 points
## V3        FDI   0    2 Default, winmax = 5 Winsorised 2 points
## V5    ForPort   0    5 Default, winmax = 5 Winsorised 3 points
## V6  CostImpEx   0    1 Default, winmax = 5 Winsorised 1 points
## V7     Tariff   0    4 Default, winmax = 5 Winsorised 4 points
## V12     StMob   0    2 Default, winmax = 5 Winsorised 2 points
## V15  CultServ   0    4 Default, winmax = 5 Winsorised 3 points
## V21   Flights   0    1 Default, winmax = 5 Winsorised 1 points
## V23      Bord   0    3 Default, winmax = 5 Winsorised 3 points
## V25       Gas   0    3 Default, winmax = 5 Winsorised 3 points
## V35    Forest   0    2 Default, winmax = 5 Winsorised 2 points
## V36   Poverty   0    3 Default, winmax = 5 Winsorised 3 points
## V41      NGOs   0    3 Default, winmax = 5 Winsorised 3 points
## V43    FemLab   1    0 Default, winmax = 5 Winsorised 1 points
## V45   PubDebt   0    1 Default, winmax = 5 Winsorised 1 points

It is also a good idea to visualise and compare the treated data against the untreated data. The best way to do this interactively is to call indDash() again, which allows comparison of treated and untreated indicators side by side. We can also do this manually (or for presentation) for specific indicators:

iplotIndDist2(ASEM, dsets = c("Imputed", "Treated"), icodes = "Services", ptype = "Scatter")

This shows the Winsorisation of four points for the “Services” indicator. This could also be plotted in different ways using box plots or violin plots.

18.7 Normalisation

The next step would be to normalise the data. In the ASEM index we will use a simple min-max normalisation in the \([0, 100]\) interval.

ASEM <- normalise(ASEM, dset = "Treated", ntype = "minmax", npara = list(minmax = c(0,100)))
ASEM_noImpute <- normalise(ASEM_noImpute, dset = "Treated", ntype = "minmax", npara = list(minmax = c(0,100)))

Again, we could visualise and check stats here but to keep things shorter we’ll skip that for now.

18.8 Aggregation

The last construction step (apart from iterative changes) is to aggregate. Again we use a simple arithmetic mean. The structure of the index is stored in the IndMeta data frame inside the COIN, so we only need to specify which data set to aggregate, and which method to use.

ASEM <- aggregate(ASEM, agtype = "arith_mean", dset = "Normalised")
ASEM_noImpute <- aggregate(ASEM_noImpute, agtype = "arith_mean", dset = "Normalised")

18.9 Visualisation

We can now visualise our results. A good way at the index level is a stacked bar chart.

iplotBar(ASEM, dset = "Aggregated", isel = "Index", aglev = 4, stack_children = T)

This can look a bit strange because in fact the ASEM index was only aggregated up to the sustainability and connectivity sub-indexes. We can also plot that:

iplotIndDist2(ASEM, dsets = "Aggregated", icodes = "Index")

We may also plot the results on a map. Here we’ll only plot connectivity:

iplotMap(ASEM, dset = "Aggregated", isel = "Conn")

For a bit more detail we can also generate a table.

getResults(ASEM, tab_type = "Summary") %>%
  knitr::kable()
UnitCode UnitName Index Rank
CHE Switzerland 67.70 1
DNK Denmark 64.15 2
NLD Netherlands 63.70 3
NOR Norway 63.66 4
BEL Belgium 62.88 5
SWE Sweden 62.43 6
LUX Luxembourg 61.51 7
AUT Austria 61.47 8
DEU Germany 60.36 9
MLT Malta 60.14 10
SVN Slovenia 60.04 11
IRL Ireland 60.02 12
SGP Singapore 59.00 13
FIN Finland 58.17 14
GBR United Kingdom 57.15 15
LTU Lithuania 56.87 16
CZE Czech Republic 56.47 17
HRV Croatia 55.51 18
HUN Hungary 55.22 19
EST Estonia 55.11 20
FRA France 54.82 21
LVA Latvia 54.72 22
SVK Slovakia 54.67 23
ROU Romania 54.46 24
ESP Spain 54.04 25
POL Poland 53.87 26
PRT Portugal 53.03 27
ITA Italy 51.38 28
BGR Bulgaria 50.68 29
CYP Cyprus 50.63 30
KOR Korea 50.20 31
NZL New Zealand 48.94 32
GRC Greece 48.64 33
JPN Japan 47.24 34
KHM Cambodia 45.63 35
BRN Brunei Darussalam 45.56 36
AUS Australia 45.42 37
VNM Vietnam 43.55 38
LAO Lao PDR 42.52 39
MYS Malaysia 42.25 40
MMR Myanmar 41.95 41
PHL Philippines 41.33 42
THA Thailand 41.27 43
MNG Mongolia 40.46 44
IDN Indonesia 39.93 45
IND India 39.48 46
KAZ Kazakhstan 38.95 47
BGD Bangladesh 38.92 48
CHN China 37.83 49
PAK Pakistan 37.69 50
RUS Russian Federation 35.45 51

18.10 Comparison

Since we built two slightly different indexes, it makes sense also to check the difference. How do the ranks change if we do or do not impute?

compTable(ASEM, ASEM_noImpute, dset = "Aggregated", isel = "Index")
##    UnitCode           UnitName RankCOIN1 RankCOIN2 RankChange AbsRankChange
## 29      LAO            Lao PDR        39        45         -6             6
## 22      IND              India        46        42          4             4
## 41      PHL        Philippines        42        39          3             3
## 1       AUS          Australia        37        35          2             2
## 6       BRN  Brunei Darussalam        36        38         -2             2
## 16      FRA             France        21        19          2             2
## 27      KHM           Cambodia        35        37         -2             2
## 33      MLT              Malta        10        12         -2             2
## 34      MMR            Myanmar        41        43         -2             2
## 35      MNG           Mongolia        44        46         -2             2
## 50      THA           Thailand        43        41          2             2
## 51      VNM            Vietnam        38        36          2             2
## 2       AUT            Austria         8         7          1             1
## 5       BGR           Bulgaria        29        30         -1             1
## 9       CYP             Cyprus        30        29          1             1
## 12      DNK            Denmark         2         3         -1             1
## 13      ESP              Spain        25        24          1             1
## 14      EST            Estonia        20        21         -1             1
## 18      GRC             Greece        33        32          1             1
## 20      HUN            Hungary        19        20         -1             1
## 21      IDN          Indonesia        45        44          1             1
## 23      IRL            Ireland        12        11          1             1
## 31      LUX         Luxembourg         7         8         -1             1
## 32      LVA             Latvia        22        23         -1             1
## 37      NLD        Netherlands         3         2          1             1
## 39      NZL        New Zealand        32        33         -1             1
## 44      ROU            Romania        24        25         -1             1
## 47      SVK           Slovakia        23        22          1             1
## 48      SVN           Slovenia        11        10          1             1
## 3       BEL            Belgium         5         5          0             0
## 4       BGD         Bangladesh        48        48          0             0
## 7       CHE        Switzerland         1         1          0             0
## 8       CHN              China        49        49          0             0
## 10      CZE     Czech Republic        17        17          0             0
## 11      DEU            Germany         9         9          0             0
## 15      FIN            Finland        14        14          0             0
## 17      GBR     United Kingdom        15        15          0             0
## 19      HRV            Croatia        18        18          0             0
## 24      ITA              Italy        28        28          0             0
## 25      JPN              Japan        34        34          0             0
## 26      KAZ         Kazakhstan        47        47          0             0
## 28      KOR              Korea        31        31          0             0
## 30      LTU          Lithuania        16        16          0             0
## 36      MYS           Malaysia        40        40          0             0
## 38      NOR             Norway         4         4          0             0
## 40      PAK           Pakistan        50        50          0             0
## 42      POL             Poland        26        26          0             0
## 43      PRT           Portugal        27        27          0             0
## 45      RUS Russian Federation        51        51          0             0
## 46      SGP          Singapore        13        13          0             0
## 49      SWE             Sweden         6         6          0             0

This shows that the maximum rank change is 6 places, at the index level.

18.11 Export

Most likely not everyone you present the results to will want to see it in R. COINr has a simple but quick way to export the entire contents of the COIN to Excel (effectively, it exports all data frames that are present). We will first generate a results table of the main results, attach it to the COIN, then export to Excel. Note that the results are anyway present in ASEM$Data$Aggregated, but the getResults() function provides tables that are better to present (the highest levels of aggregation are the first columns, rather than the last, and it is sorted by index score).

# Write full results table to COIN
getResults(ASEM, tab_type = "FullWithDenoms", out2 = "COIN")

# Export entire COIN to Excel
coin2Excel(ASEM, "ASEM_results.xlsx")

18.12 Summary

This is a fast example of COINr functions for building a composite indicator. Normally this would be an iterative process, but it showcases how simple commands can be used to do fairly complex operations in many cases. There is a lot more to all of the functions used here, and you should check the respective chapters of this book to better tune them to your needs.