Chapter 18 Appendix: Building a Composite Indicator Example

Here we go through the main steps to building a composite indicator in COINr. This is effectively a condensed version of some earlier chapters in this book.

18.1 Before COINr (BC)

Before even getting to COINr you will have to take a number of steps. The steps can be summarised by e.g. the European Commission’s Ten Step Guide. In short, before getting to COINr you should have:

Tried to summarise all elements of the concept you are trying to capture, including by e.g.
- A literature review
- Talking to experts and stakeholders if possible
- Reviewing any other indicator frameworks on the topic
Constructed a “conceptual framework” using the information in 1. For a composite indicator, this should be a hierarchical structure.
Gathered indicator data (these need not be the final set of indicators)

Depending on how thorough you are, and how accessible your data is, this process can take a surprisingly long time. On these topics, there is a lot of good training material available through the European Commission’s Competence Centre for Composite Indicators and Scoreboards here.

18.2 Load data and assemble

Having got a set of preliminary indicator data, the next step is to load data into R in some way. Remember that to build a composite indicator with COINr you need to first build a COIN. To build a COIN you need three data frames, as explained in detail in COINs: the currency of COINr:

Indicator data
Indicator metadata
Aggregation metadata

Where these data frames come from is up to you. You might want to load data into R and then assemble them using your own script. Alternatively, you may wish to assemble these in Excel, for example, then read the sheets of the spreadsheet into R.

Let’s assume now that you have got these three data frames in the correct formats. Now you simply put them into assemble() to build your COIN. Here I will use the built-in data frames in COINr:

library(COINr6)

ASEM <- assemble(IndData = ASEMIndData,
                 IndMeta = ASEMIndMeta,
                 AggMeta = ASEMAggMeta)
## -----------------
## Denominators detected - stored in .$Input$Denominators
## -----------------
## -----------------
## Indicator codes cross-checked and OK.
## -----------------
## Number of indicators = 49
## Number of units = 51
## Number of aggregation levels = 3 above indicator level.
## -----------------
## Aggregation level 1 with 8 aggregate groups: Physical, ConEcFin, Political, Instit, P2P, Environ, Social, SusEcFin
## Cross-check between metadata and framework = OK.
## Aggregation level 2 with 2 aggregate groups: Conn, Sust
## Cross-check between metadata and framework = OK.
## Aggregation level 3 with 1 aggregate groups: Index
## Cross-check between metadata and framework = OK.
## -----------------

18.3 Check the data

At this point, it’s worth checking the COIN to make sure that everything is as you would expect. One thing is to check the framework:

plotframework(ASEM)

You can also explore the indicator data using COINr’s indDash() app:

# not run here because requires an active R session
indDash(ASEM)

I would also recommend to check the statistics of each indicator using getStats(). Look for any indicators that are highly skewed, have few unique values, have low data availability, or strong correlations with other indicators or with denominators.

# load reactable for viewing table (in R you can do this instead with R Studio)
library(reactable)
library(magrittr)

# get stats
ASEM <- getStats(ASEM, dset = "Raw")
## Number of collinear indicators =  3
## Number of signficant negative indicator correlations =  322
## Number of indicators with high denominator correlations =  7

# view stats table
ASEM$Analysis$Raw$StatTable %>%
  roundDF() %>%
  reactable()

At this point you may decide to check individual indicators, and some may be added or excluded.

18.4 Denomination

If you want to divide indicators by e.g. GDP or population to be able to compare small countries with larger ones, you can use denominate(). The specifications are either made initially in IndMeta, or as arguments to denominate(). In the case of the ASEM data set, these are included in IndMeta so the command is very simple (run View(ASEMIndMeta) to see). We will afterwards check the new stats to see what has changed.

# create denominated data set
ASEM <- denominate(ASEM, dset = "Raw")

# get stats of denominated data
ASEM <- getStats(ASEM, dset = "Denominated")
## Number of collinear indicators =  0
## Number of signficant negative indicator correlations =  440
## Number of indicators with high denominator correlations =  0

# view stats table
ASEM$Analysis$Raw$StatTable %>%
  roundDF() %>%
  reactable()

According to the new table, there are now no high correlations with denominators, which indicates some kind of success.

18.5 Imputation

If you have missing data (and if you don’t, you are fortunate), you can choose to impute the data. If you don’t impute missing data, you can still build a composite indicator though.

We can use one of COINr’s options for imputing data. Like other commands, we operate on the COIN and the output is an updated COIN. Actually at this point I will make two alternatives - one which imputes missing data using a group median, and another which performs no imputation.

# Make a copy
ASEM_noImpute <- ASEM

# impute one
ASEM <- impute(ASEM, dset = "Denominated", imtype = "indgroup_mean", groupvar = "Group_GDP")
## Missing data points detected = 65
## Missing data points imputed = 65, using method = indgroup_mean

18.6 Data treatment

Next we may wish to treat the data for outliers. Here we apply a standard approach which Winsorises each indicator up to a specified limit of points, in order to bring skew and kurtosis below specified thresholds. If Winsorisation fails, it applies a log transformation or similar.

ASEM <- treat(ASEM, dset = "Imputed", winmax = 5)
ASEM_noImpute <- treat(ASEM_noImpute, dset = "Denominated", winmax = 5)

Following treatment, it is a good idea to check which indicators were treated and how:

library(dplyr)

ASEM$Analysis$Treated$TreatSummary %>%
  filter(Treatment != "None")
##       IndCode Low High           TreatSpec           Treatment
## V2   Services   0    4 Default, winmax = 5 Winsorised 4 points
## V3        FDI   0    2 Default, winmax = 5 Winsorised 2 points
## V5    ForPort   0    5 Default, winmax = 5 Winsorised 3 points
## V6  CostImpEx   0    1 Default, winmax = 5 Winsorised 1 points
## V7     Tariff   0    4 Default, winmax = 5 Winsorised 4 points
## V12     StMob   0    2 Default, winmax = 5 Winsorised 2 points
## V15  CultServ   0    4 Default, winmax = 5 Winsorised 3 points
## V21   Flights   0    1 Default, winmax = 5 Winsorised 1 points
## V23      Bord   0    3 Default, winmax = 5 Winsorised 3 points
## V25       Gas   0    3 Default, winmax = 5 Winsorised 3 points
## V35    Forest   0    2 Default, winmax = 5 Winsorised 2 points
## V36   Poverty   0    3 Default, winmax = 5 Winsorised 3 points
## V41      NGOs   0    3 Default, winmax = 5 Winsorised 3 points
## V43    FemLab   1    0 Default, winmax = 5 Winsorised 1 points
## V45   PubDebt   0    1 Default, winmax = 5 Winsorised 1 points

It is also a good idea to visualise and compare the treated data against the untreated data. The best way to do this interactively is to call indDash() again, which allows comparison of treated and untreated indicators side by side. We can also do this manually (or for presentation) for specific indicators:

iplotIndDist2(ASEM, dsets = c("Imputed", "Treated"), icodes = "Services", ptype = "Scatter")

This shows the Winsorisation of four points for the “Services” indicator. This could also be plotted in different ways using box plots or violin plots.

18.7 Normalisation

The next step would be to normalise the data. In the ASEM index we will use a simple min-max normalisation in the $[0, 100]$ interval.

ASEM <- normalise(ASEM, dset = "Treated", ntype = "minmax", npara = list(minmax = c(0,100)))
ASEM_noImpute <- normalise(ASEM_noImpute, dset = "Treated", ntype = "minmax", npara = list(minmax = c(0,100)))

Again, we could visualise and check stats here but to keep things shorter we’ll skip that for now.

18.8 Aggregation

The last construction step (apart from iterative changes) is to aggregate. Again we use a simple arithmetic mean. The structure of the index is stored in the IndMeta data frame inside the COIN, so we only need to specify which data set to aggregate, and which method to use.

ASEM <- aggregate(ASEM, agtype = "arith_mean", dset = "Normalised")
ASEM_noImpute <- aggregate(ASEM_noImpute, agtype = "arith_mean", dset = "Normalised")

18.9 Visualisation

We can now visualise our results. A good way at the index level is a stacked bar chart.

iplotBar(ASEM, dset = "Aggregated", isel = "Index", aglev = 4, stack_children = T)

This can look a bit strange because in fact the ASEM index was only aggregated up to the sustainability and connectivity sub-indexes. We can also plot that:

iplotIndDist2(ASEM, dsets = "Aggregated", icodes = "Index")

We may also plot the results on a map. Here we’ll only plot connectivity:

iplotMap(ASEM, dset = "Aggregated", isel = "Conn")

For a bit more detail we can also generate a table.

getResults(ASEM, tab_type = "Summary") %>%
  knitr::kable()

UnitCode	UnitName	Index	Rank
CHE	Switzerland	67.70	1
DNK	Denmark	64.15	2
NLD	Netherlands	63.70	3
NOR	Norway	63.66	4
BEL	Belgium	62.88	5
SWE	Sweden	62.43	6
LUX	Luxembourg	61.51	7
AUT	Austria	61.47	8
DEU	Germany	60.36	9
MLT	Malta	60.14	10
SVN	Slovenia	60.04	11
IRL	Ireland	60.02	12
SGP	Singapore	59.00	13
FIN	Finland	58.17	14
GBR	United Kingdom	57.15	15
LTU	Lithuania	56.87	16
CZE	Czech Republic	56.47	17
HRV	Croatia	55.51	18
HUN	Hungary	55.22	19
EST	Estonia	55.11	20
FRA	France	54.82	21
LVA	Latvia	54.72	22
SVK	Slovakia	54.67	23
ROU	Romania	54.46	24
ESP	Spain	54.04	25
POL	Poland	53.87	26
PRT	Portugal	53.03	27
ITA	Italy	51.38	28
BGR	Bulgaria	50.68	29
CYP	Cyprus	50.63	30
KOR	Korea	50.20	31
NZL	New Zealand	48.94	32
GRC	Greece	48.64	33
JPN	Japan	47.24	34
KHM	Cambodia	45.63	35
BRN	Brunei Darussalam	45.56	36
AUS	Australia	45.42	37
VNM	Vietnam	43.55	38
LAO	Lao PDR	42.52	39
MYS	Malaysia	42.25	40
MMR	Myanmar	41.95	41
PHL	Philippines	41.33	42
THA	Thailand	41.27	43
MNG	Mongolia	40.46	44
IDN	Indonesia	39.93	45
IND	India	39.48	46
KAZ	Kazakhstan	38.95	47
BGD	Bangladesh	38.92	48
CHN	China	37.83	49
PAK	Pakistan	37.69	50
RUS	Russian Federation	35.45	51

18.10 Comparison

Since we built two slightly different indexes, it makes sense also to check the difference. How do the ranks change if we do or do not impute?

compTable(ASEM, ASEM_noImpute, dset = "Aggregated", isel = "Index")
##    UnitCode           UnitName RankCOIN1 RankCOIN2 RankChange AbsRankChange
## 29      LAO            Lao PDR        39        45         -6             6
## 22      IND              India        46        42          4             4
## 41      PHL        Philippines        42        39          3             3
## 1       AUS          Australia        37        35          2             2
## 6       BRN  Brunei Darussalam        36        38         -2             2
## 16      FRA             France        21        19          2             2
## 27      KHM           Cambodia        35        37         -2             2
## 33      MLT              Malta        10        12         -2             2
## 34      MMR            Myanmar        41        43         -2             2
## 35      MNG           Mongolia        44        46         -2             2
## 50      THA           Thailand        43        41          2             2
## 51      VNM            Vietnam        38        36          2             2
## 2       AUT            Austria         8         7          1             1
## 5       BGR           Bulgaria        29        30         -1             1
## 9       CYP             Cyprus        30        29          1             1
## 12      DNK            Denmark         2         3         -1             1
## 13      ESP              Spain        25        24          1             1
## 14      EST            Estonia        20        21         -1             1
## 18      GRC             Greece        33        32          1             1
## 20      HUN            Hungary        19        20         -1             1
## 21      IDN          Indonesia        45        44          1             1
## 23      IRL            Ireland        12        11          1             1
## 31      LUX         Luxembourg         7         8         -1             1
## 32      LVA             Latvia        22        23         -1             1
## 37      NLD        Netherlands         3         2          1             1
## 39      NZL        New Zealand        32        33         -1             1
## 44      ROU            Romania        24        25         -1             1
## 47      SVK           Slovakia        23        22          1             1
## 48      SVN           Slovenia        11        10          1             1
## 3       BEL            Belgium         5         5          0             0
## 4       BGD         Bangladesh        48        48          0             0
## 7       CHE        Switzerland         1         1          0             0
## 8       CHN              China        49        49          0             0
## 10      CZE     Czech Republic        17        17          0             0
## 11      DEU            Germany         9         9          0             0
## 15      FIN            Finland        14        14          0             0
## 17      GBR     United Kingdom        15        15          0             0
## 19      HRV            Croatia        18        18          0             0
## 24      ITA              Italy        28        28          0             0
## 25      JPN              Japan        34        34          0             0
## 26      KAZ         Kazakhstan        47        47          0             0
## 28      KOR              Korea        31        31          0             0
## 30      LTU          Lithuania        16        16          0             0
## 36      MYS           Malaysia        40        40          0             0
## 38      NOR             Norway         4         4          0             0
## 40      PAK           Pakistan        50        50          0             0
## 42      POL             Poland        26        26          0             0
## 43      PRT           Portugal        27        27          0             0
## 45      RUS Russian Federation        51        51          0             0
## 46      SGP          Singapore        13        13          0             0
## 49      SWE             Sweden         6         6          0             0

This shows that the maximum rank change is 6 places, at the index level.

18.11 Export

Most likely not everyone you present the results to will want to see it in R. COINr has a simple but quick way to export the entire contents of the COIN to Excel (effectively, it exports all data frames that are present). We will first generate a results table of the main results, attach it to the COIN, then export to Excel. Note that the results are anyway present in ASEM$Data$Aggregated, but the getResults() function provides tables that are better to present (the highest levels of aggregation are the first columns, rather than the last, and it is sorted by index score).

# Write full results table to COIN
getResults(ASEM, tab_type = "FullWithDenoms", out2 = "COIN")

# Export entire COIN to Excel
coin2Excel(ASEM, "ASEM_results.xlsx")

18.12 Summary

This is a fast example of COINr functions for building a composite indicator. Normally this would be an iterative process, but it showcases how simple commands can be used to do fairly complex operations in many cases. There is a lot more to all of the functions used here, and you should check the respective chapters of this book to better tune them to your needs.