Chapter 18 Appendix: Building a Composite Indicator Example
Here we go through the main steps to building a composite indicator in COINr. This is effectively a condensed version of some earlier chapters in this book.
18.1 Before COINr (BC)
Before even getting to COINr you will have to take a number of steps. The steps can be summarised by e.g. the European Commission’s Ten Step Guide. In short, before getting to COINr you should have:
- Tried to summarise all elements of the concept you are trying to capture, including by e.g.
- A literature review
- Talking to experts and stakeholders if possible
- Reviewing any other indicator frameworks on the topic
- Constructed a “conceptual framework” using the information in 1. For a composite indicator, this should be a hierarchical structure.
- Gathered indicator data (these need not be the final set of indicators)
Depending on how thorough you are, and how accessible your data is, this process can take a surprisingly long time. On these topics, there is a lot of good training material available through the European Commission’s Competence Centre for Composite Indicators and Scoreboards here.
18.2 Load data and assemble
Having got a set of preliminary indicator data, the next step is to load data into R in some way. Remember that to build a composite indicator with COINr you need to first build a COIN. To build a COIN you need three data frames, as explained in detail in COINs: the currency of COINr:
- Indicator data
- Indicator metadata
- Aggregation metadata
Where these data frames come from is up to you. You might want to load data into R and then assemble them using your own script. Alternatively, you may wish to assemble these in Excel, for example, then read the sheets of the spreadsheet into R.
Let’s assume now that you have got these three data frames in the correct formats. Now you simply put them into assemble()
to build your COIN. Here I will use the built-in data frames in COINr:
library(COINr6)
<- assemble(IndData = ASEMIndData,
ASEM IndMeta = ASEMIndMeta,
AggMeta = ASEMAggMeta)
## -----------------
## Denominators detected - stored in .$Input$Denominators
## -----------------
## -----------------
## Indicator codes cross-checked and OK.
## -----------------
## Number of indicators = 49
## Number of units = 51
## Number of aggregation levels = 3 above indicator level.
## -----------------
## Aggregation level 1 with 8 aggregate groups: Physical, ConEcFin, Political, Instit, P2P, Environ, Social, SusEcFin
## Cross-check between metadata and framework = OK.
## Aggregation level 2 with 2 aggregate groups: Conn, Sust
## Cross-check between metadata and framework = OK.
## Aggregation level 3 with 1 aggregate groups: Index
## Cross-check between metadata and framework = OK.
## -----------------
18.3 Check the data
At this point, it’s worth checking the COIN to make sure that everything is as you would expect. One thing is to check the framework:
plotframework(ASEM)
You can also explore the indicator data using COINr’s indDash()
app:
# not run here because requires an active R session
indDash(ASEM)
I would also recommend to check the statistics of each indicator using getStats()
. Look for any indicators that are highly skewed, have few unique values, have low data availability, or strong correlations with other indicators or with denominators.
# load reactable for viewing table (in R you can do this instead with R Studio)
library(reactable)
library(magrittr)
# get stats
<- getStats(ASEM, dset = "Raw")
ASEM ## Number of collinear indicators = 3
## Number of signficant negative indicator correlations = 322
## Number of indicators with high denominator correlations = 7
# view stats table
$Analysis$Raw$StatTable %>%
ASEMroundDF() %>%
reactable()
At this point you may decide to check individual indicators, and some may be added or excluded.
18.4 Denomination
If you want to divide indicators by e.g. GDP or population to be able to compare small countries with larger ones, you can use denominate()
. The specifications are either made initially in IndMeta
, or as arguments to denominate()
. In the case of the ASEM data set, these are included in IndMeta
so the command is very simple (run View(ASEMIndMeta)
to see). We will afterwards check the new stats to see what has changed.
# create denominated data set
<- denominate(ASEM, dset = "Raw")
ASEM
# get stats of denominated data
<- getStats(ASEM, dset = "Denominated")
ASEM ## Number of collinear indicators = 0
## Number of signficant negative indicator correlations = 440
## Number of indicators with high denominator correlations = 0
# view stats table
$Analysis$Raw$StatTable %>%
ASEMroundDF() %>%
reactable()
According to the new table, there are now no high correlations with denominators, which indicates some kind of success.
18.5 Imputation
If you have missing data (and if you don’t, you are fortunate), you can choose to impute the data. If you don’t impute missing data, you can still build a composite indicator though.
We can use one of COINr’s options for imputing data. Like other commands, we operate on the COIN and the output is an updated COIN. Actually at this point I will make two alternatives - one which imputes missing data using a group median, and another which performs no imputation.
# Make a copy
<- ASEM
ASEM_noImpute
# impute one
<- impute(ASEM, dset = "Denominated", imtype = "indgroup_mean", groupvar = "Group_GDP")
ASEM ## Missing data points detected = 65
## Missing data points imputed = 65, using method = indgroup_mean
18.6 Data treatment
Next we may wish to treat the data for outliers. Here we apply a standard approach which Winsorises each indicator up to a specified limit of points, in order to bring skew and kurtosis below specified thresholds. If Winsorisation fails, it applies a log transformation or similar.
<- treat(ASEM, dset = "Imputed", winmax = 5)
ASEM <- treat(ASEM_noImpute, dset = "Denominated", winmax = 5) ASEM_noImpute
Following treatment, it is a good idea to check which indicators were treated and how:
library(dplyr)
$Analysis$Treated$TreatSummary %>%
ASEMfilter(Treatment != "None")
## IndCode Low High TreatSpec Treatment
## V2 Services 0 4 Default, winmax = 5 Winsorised 4 points
## V3 FDI 0 2 Default, winmax = 5 Winsorised 2 points
## V5 ForPort 0 5 Default, winmax = 5 Winsorised 3 points
## V6 CostImpEx 0 1 Default, winmax = 5 Winsorised 1 points
## V7 Tariff 0 4 Default, winmax = 5 Winsorised 4 points
## V12 StMob 0 2 Default, winmax = 5 Winsorised 2 points
## V15 CultServ 0 4 Default, winmax = 5 Winsorised 3 points
## V21 Flights 0 1 Default, winmax = 5 Winsorised 1 points
## V23 Bord 0 3 Default, winmax = 5 Winsorised 3 points
## V25 Gas 0 3 Default, winmax = 5 Winsorised 3 points
## V35 Forest 0 2 Default, winmax = 5 Winsorised 2 points
## V36 Poverty 0 3 Default, winmax = 5 Winsorised 3 points
## V41 NGOs 0 3 Default, winmax = 5 Winsorised 3 points
## V43 FemLab 1 0 Default, winmax = 5 Winsorised 1 points
## V45 PubDebt 0 1 Default, winmax = 5 Winsorised 1 points
It is also a good idea to visualise and compare the treated data against the untreated data. The best way to do this interactively is to call indDash()
again, which allows comparison of treated and untreated indicators side by side. We can also do this manually (or for presentation) for specific indicators:
iplotIndDist2(ASEM, dsets = c("Imputed", "Treated"), icodes = "Services", ptype = "Scatter")
This shows the Winsorisation of four points for the “Services” indicator. This could also be plotted in different ways using box plots or violin plots.
18.7 Normalisation
The next step would be to normalise the data. In the ASEM index we will use a simple min-max normalisation in the \([0, 100]\) interval.
<- normalise(ASEM, dset = "Treated", ntype = "minmax", npara = list(minmax = c(0,100)))
ASEM <- normalise(ASEM_noImpute, dset = "Treated", ntype = "minmax", npara = list(minmax = c(0,100))) ASEM_noImpute
Again, we could visualise and check stats here but to keep things shorter we’ll skip that for now.
18.8 Aggregation
The last construction step (apart from iterative changes) is to aggregate. Again we use a simple arithmetic mean. The structure of the index is stored in the IndMeta
data frame inside the COIN, so we only need to specify which data set to aggregate, and which method to use.
<- aggregate(ASEM, agtype = "arith_mean", dset = "Normalised")
ASEM <- aggregate(ASEM_noImpute, agtype = "arith_mean", dset = "Normalised") ASEM_noImpute
18.9 Visualisation
We can now visualise our results. A good way at the index level is a stacked bar chart.
iplotBar(ASEM, dset = "Aggregated", isel = "Index", aglev = 4, stack_children = T)
This can look a bit strange because in fact the ASEM index was only aggregated up to the sustainability and connectivity sub-indexes. We can also plot that:
iplotIndDist2(ASEM, dsets = "Aggregated", icodes = "Index")
We may also plot the results on a map. Here we’ll only plot connectivity:
iplotMap(ASEM, dset = "Aggregated", isel = "Conn")
For a bit more detail we can also generate a table.
getResults(ASEM, tab_type = "Summary") %>%
::kable() knitr
UnitCode | UnitName | Index | Rank |
---|---|---|---|
CHE | Switzerland | 67.70 | 1 |
DNK | Denmark | 64.15 | 2 |
NLD | Netherlands | 63.70 | 3 |
NOR | Norway | 63.66 | 4 |
BEL | Belgium | 62.88 | 5 |
SWE | Sweden | 62.43 | 6 |
LUX | Luxembourg | 61.51 | 7 |
AUT | Austria | 61.47 | 8 |
DEU | Germany | 60.36 | 9 |
MLT | Malta | 60.14 | 10 |
SVN | Slovenia | 60.04 | 11 |
IRL | Ireland | 60.02 | 12 |
SGP | Singapore | 59.00 | 13 |
FIN | Finland | 58.17 | 14 |
GBR | United Kingdom | 57.15 | 15 |
LTU | Lithuania | 56.87 | 16 |
CZE | Czech Republic | 56.47 | 17 |
HRV | Croatia | 55.51 | 18 |
HUN | Hungary | 55.22 | 19 |
EST | Estonia | 55.11 | 20 |
FRA | France | 54.82 | 21 |
LVA | Latvia | 54.72 | 22 |
SVK | Slovakia | 54.67 | 23 |
ROU | Romania | 54.46 | 24 |
ESP | Spain | 54.04 | 25 |
POL | Poland | 53.87 | 26 |
PRT | Portugal | 53.03 | 27 |
ITA | Italy | 51.38 | 28 |
BGR | Bulgaria | 50.68 | 29 |
CYP | Cyprus | 50.63 | 30 |
KOR | Korea | 50.20 | 31 |
NZL | New Zealand | 48.94 | 32 |
GRC | Greece | 48.64 | 33 |
JPN | Japan | 47.24 | 34 |
KHM | Cambodia | 45.63 | 35 |
BRN | Brunei Darussalam | 45.56 | 36 |
AUS | Australia | 45.42 | 37 |
VNM | Vietnam | 43.55 | 38 |
LAO | Lao PDR | 42.52 | 39 |
MYS | Malaysia | 42.25 | 40 |
MMR | Myanmar | 41.95 | 41 |
PHL | Philippines | 41.33 | 42 |
THA | Thailand | 41.27 | 43 |
MNG | Mongolia | 40.46 | 44 |
IDN | Indonesia | 39.93 | 45 |
IND | India | 39.48 | 46 |
KAZ | Kazakhstan | 38.95 | 47 |
BGD | Bangladesh | 38.92 | 48 |
CHN | China | 37.83 | 49 |
PAK | Pakistan | 37.69 | 50 |
RUS | Russian Federation | 35.45 | 51 |
18.10 Comparison
Since we built two slightly different indexes, it makes sense also to check the difference. How do the ranks change if we do or do not impute?
compTable(ASEM, ASEM_noImpute, dset = "Aggregated", isel = "Index")
## UnitCode UnitName RankCOIN1 RankCOIN2 RankChange AbsRankChange
## 29 LAO Lao PDR 39 45 -6 6
## 22 IND India 46 42 4 4
## 41 PHL Philippines 42 39 3 3
## 1 AUS Australia 37 35 2 2
## 6 BRN Brunei Darussalam 36 38 -2 2
## 16 FRA France 21 19 2 2
## 27 KHM Cambodia 35 37 -2 2
## 33 MLT Malta 10 12 -2 2
## 34 MMR Myanmar 41 43 -2 2
## 35 MNG Mongolia 44 46 -2 2
## 50 THA Thailand 43 41 2 2
## 51 VNM Vietnam 38 36 2 2
## 2 AUT Austria 8 7 1 1
## 5 BGR Bulgaria 29 30 -1 1
## 9 CYP Cyprus 30 29 1 1
## 12 DNK Denmark 2 3 -1 1
## 13 ESP Spain 25 24 1 1
## 14 EST Estonia 20 21 -1 1
## 18 GRC Greece 33 32 1 1
## 20 HUN Hungary 19 20 -1 1
## 21 IDN Indonesia 45 44 1 1
## 23 IRL Ireland 12 11 1 1
## 31 LUX Luxembourg 7 8 -1 1
## 32 LVA Latvia 22 23 -1 1
## 37 NLD Netherlands 3 2 1 1
## 39 NZL New Zealand 32 33 -1 1
## 44 ROU Romania 24 25 -1 1
## 47 SVK Slovakia 23 22 1 1
## 48 SVN Slovenia 11 10 1 1
## 3 BEL Belgium 5 5 0 0
## 4 BGD Bangladesh 48 48 0 0
## 7 CHE Switzerland 1 1 0 0
## 8 CHN China 49 49 0 0
## 10 CZE Czech Republic 17 17 0 0
## 11 DEU Germany 9 9 0 0
## 15 FIN Finland 14 14 0 0
## 17 GBR United Kingdom 15 15 0 0
## 19 HRV Croatia 18 18 0 0
## 24 ITA Italy 28 28 0 0
## 25 JPN Japan 34 34 0 0
## 26 KAZ Kazakhstan 47 47 0 0
## 28 KOR Korea 31 31 0 0
## 30 LTU Lithuania 16 16 0 0
## 36 MYS Malaysia 40 40 0 0
## 38 NOR Norway 4 4 0 0
## 40 PAK Pakistan 50 50 0 0
## 42 POL Poland 26 26 0 0
## 43 PRT Portugal 27 27 0 0
## 45 RUS Russian Federation 51 51 0 0
## 46 SGP Singapore 13 13 0 0
## 49 SWE Sweden 6 6 0 0
This shows that the maximum rank change is 6 places, at the index level.
18.11 Export
Most likely not everyone you present the results to will want to see it in R. COINr has a simple but quick way to export the entire contents of the COIN to Excel (effectively, it exports all data frames that are present). We will first generate a results table of the main results, attach it to the COIN, then export to Excel. Note that the results are anyway present in ASEM$Data$Aggregated
, but the getResults()
function provides tables that are better to present (the highest levels of aggregation are the first columns, rather than the last, and it is sorted by index score).
# Write full results table to COIN
getResults(ASEM, tab_type = "FullWithDenoms", out2 = "COIN")
# Export entire COIN to Excel
coin2Excel(ASEM, "ASEM_results.xlsx")
18.12 Summary
This is a fast example of COINr functions for building a composite indicator. Normally this would be an iterative process, but it showcases how simple commands can be used to do fairly complex operations in many cases. There is a lot more to all of the functions used here, and you should check the respective chapters of this book to better tune them to your needs.