# Chapter 18 Appendix: Building a Composite Indicator Example

Here we go through the main steps to building a composite indicator in COINr. This is effectively a condensed version of some earlier chapters in this book.

## 18.1 Before COINr (BC)

Before even getting to COINr you will have to take a number of steps. The steps can be summarised by e.g. the European Commission’s Ten Step Guide. In short, before getting to COINr you should have:

1. Tried to summarise all elements of the concept you are trying to capture, including by e.g.
• A literature review
• Talking to experts and stakeholders if possible
• Reviewing any other indicator frameworks on the topic
2. Constructed a “conceptual framework” using the information in 1. For a composite indicator, this should be a hierarchical structure.
3. Gathered indicator data (these need not be the final set of indicators)

Depending on how thorough you are, and how accessible your data is, this process can take a surprisingly long time. On these topics, there is a lot of good training material available through the European Commission’s Competence Centre for Composite Indicators and Scoreboards here.

## 18.2 Load data and assemble

Having got a set of preliminary indicator data, the next step is to load data into R in some way. Remember that to build a composite indicator with COINr you need to first build a COIN. To build a COIN you need three data frames, as explained in detail in COINs: the currency of COINr:

• Indicator data

Where these data frames come from is up to you. You might want to load data into R and then assemble them using your own script. Alternatively, you may wish to assemble these in Excel, for example, then read the sheets of the spreadsheet into R.

Let’s assume now that you have got these three data frames in the correct formats. Now you simply put them into assemble() to build your COIN. Here I will use the built-in data frames in COINr:

library(COINr)

ASEM <- assemble(IndData = ASEMIndData,
IndMeta = ASEMIndMeta,
AggMeta = ASEMAggMeta)
## -----------------
## Denominators detected - stored in .$Input$Denominators
## -----------------
## -----------------
## Indicator codes cross-checked and OK.
## -----------------
## Number of indicators = 49
## Number of units = 51
## Number of aggregation levels = 3 above indicator level.
## -----------------
## Aggregation level 1 with 8 aggregate groups: Physical, ConEcFin, Political, Instit, P2P, Environ, Social, SusEcFin
## Cross-check between metadata and framework = OK.
## Aggregation level 2 with 2 aggregate groups: Conn, Sust
## Cross-check between metadata and framework = OK.
## Aggregation level 3 with 1 aggregate groups: Index
## Cross-check between metadata and framework = OK.
## -----------------

## 18.3 Check the data

At this point, it’s worth checking the COIN to make sure that everything is as you would expect. One thing is to check the framework:

plotframework(ASEM)

You can also explore the indicator data using COINr’s indDash() app:

# not run here because requires an active R session
indDash(ASEM)

I would also recommend to check the statistics of each indicator using getStats(). Look for any indicators that are highly skewed, have few unique values, have low data availability, or strong correlations with other indicators or with denominators.

# load reactable for viewing table (in R you can do this instead with R Studio)
library(reactable)
library(magrittr)

# get stats
ASEM <- getStats(ASEM, dset = "Raw")
## Number of collinear indicators =  3
## Number of signficant negative indicator correlations =  322
## Number of indicators with high denominator correlations =  7

# view stats table
ASEM$Analysis$Raw$StatTable %>% roundDF() %>% reactable() At this point you may decide to check individual indicators, and some may be added or excluded. ## 18.4 Denomination If you want to divide indicators by e.g. GDP or population to be able to compare small countries with larger ones, you can use denominate(). The specifications are either made initially in IndMeta, or as arguments to denominate(). In the case of the ASEM data set, these are included in IndMeta so the command is very simple (run View(ASEMIndMeta) to see). We will afterwards check the new stats to see what has changed. # create denominated data set ASEM <- denominate(ASEM, dset = "Raw") # get stats of denominated data ASEM <- getStats(ASEM, dset = "Denominated") ## Number of collinear indicators = 0 ## Number of signficant negative indicator correlations = 440 ## Number of indicators with high denominator correlations = 0 # view stats table ASEM$Analysis$Raw$StatTable %>%
roundDF() %>%
reactable()

According to the new table, there are now no high correlations with denominators, which indicates some kind of success.

## 18.5 Imputation

If you have missing data (and if you don’t, you are fortunate), you can choose to impute the data. If you don’t impute missing data, you can still build a composite indicator though.

We can use one of COINr’s options for imputing data. Like other commands, we operate on the COIN and the output is an updated COIN. Actually at this point I will make two alternatives - one which imputes missing data using a group median, and another which performs no imputation.

# Make a copy
ASEM_noImpute <- ASEM

# impute one
ASEM <- impute(ASEM, dset = "Denominated", imtype = "indgroup_mean", groupvar = "Group_GDP")
## Missing data points detected = 65
## Missing data points imputed = 65, using method = indgroup_mean

## 18.6 Data treatment

Next we may wish to treat the data for outliers. Here we apply a standard approach which Winsorises each indicator up to a specified limit of points, in order to bring skew and kurtosis below specified thresholds. If Winsorisation fails, it applies a log transformation or similar.

ASEM <- treat(ASEM, dset = "Imputed", winmax = 5)
ASEM_noImpute <- treat(ASEM_noImpute, dset = "Denominated", winmax = 5)

Following treatment, it is a good idea to check which indicators were treated and how:

library(dplyr)

ASEM$Analysis$Treated$TreatSummary %>% filter(Treatment != "None") ## IndCode Low High TreatSpec Treatment ## V2 Services 0 4 Default, winmax = 5 Winsorised 4 points ## V3 FDI 0 2 Default, winmax = 5 Winsorised 2 points ## V5 ForPort 0 5 Default, winmax = 5 Winsorised 3 points ## V6 CostImpEx 0 1 Default, winmax = 5 Winsorised 1 points ## V7 Tariff 0 4 Default, winmax = 5 Winsorised 4 points ## V12 StMob 0 2 Default, winmax = 5 Winsorised 2 points ## V15 CultServ 0 4 Default, winmax = 5 Winsorised 3 points ## V21 Flights 0 1 Default, winmax = 5 Winsorised 1 points ## V23 Bord 0 3 Default, winmax = 5 Winsorised 3 points ## V25 Gas 0 3 Default, winmax = 5 Winsorised 3 points ## V35 Forest 0 2 Default, winmax = 5 Winsorised 2 points ## V36 Poverty 0 3 Default, winmax = 5 Winsorised 3 points ## V41 NGOs 0 3 Default, winmax = 5 Winsorised 3 points ## V43 FemLab 1 0 Default, winmax = 5 Winsorised 1 points ## V45 PubDebt 0 1 Default, winmax = 5 Winsorised 1 points It is also a good idea to visualise and compare the treated data against the untreated data. The best way to do this interactively is to call indDash() again, which allows comparison of treated and untreated indicators side by side. We can also do this manually (or for presentation) for specific indicators: iplotIndDist2(ASEM, dsets = c("Imputed", "Treated"), icodes = "Services", ptype = "Scatter") This shows the Winsorisation of four points for the “Services” indicator. This could also be plotted in different ways using box plots or violin plots. ## 18.7 Normalisation The next step would be to normalise the data. In the ASEM index we will use a simple min-max normalisation in the $$[0, 100]$$ interval. ASEM <- normalise(ASEM, dset = "Treated", ntype = "minmax", npara = list(minmax = c(0,100))) ASEM_noImpute <- normalise(ASEM_noImpute, dset = "Treated", ntype = "minmax", npara = list(minmax = c(0,100))) Again, we could visualise and check stats here but to keep things shorter we’ll skip that for now. ## 18.8 Aggregation The last construction step (apart from iterative changes) is to aggregate. Again we use a simple arithmetic mean. The structure of the index is stored in the IndMeta data frame inside the COIN, so we only need to specify which data set to aggregate, and which method to use. ASEM <- aggregate(ASEM, agtype = "arith_mean", dset = "Normalised") ASEM_noImpute <- aggregate(ASEM_noImpute, agtype = "arith_mean", dset = "Normalised") ## 18.9 Visualisation We can now visualise our results. A good way at the index level is a stacked bar chart. iplotBar(ASEM, dset = "Aggregated", isel = "Index", aglev = 4, stack_children = T) This can look a bit strange because in fact the ASEM index was only aggregated up to the sustainability and connectivity sub-indexes. We can also plot that: iplotIndDist2(ASEM, dsets = "Aggregated", icodes = "Index") We may also plot the results on a map. Here we’ll only plot connectivity: iplotMap(ASEM, dset = "Aggregated", isel = "Conn") For a bit more detail we can also generate a table. getResults(ASEM, tab_type = "Summary") %>% knitr::kable() UnitCode UnitName Index Rank CHE Switzerland 67.70 1 DNK Denmark 64.15 2 NLD Netherlands 63.70 3 NOR Norway 63.66 4 BEL Belgium 62.88 5 SWE Sweden 62.43 6 LUX Luxembourg 61.51 7 AUT Austria 61.47 8 DEU Germany 60.36 9 MLT Malta 60.14 10 SVN Slovenia 60.04 11 IRL Ireland 60.02 12 SGP Singapore 59.00 13 FIN Finland 58.17 14 GBR United Kingdom 57.15 15 LTU Lithuania 56.87 16 CZE Czech Republic 56.47 17 HRV Croatia 55.51 18 HUN Hungary 55.22 19 EST Estonia 55.11 20 FRA France 54.82 21 LVA Latvia 54.72 22 SVK Slovakia 54.67 23 ROU Romania 54.46 24 ESP Spain 54.04 25 POL Poland 53.87 26 PRT Portugal 53.03 27 ITA Italy 51.38 28 BGR Bulgaria 50.68 29 CYP Cyprus 50.63 30 KOR Korea 50.20 31 NZL New Zealand 48.94 32 GRC Greece 48.64 33 JPN Japan 47.24 34 KHM Cambodia 45.63 35 BRN Brunei Darussalam 45.56 36 AUS Australia 45.42 37 VNM Vietnam 43.55 38 LAO Lao PDR 42.52 39 MYS Malaysia 42.25 40 MMR Myanmar 41.95 41 PHL Philippines 41.33 42 THA Thailand 41.27 43 MNG Mongolia 40.46 44 IDN Indonesia 39.93 45 IND India 39.48 46 KAZ Kazakhstan 38.95 47 BGD Bangladesh 38.92 48 CHN China 37.83 49 PAK Pakistan 37.69 50 RUS Russian Federation 35.45 51 ## 18.10 Comparison Since we built two slightly different indexes, it makes sense also to check the difference. How do the ranks change if we do or do not impute? compTable(ASEM, ASEM_noImpute, dset = "Aggregated", isel = "Index") ## UnitCode UnitName RankCOIN1 RankCOIN2 RankChange AbsRankChange ## 29 LAO Lao PDR 39 45 -6 6 ## 22 IND India 46 42 4 4 ## 41 PHL Philippines 42 39 3 3 ## 1 AUS Australia 37 35 2 2 ## 6 BRN Brunei Darussalam 36 38 -2 2 ## 16 FRA France 21 19 2 2 ## 27 KHM Cambodia 35 37 -2 2 ## 33 MLT Malta 10 12 -2 2 ## 34 MMR Myanmar 41 43 -2 2 ## 35 MNG Mongolia 44 46 -2 2 ## 50 THA Thailand 43 41 2 2 ## 51 VNM Vietnam 38 36 2 2 ## 2 AUT Austria 8 7 1 1 ## 5 BGR Bulgaria 29 30 -1 1 ## 9 CYP Cyprus 30 29 1 1 ## 12 DNK Denmark 2 3 -1 1 ## 13 ESP Spain 25 24 1 1 ## 14 EST Estonia 20 21 -1 1 ## 18 GRC Greece 33 32 1 1 ## 20 HUN Hungary 19 20 -1 1 ## 21 IDN Indonesia 45 44 1 1 ## 23 IRL Ireland 12 11 1 1 ## 31 LUX Luxembourg 7 8 -1 1 ## 32 LVA Latvia 22 23 -1 1 ## 37 NLD Netherlands 3 2 1 1 ## 39 NZL New Zealand 32 33 -1 1 ## 44 ROU Romania 24 25 -1 1 ## 47 SVK Slovakia 23 22 1 1 ## 48 SVN Slovenia 11 10 1 1 ## 3 BEL Belgium 5 5 0 0 ## 4 BGD Bangladesh 48 48 0 0 ## 7 CHE Switzerland 1 1 0 0 ## 8 CHN China 49 49 0 0 ## 10 CZE Czech Republic 17 17 0 0 ## 11 DEU Germany 9 9 0 0 ## 15 FIN Finland 14 14 0 0 ## 17 GBR United Kingdom 15 15 0 0 ## 19 HRV Croatia 18 18 0 0 ## 24 ITA Italy 28 28 0 0 ## 25 JPN Japan 34 34 0 0 ## 26 KAZ Kazakhstan 47 47 0 0 ## 28 KOR Korea 31 31 0 0 ## 30 LTU Lithuania 16 16 0 0 ## 36 MYS Malaysia 40 40 0 0 ## 38 NOR Norway 4 4 0 0 ## 40 PAK Pakistan 50 50 0 0 ## 42 POL Poland 26 26 0 0 ## 43 PRT Portugal 27 27 0 0 ## 45 RUS Russian Federation 51 51 0 0 ## 46 SGP Singapore 13 13 0 0 ## 49 SWE Sweden 6 6 0 0 This shows that the maximum rank change is 6 places, at the index level. ## 18.11 Export Most likely not everyone you present the results to will want to see it in R. COINr has a simple but quick way to export the entire contents of the COIN to Excel (effectively, it exports all data frames that are present). We will first generate a results table of the main results, attach it to the COIN, then export to Excel. Note that the results are anyway present in ASEM$Data\$Aggregated, but the getResults() function provides tables that are better to present (the highest levels of aggregation are the first columns, rather than the last, and it is sorted by index score).

# Write full results table to COIN
getResults(ASEM, tab_type = "FullWithDenoms", out2 = "COIN")

# Export entire COIN to Excel
coin2Excel(ASEM, "ASEM_results.xlsx")

## 18.12 Summary

This is a fast example of COINr functions for building a composite indicator. Normally this would be an iterative process, but it showcases how simple commands can be used to do fairly complex operations in many cases. There is a lot more to all of the functions used here, and you should check the respective chapters of this book to better tune them to your needs.