Chapter 7 Denomination
The first section here gives a bit of introduction to the concept of denomination. If you just want to know how to denominate in COINr, skip to the section on Denominating in COINr.
7.1 Concept
Denomination refers to the operation of dividing one indicator by another. But why should we do that?
As anyone who has ever looked at a map will know, countries come in all different sizes. The problem is that many indicators, on their own, are strongly linked to the size of the country. That means that if we compare countries using these indicators directly, we will often get a ranking that has roughly the biggest countries at the top, and the smallest countries at the bottom.
To take an example, let’s examine some indicators in the ASEM data set, and introduce another plotting function in the process.
# load COINr package
library(COINr6)
library(magrittr) # for pipe operations
# build ASEM index
ASEM <- build_ASEM()
# plot international trade in goods against GDP
iplotIndDist2(ASEM, dsets = c("Denominators", "Raw"), icodes = c("Den_GDP", "Goods"))
The function iplotIndDist2() is similar to iplotIndDist() but allows plotting two indicators against each other. You can pick any indicator from any data set for each axis, including denominators.
What are these “denominators” anyway? Denominators are indicators that are used to scale other indicators, in order to remove the “size effect”. Typically, they are those related to physical or economic size, including GDP, population, land area and so on.
Anyway, looking at the plot above, it is clear that countries with a higher GDP have higher international trade in goods (e.g. Germany, China, UK, Japan, France), which is not very surprising. The problem comes when we want to include this indicator as a measure of connectivity: on its own, trade in goods largely reflects the size of the economy. What is more interesting is to measure international trade per unit GDP, which is done by dividing (i.e. denominating) the international trade of each country by its GDP. Let’s do that manually here and see what we get.
# divide trade by GDP
tradeperGDP <- ASEM$Data$Raw$Goods/ASEM$Input$Denominators$Den_GDP
# bar chart: add unit names first
iplotBar(data.frame(UnitCode = ASEM$Data$Raw$UnitCode, TradePerGDP = tradeperGDP))
Now the picture is completely different - small countries like Slovakia, Czech Republic and Singapore have the highest values.
The rankings here are completely different because the meanings of these two measures are completely different. Denomination is in fact a nonlinear transformation, because every value is divided by a different value (each country is divided by its own unique GDP, in this case). That doesn’t mean that denominated indicators are suddenly more “right” than they were before denomination, however. Trade per GDP is a useful measure of how much a country’s economy depends on international trade, but in terms of economic power, it might not be meaningful to scale by GDP. In summary, it is important to consider the meaning of the indicator compared to what you actually want to measure.
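To see how the ranking can flip, here is a minimal base R sketch using made-up numbers for four hypothetical countries (A to D); none of these values come from the ASEM data. It ranks a size-linked indicator before and after dividing it by GDP and, for contrast, after dividing every value by the same constant.

```r
# Made-up data for four hypothetical countries (not ASEM data)
toy <- data.frame(
  Country = c("A", "B", "C", "D"),
  Trade   = c(900, 400, 120, 60),   # size-linked indicator
  GDP     = c(3000, 1000, 100, 40)  # denominator
)

# Denominate: each country is divided by its own GDP
toy$TradePerGDP <- toy$Trade / toy$GDP

# Rank each version (1 = highest value)
toy$RankRaw   <- rank(-toy$Trade)
toy$RankDenom <- rank(-toy$TradePerGDP)

# For contrast: dividing every value by the same constant keeps the ranking
toy$RankScaled <- rank(-toy$Trade / 1000)

toy[c("Country", "RankRaw", "RankDenom", "RankScaled")]
```

Dividing by one shared constant only rescales the values, so the ranking is untouched; dividing each country by its own GDP reverses it completely in this toy case.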
More precisely, indicators can be thought of as either intensive or extensive variables. Intensive variables are not (or only weakly) related to the size of the country, and allow “fair” comparisons between countries of different sizes. Extensive variables, on the contrary, are strongly related to the size of the country.
This distinction is well known in physics, for example. Mass is related to the size of the object and is an extensive variable. If we take a block of steel, and double its size (volume), we also double its mass. Density, which is mass per unit volume, is an intensive quantity: if we double the size of the block, the density remains the same.
- An example of an extensive variable is population. Bigger countries tend to have bigger populations.
- An example of an intensive variable is population density. This is no longer dependent on the physical size of the country.
The summary here is that an extensive variable becomes an intensive variable when we divide it by a denominator.
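The block-of-steel argument can be checked with a few lines of arithmetic. This sketch uses a hypothetical region with made-up population and area figures, doubles its size, and compares the extensive quantity (population) with the intensive one (population density).

```r
# Hypothetical region with made-up figures
pop  <- 2e6            # population (extensive)
area <- 5e4            # land area in km^2 (extensive)
density <- pop / area  # people per km^2 (intensive)

# "Double the size" of the region: both extensive quantities double
pop2  <- 2 * pop
area2 <- 2 * area
density2 <- pop2 / area2

# Population doubles, density stays at 40 people per km^2
c(pop_ratio = pop2 / pop, density_before = density, density_after = density2)
```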
An important point to make here is about ordering. In the example above, we have simply divided one column by another. To get the right result, we need to make sure that the units (rows) match properly in both. The example above is correct because the denominator data and indicator data originally came from the same data frame (ASEMIndData). Normally, however, it would be better to use R’s match() or merge() functions, or similar tidyverse equivalents, to ensure no errors arise. In COINr, this ordering problem is automatically taken care of.
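As a minimal sketch of the ordering problem, the base R code below builds two small, made-up tables whose rows are in different orders (the unit codes AAA, BBB and CCC are placeholders, not real countries). Dividing column by column pairs the wrong units, while match() or merge() lines them up by UnitCode first.

```r
# Made-up indicator and denominator tables; AAA, BBB, CCC are placeholder codes
ind <- data.frame(UnitCode = c("AAA", "BBB", "CCC"),
                  Goods    = c(100, 200, 50))
den <- data.frame(UnitCode = c("CCC", "AAA", "BBB"),  # same units, different order
                  Den_GDP  = c(25, 50, 400))

# Naive positional division: AAA is divided by CCC's GDP, and so on (wrong)
wrong <- ind$Goods / den$Den_GDP

# match() returns, for each unit in ind, its row position in den
idx <- match(ind$UnitCode, den$UnitCode)
ind$GoodsPerGDP <- ind$Goods / den$Den_GDP[idx]

# merge() is an alternative: join on UnitCode first, then divide
merged <- merge(ind[c("UnitCode", "Goods")], den, by = "UnitCode")
merged$GoodsPerGDP <- merged$Goods / merged$Den_GDP

wrong            # 4, 4, 0.125
ind$GoodsPerGDP  # 2, 0.5, 2
```

match() keeps the original row order of the indicator table, whereas merge() sorts the result by the key column by default, which is worth remembering if row order matters later on.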
7.2 Denominating in COINr
Denomination is fairly simple to do in R: it’s just dividing one vector by another. Nevertheless, COINr has a dedicated function for denominating, which makes life easier and helps you to track what you have done.
Before we get to that, it’s worth mentioning that COINr has a few built-in denominators sourced from the World Bank, stored in the data set WorldDenoms:
WorldDenoms
## # A tibble: 249 x 7
## UnitName UnitCode Den_GDP Den_Pop Den_Area Den_GDPpc Den_IncomeGroup
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 Afghanistan AFG 1.93e10 3.80e7 652860 507. Low income
## 2 Albania ALB 1.53e10 2.85e6 27400 5353. Upper middle in~
## 3 Algeria DZA 1.71e11 4.31e7 2381740 3974. Lower middle in~
## 4 American Sam~ ASM 6.36e 8 5.53e4 200 11467. Upper middle in~
## 5 Andorra AND 3.15e 9 7.71e4 470 40886. High income
## 6 Angola AGO 8.88e10 3.18e7 1246700 2791. Lower middle in~
## 7 Anguilla AIA NA NA NA NA <NA>
## 8 Antarctica ATA NA NA NA NA <NA>
## 9 Antigua and ~ ATG 1.66e 9 9.71e4 440 17113. High income
## 10 Argentina ARG 4.45e11 4.49e7 2736690 9912. Upper middle in~
## # ... with 239 more rows
and the metadata can be found by calling ?WorldDenoms. The data here are the latest available as of February 2021, and I would recommend using them only for exploration, then updating your data yourself.
To denominate your indicators in COINr, the function to call is denominate(). Like other COINr functions, it can be used either independently on a data frame of indicators, or on COINs. In all cases you need three main ingredients:
- Some indicator data that should be denominated
- Some other indicators to use as denominators
- A mapping to say which denominators (if any) to use for each indicator.
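To make the three ingredients concrete, here is a rough base R sketch of what a denominating step has to do. The helper denominate_df() is hypothetical and written only for illustration; it is not COINr’s denominate(), whose internals may differ. It assumes that the indicator and denominator tables both have a UnitCode column, and that the mapping has one entry per indicator, with NA meaning “leave as is”.

```r
# denominate_df() is a HYPOTHETICAL helper for illustration, not COINr's denominate().
# ind_df:  indicator data, a UnitCode column plus one column per indicator
# den_df:  denominator data, a UnitCode column plus one column per denominator
# denomby: one entry per indicator column, naming its denominator (NA = leave as is)
denominate_df <- function(ind_df, den_df, denomby) {
  ind_cols <- setdiff(names(ind_df), "UnitCode")
  stopifnot(length(denomby) == length(ind_cols))
  stopifnot(all(denomby[!is.na(denomby)] %in% names(den_df)))
  # Reorder denominator rows so they line up with the indicator rows by UnitCode
  den_aligned <- den_df[match(ind_df$UnitCode, den_df$UnitCode), , drop = FALSE]
  out <- ind_df
  for (i in seq_along(ind_cols)) {
    if (!is.na(denomby[i])) {
      out[[ind_cols[i]]] <- ind_df[[ind_cols[i]]] / den_aligned[[denomby[i]]]
    }
  }
  out
}

# Made-up example: denominate Trade by GDP, leave Share untouched
ind <- data.frame(UnitCode = c("AAA", "BBB"), Trade = c(100, 30), Share = c(0.2, 0.5))
den <- data.frame(UnitCode = c("BBB", "AAA"), Den_GDP = c(10, 50))
res <- denominate_df(ind, den, denomby = c("Den_GDP", NA))
res
```

Here Trade becomes 2 for AAA and 3 for BBB, even though the denominator table lists the units in the opposite order, while the already-intensive Share column is passed through unchanged.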
7.2.1 On COINs
If you are working with a COIN, the indicator data will be present in the .$Data folder. If you specified any denominators in IndData (i.e. columns beginning with “Den_”) when calling assemble(), you will also find them in .$Input$Denominators. Finally, if you specified a Denominator column in IndMeta when calling assemble(), then the mapping of denominators to indicators will also be present.
# The raw indicator data which will be denominated
ASEM$Data$Raw
## # A tibble: 51 x 56
## UnitCode UnitName Year Group_GDP Group_GDPpc Group_Pop Group_EurAsia Goods
## <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <dbl>
## 1 AUT Austria 2018 L XL M Europe 278.
## 2 BEL Belgium 2018 L L L Europe 598.
## 3 BGR Bulgaria 2018 S S M Europe 42.8
## 4 HRV Croatia 2018 S M S Europe 28.4
## 5 CYP Cyprus 2018 S L S Europe 8.77
## 6 CZE Czech Re~ 2018 M L M Europe 274.
## 7 DNK Denmark 2018 L XL M Europe 147.
## 8 EST Estonia 2018 S M S Europe 28.2
## 9 FIN Finland 2018 M XL M Europe 102.
## 10 FRA France 2018 XL L L Europe 849.
## # ... with 41 more rows, and 48 more variables: Services <dbl>, FDI <dbl>,
## # PRemit <dbl>, ForPort <dbl>, CostImpEx <dbl>, Tariff <dbl>, TBTs <dbl>,
## # TIRcon <dbl>, RTAs <dbl>, Visa <dbl>, StMob <dbl>, Research <dbl>,
## # Pat <dbl>, CultServ <dbl>, CultGood <dbl>, Tourist <dbl>, MigStock <dbl>,
## # Lang <dbl>, LPI <dbl>, Flights <dbl>, Ship <dbl>, Bord <dbl>, Elec <dbl>,
## # Gas <dbl>, ConSpeed <dbl>, Cov4G <dbl>, Embs <dbl>, IGOs <dbl>,
## # UNVote <dbl>, Renew <dbl>, PrimEner <dbl>, CO2 <dbl>, MatCon <dbl>, ...
# The denominators
ASEM$Input$Denominators
## # A tibble: 51 x 11
## UnitCode UnitName Year Group_GDP Group_GDPpc Group_Pop Group_EurAsia
## <chr> <chr> <dbl> <chr> <chr> <chr> <chr>
## 1 AUT Austria 2018 L XL M Europe
## 2 BEL Belgium 2018 L L L Europe
## 3 BGR Bulgaria 2018 S S M Europe
## 4 HRV Croatia 2018 S M S Europe
## 5 CYP Cyprus 2018 S L S Europe
## 6 CZE Czech Republic 2018 M L M Europe
## 7 DNK Denmark 2018 L XL M Europe
## 8 EST Estonia 2018 S M S Europe
## 9 FIN Finland 2018 M XL M Europe
## 10 FRA France 2018 XL L L Europe
## # ... with 41 more rows, and 4 more variables: Den_Area <dbl>,
## # Den_Energy <dbl>, Den_GDP <dbl>, Den_Pop <dbl>
# The mapping of denominators to indicators (see Denominator column)
ASEM$Input$IndMeta
## # A tibble: 49 x 10
## IndName IndCode Direction IndWeight Denominator IndUnit Target Agg1 Agg2
## <chr> <chr> <dbl> <dbl> <chr> <chr> <dbl> <chr> <chr>
## 1 Trade in~ Goods 1 1 Den_GDP Trilli~ 1.82e+3 ConE~ Conn
## 2 Trade in~ Servic~ 1 1 Den_GDP Millio~ 6.24e+2 ConE~ Conn
## 3 Foreign ~ FDI 1 1 Den_GDP Billio~ 7.18e+1 ConE~ Conn
## 4 Personal~ PRemit 1 1 Den_GDP Millio~ 2.87e+1 ConE~ Conn
## 5 Foreign ~ ForPort 1 1 Den_GDP Millio~ 1.01e+4 ConE~ Conn
## 6 Cost to ~ CostIm~ -1 1 <NA> Curren~ 4.96e+1 Inst~ Conn
## 7 Mean tar~ Tariff -1 1 <NA> Percent 5.26e-1 Inst~ Conn
## 8 Technica~ TBTs -1 1 <NA> Number~ 8.86e+1 Inst~ Conn
## 9 Signator~ TIRcon 1 1 <NA> (1 (ye~ 9.5 e-1 Inst~ Conn
## 10 Regional~ RTAs 1 1 <NA> Number~ 4.38e+1 Inst~ Conn
## # ... with 39 more rows, and 1 more variable: Agg3 <chr>
COINr’s denominate() function knows where to look for each of these ingredients, so we can simply call:
ASEM <- denominate(ASEM, dset = "Raw")
which will return a new data set .$Data$Denominated. To return the data set directly, rather than an updated COIN, you can also set out2 = "df" (a common argument in many COINr functions, useful if you want to examine the result directly).
You can also change which indicators are denominated, and by what.
# Get denominator specification from metadata
denomby_meta <- ASEM$Input$IndMeta$Denominator
# Example: we want to change the denominator of flights from population to GDP
denomby_meta[ASEM$Input$IndMeta$IndCode == "Flights"] <- "Den_GDP"
# Now re-denominate using the modified specification
ASEM <- denominate(ASEM, dset = "Raw", specby = "user", denomby = denomby_meta)
Here we have changed the denominator of one of the indicators, “Flights”, to GDP. This is done by creating a character vector denomby_meta (copied from the original denominator specification), which has an entry for each indicator specifying which denominator to use, if any. We then changed the entry corresponding to “Flights”. This overwrites any previous denomination. If you want to keep and compare alternative specifications, see the chapter on Adjustments and comparisons.
Let’s compare the raw Flights data with the Flights per GDP data:
# plot raw flights against denominated
iplotIndDist2(ASEM, dsets = c("Raw", "Denominated"), icodes = "Flights")
Clearly, the denominated and raw indicators are very different from one another, reflecting the completely different meaning.
7.2.2 On data frames
If you are just working with data frames, you need to supply the three ingredients directly to the function. Here we will take some of the ASEM data for illustration (recalling that both indicator and denominator data are stored in ASEMIndData).
# a small data frame of indicator data
IndData <- ASEMIndData[c("UnitCode", "Goods", "Services", "FDI")]
IndData
## # A tibble: 51 x 4
## UnitCode Goods Services FDI
## <chr> <dbl> <dbl> <dbl>
## 1 AUT 278. 108. 5
## 2 BEL 598. 216. 5.71
## 3 BGR 42.8 13.0 1.35
## 4 HRV 28.4 17.4 0.387
## 5 CYP 8.77 15.2 1.23
## 6 CZE 274. 43.5 3.88
## 7 DNK 147. 114. 9.1
## 8 EST 28.2 10.2 0.58
## 9 FIN 102. 53.8 6.03
## 10 FRA 849. 471. 30.9
## # ... with 41 more rows
# two selected denominators
Denoms <- ASEMIndData[c("UnitCode", "Den_Pop", "Den_GDP")]
# denominate the data
IndDataDenom <- denominate(IndData, denomby = c("Den_GDP", NA, "Den_Pop"), denominators = Denoms)
IndDataDenom
## # A tibble: 51 x 4
## UnitCode Goods Services FDI
## <chr> <dbl> <dbl> <dbl>
## 1 AUT 0.712 108. 0.000572
## 2 BEL 1.28 216. 0.000500
## 3 BGR 0.804 13.0 0.000191
## 4 HRV 0.554 17.4 0.0000924
## 5 CYP 0.437 15.2 0.00104
## 6 CZE 1.40 43.5 0.000365
## 7 DNK 0.478 114. 0.00159
## 8 EST 1.21 10.2 0.000443
## 9 FIN 0.426 53.8 0.00109
## 10 FRA 0.344 471. 0.000476
## # ... with 41 more rows
Since the input is recognised as a data frame, you don’t need to specify any other arguments, and the output is automatically a data frame. Note how denomby works: here it specifies that “Goods” should be denominated by “Den_GDP”, that “Services” should not be denominated, and that “FDI” should be denominated by “Den_Pop”.
7.3 When to denominate, and by what?
Denomination is mathematically very simple, but from a conceptual point of view it needs to be handled with care. As we have shown, denominating an indicator will usually completely change it, and will have a corresponding impact on the results.
There are two ways of looking at the problem. The first is conceptual: consider each indicator and whether it fits with the aim of your index. Some indicators are already intensive, such as the percentage of tertiary graduates. Others, such as trade, will be strongly linked to the size of the country. In those cases, consider whether countries with high trade values should have higher scores in your index. Or should it be trade as a percentage of GDP? Or trade per capita? Each of these has a different meaning.
Sometimes extensive variables will be the right choice anyway. The Lowy Asia Power Index measures the power of each country in an absolute sense: in this case, the size of the country is all-important, and e.g. trade or military capabilities per capita would not make much sense.
The second (complementary) way to approach denomination is from a statistical point of view. We can check which indicators are strongly related to the size of a country by correlating them with some typical denominators, such as the ones used here. The getStats() function does just this, checking the correlations between each indicator and each denominator, and flagging any possible high correlations.
# get statistics on raw data set
ASEM <- getStats(ASEM, dset = "Raw", out2 = "COIN", t_denom = 0.8)
## Number of collinear indicators = 3
## Number of signficant negative indicator correlations = 322
## Number of indicators with high denominator correlations = 4
# get the result
ctable <- ASEM$Analysis$Raw$DenomCorrelations
# remove first column
ctable <- ctable[-1]
# return only columns that have entries with absolute correlation above 0.8
ctable[, colSums((abs(ctable) > 0.8)) > 0]
## Goods FDI StMob CultGood
## 1 0.2473558 0.3668625 0.5228032 0.2696675
## 2 0.6276219 0.7405170 0.7345030 0.7120445
## 3 0.8026770 0.8482453 0.8379575 0.8510374
## 4 0.4158811 0.6529376 0.5523545 0.5028121
The matrix displayed above only includes columns where at least one correlation (between an indicator and any denominator) exceeds 0.8 in absolute value. The results show some interesting patterns:
- Many indicators have a high positive correlation with GDP, including flights, trade, foreign direct investment (very high), personal remittances, research, stock of migrants, and so on.
- Bigger countries, in terms of land area, tend to have fewer trade agreements (RTAs) and a more restrictive visa policy.
- Larger populations are associated with higher levels of poverty.
The purpose here is not to draw causal links between these quantities, although they might reveal interesting patterns. Rather, these might suggest which quantities to denominate by, if the choices also work on a conceptual level.
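As a sketch of the statistical check, the code below reproduces the idea of the filtering step above in base R on made-up data: it correlates each indicator with each denominator using Pearson correlation and keeps only indicators whose absolute correlation exceeds 0.8 for at least one denominator. The choice of Pearson correlation is an assumption made here; this is not a description of how getStats() computes its correlations internally.

```r
# Made-up data for six hypothetical units
gdp  <- c(50, 120, 300, 800, 1500, 4000)
area <- c(30, 500, 80, 1200, 300, 9000)
ind <- data.frame(
  Trade  = 0.4 * gdp + c(5, -8, 12, -10, 15, -20),  # constructed to scale with GDP
  Tariff = c(3, 5, 2, 4, 6, 1),                     # not constructed to depend on size
  Score  = c(6, 9, 4, 8, 5, 7)                      # not constructed to depend on size
)
den <- data.frame(Den_GDP = gdp, Den_Area = area)

# Pearson correlation of each denominator (rows) with each indicator (columns)
ctab <- cor(den, ind)

# Keep indicators whose absolute correlation exceeds 0.8 for at least one denominator
flagged <- ctab[, colSums(abs(ctab) > 0.8) > 0, drop = FALSE]
round(flagged, 3)
```

Only Trade is flagged here, as intended by construction; in a real index, any flagged indicator would then be a candidate for denomination, subject to the conceptual check described earlier.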