Chapter 7 Denomination

The first section here gives a bit of introduction to the concept of denomination. If you just want to know how to denominate in COINr, skip to the section on Denominating in COINr.

7.1 Concept

Denomination refers to the operation of dividing one indicator by another. But why should we do that?

As anyone who has ever looked at a map will know, countries come in all different sizes. The problem is that many indicators, on their own, are strongly linked to the size of the country. That means that if we compare countries using these indicators directly, we will often get a ranking that has roughly the biggest countries at the top, and the smallest countries at the bottom.

To take an example, let’s examine some indicators in the ASEM data set, and introduce another plotting function in the process.

# load COINr package
library(COINr6)
library(magrittr) # for pipe operations
# build ASEM index
ASEM <- build_ASEM()
# plot international trade in goods against GDP
iplotIndDist2(ASEM, dsets = c("Denominators", "Raw"), icodes = c("Den_GDP", "Goods"))

# (note: need to fix labelling and units of denominators)

The function iplotIndDist2() is similar to iplotIndDist() but allows plotting two indicators against each other. You can pick any indicator from any data set for each, including denominators.

What are these “denominators” anyway? Denominators are indicators that are used to scale other indicators, in order to remove the “size effect”. Typically, they are those related to physical or economic size, including GDP, population, land area and so on.

Anyway, looking at the plot above, it is clear that countries with a higher GDP have higher international trade in goods (e.g. Germany, China, UK, Japan, France), which is not very surprising. The problem comes when we want to include this indicator as a measure of connectivity: on its own, trade in goods simply summarises having a large economy. What is more interesting is to measure international trade per unit GDP, and this is done by dividing (i.e. denominating) the international trade of each country by its GDP. Let’s do that manually here and see what we get.

# divide trade by GDP
tradeperGDP <- ASEM$Data$Raw$Goods/ASEM$Input$Denominators$Den_GDP
# bar chart: add unit names first
iplotBar(data.frame(UnitCode = ASEM$Data$Raw$UnitCode, TradePerGDP = tradeperGDP))

Now the picture is completely different - small countries like Slovakia, Czech Republic and Singapore have the highest values.

The rankings here are completely different because the meanings of these two measures are completely different. Denomination is in fact a nonlinear transformation, because every value is divided by a different value (each country is divided by its own unique GDP, in this case). That doesn’t mean that denominated indicators are suddenly more “right” than they were before denomination, however. Trade per GDP is a useful measure of how much a country’s economy is dependent on international trade, but in terms of economic power, it might not be meaningful to scale by GDP. In summary, it is important to consider the meaning of the indicator compared to what you actually want to measure.
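To make the point about nonlinearity concrete, here is a minimal sketch in base R. The three units and all of their values are made up for illustration and are not taken from the ASEM data set. It compares the ranking of a raw indicator with its ranking after dividing by a constant (a linear rescaling) and after dividing by each unit's own GDP (denomination).

```r
# Hypothetical toy data: three made-up units, not from the ASEM data set
toy <- data.frame(
  UnitCode = c("A", "B", "C"),
  Trade    = c(100, 60, 20),   # extensive: larger for larger economies
  GDP      = c(1000, 300, 50)  # denominator
)

# linear rescaling: every value is divided by the same constant
toy$TradeScaled <- toy$Trade / 1000

# denomination: every value is divided by its own unit's GDP
toy$TradePerGDP <- toy$Trade / toy$GDP

# rank 1 = highest value
toy$RankRaw    <- rank(-toy$Trade)
toy$RankScaled <- rank(-toy$TradeScaled)
toy$RankDenom  <- rank(-toy$TradePerGDP)
toy
```

In this toy case the constant rescaling leaves the ranking untouched, whereas denomination reverses it: the smallest economy has the highest trade per unit GDP.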

More precisely, indicators can be thought of as either intensive or extensive variables. Intensive variables are not (or only weakly) related to the size of the country, and allow “fair” comparisons between countries of different sizes. Extensive variables, on the contrary, are strongly related to the size of the country.

This distinction is well known in physics, for example. Mass is related to the size of the object and is an extensive variable. If we take a block of steel, and double its size (volume), we also double its mass. Density, which is mass per unit volume, is an intensive quantity: if we double the size of the block, the density remains the same.

An example of an extensive variable is population. Bigger countries tend to have bigger populations.
An example of an intensive variable is population density. This is no longer dependent on the physical size of the country.

The summary here is that an extensive variable becomes an intensive variable when we divide it by a denominator.

An important point to make here is about ordering. In the example above, we have simply divided one column by another, with each column taken from a different data frame. To get the right result, we need to make sure that the units (rows) match in both data frames. The example above is correct because the denominator data and indicator data originally came from the same data frame (ASEMIndData). Normally, however, it would be better to use R’s match() or merge() functions, or the tidyverse equivalents, to ensure no errors arise. In COINr, this ordering problem is automatically taken care of.
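The ordering problem can be sketched in base R as follows. This is only an illustration: the Goods values are rounded from the ASEM data, the GDP values are made up, and the two small data frames deliberately list the same units in a different order.

```r
# Hypothetical toy data: same units, different row order
ind <- data.frame(UnitCode = c("AUT", "BEL", "BGR"),
                  Goods    = c(278, 598, 42.8))
den <- data.frame(UnitCode = c("BGR", "AUT", "BEL"),
                  Den_GDP  = c(50, 400, 500))  # made-up values

# wrong: divides row by row, pairing AUT's trade with BGR's GDP
naive <- ind$Goods / den$Den_GDP

# right: look up the denominator row of each unit first
idx <- match(ind$UnitCode, den$UnitCode)
stopifnot(!anyNA(idx))  # every unit must have a denominator
aligned <- ind$Goods / den$Den_GDP[idx]

data.frame(UnitCode = ind$UnitCode, naive, aligned)
```

The check on `idx` is a design choice: `match()` returns NA for units that are missing from the denominator table, and stopping early is safer than silently producing NA values.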

7.2 Denominating in COINr

Denomination is fairly simple to do in R: it’s just dividing one vector by another. Nevertheless, COINr has a dedicated function for denominating which makes life easier and helps you to track what you have done.

Before we get to that, it’s worth mentioning that COINr has a built-in data set of denominators sourced from the World Bank. It looks like this:

WorldDenoms
## # A tibble: 249 x 7
##    UnitName      UnitCode    Den_GDP Den_Pop Den_Area Den_GDPpc Den_IncomeGroup 
##    <chr>         <chr>         <dbl>   <dbl>    <dbl>     <dbl> <chr>           
##  1 Afghanistan   AFG         1.93e10  3.80e7   652860      507. Low income      
##  2 Albania       ALB         1.53e10  2.85e6    27400     5353. Upper middle in~
##  3 Algeria       DZA         1.71e11  4.31e7  2381740     3974. Lower middle in~
##  4 American Sam~ ASM         6.36e 8  5.53e4      200    11467. Upper middle in~
##  5 Andorra       AND         3.15e 9  7.71e4      470    40886. High income     
##  6 Angola        AGO         8.88e10  3.18e7  1246700     2791. Lower middle in~
##  7 Anguilla      AIA        NA       NA            NA       NA  <NA>            
##  8 Antarctica    ATA        NA       NA            NA       NA  <NA>            
##  9 Antigua and ~ ATG         1.66e 9  9.71e4      440    17113. High income     
## 10 Argentina     ARG         4.45e11  4.49e7  2736690     9912. Upper middle in~
## # ... with 239 more rows

The metadata can be found by calling ?WorldDenoms. The data here are the latest available as of February 2021, and I would recommend using them only for exploration, then updating the data yourself.

To denominate your indicators in COINr, the function to call is denominate(). Like other COINr functions, this can be used either independently on a data frame of indicators, or on COINs. Consider that in all cases you need three main ingredients:

Some indicator data that should be denominated
Some other indicators to use as denominators
A mapping to say which denominators (if any) to use for each indicator.

7.2.1 On COINs

If you are working with a COIN, the indicator data will be present in the .$Data folder. If you specified any denominators in IndData (i.e. columns beginning with “Den_”) when calling assemble() you will also find them in .$Input$Denominators. Finally, if you specified a Denominator column in IndMeta when calling assemble() then the mapping of denominators to indicators will also be present.

# The raw indicator data which will be denominated
ASEM$Data$Raw
## # A tibble: 51 x 56
##    UnitCode UnitName   Year Group_GDP Group_GDPpc Group_Pop Group_EurAsia  Goods
##    <chr>    <chr>     <dbl> <chr>     <chr>       <chr>     <chr>          <dbl>
##  1 AUT      Austria    2018 L         XL          M         Europe        278.  
##  2 BEL      Belgium    2018 L         L           L         Europe        598.  
##  3 BGR      Bulgaria   2018 S         S           M         Europe         42.8 
##  4 HRV      Croatia    2018 S         M           S         Europe         28.4 
##  5 CYP      Cyprus     2018 S         L           S         Europe          8.77
##  6 CZE      Czech Re~  2018 M         L           M         Europe        274.  
##  7 DNK      Denmark    2018 L         XL          M         Europe        147.  
##  8 EST      Estonia    2018 S         M           S         Europe         28.2 
##  9 FIN      Finland    2018 M         XL          M         Europe        102.  
## 10 FRA      France     2018 XL        L           L         Europe        849.  
## # ... with 41 more rows, and 48 more variables: Services <dbl>, FDI <dbl>,
## #   PRemit <dbl>, ForPort <dbl>, CostImpEx <dbl>, Tariff <dbl>, TBTs <dbl>,
## #   TIRcon <dbl>, RTAs <dbl>, Visa <dbl>, StMob <dbl>, Research <dbl>,
## #   Pat <dbl>, CultServ <dbl>, CultGood <dbl>, Tourist <dbl>, MigStock <dbl>,
## #   Lang <dbl>, LPI <dbl>, Flights <dbl>, Ship <dbl>, Bord <dbl>, Elec <dbl>,
## #   Gas <dbl>, ConSpeed <dbl>, Cov4G <dbl>, Embs <dbl>, IGOs <dbl>,
## #   UNVote <dbl>, Renew <dbl>, PrimEner <dbl>, CO2 <dbl>, MatCon <dbl>, ...
# The denominators
ASEM$Input$Denominators
## # A tibble: 51 x 11
##    UnitCode UnitName        Year Group_GDP Group_GDPpc Group_Pop Group_EurAsia
##    <chr>    <chr>          <dbl> <chr>     <chr>       <chr>     <chr>        
##  1 AUT      Austria         2018 L         XL          M         Europe       
##  2 BEL      Belgium         2018 L         L           L         Europe       
##  3 BGR      Bulgaria        2018 S         S           M         Europe       
##  4 HRV      Croatia         2018 S         M           S         Europe       
##  5 CYP      Cyprus          2018 S         L           S         Europe       
##  6 CZE      Czech Republic  2018 M         L           M         Europe       
##  7 DNK      Denmark         2018 L         XL          M         Europe       
##  8 EST      Estonia         2018 S         M           S         Europe       
##  9 FIN      Finland         2018 M         XL          M         Europe       
## 10 FRA      France          2018 XL        L           L         Europe       
## # ... with 41 more rows, and 4 more variables: Den_Area <dbl>,
## #   Den_Energy <dbl>, Den_GDP <dbl>, Den_Pop <dbl>
# The mapping of denominators to indicators (see Denominator column)
ASEM$Input$IndMeta
## # A tibble: 49 x 10
##    IndName   IndCode Direction IndWeight Denominator IndUnit  Target Agg1  Agg2 
##    <chr>     <chr>       <dbl>     <dbl> <chr>       <chr>     <dbl> <chr> <chr>
##  1 Trade in~ Goods           1         1 Den_GDP     Trilli~ 1.82e+3 ConE~ Conn 
##  2 Trade in~ Servic~         1         1 Den_GDP     Millio~ 6.24e+2 ConE~ Conn 
##  3 Foreign ~ FDI             1         1 Den_GDP     Billio~ 7.18e+1 ConE~ Conn 
##  4 Personal~ PRemit          1         1 Den_GDP     Millio~ 2.87e+1 ConE~ Conn 
##  5 Foreign ~ ForPort         1         1 Den_GDP     Millio~ 1.01e+4 ConE~ Conn 
##  6 Cost to ~ CostIm~        -1         1 <NA>        Curren~ 4.96e+1 Inst~ Conn 
##  7 Mean tar~ Tariff         -1         1 <NA>        Percent 5.26e-1 Inst~ Conn 
##  8 Technica~ TBTs           -1         1 <NA>        Number~ 8.86e+1 Inst~ Conn 
##  9 Signator~ TIRcon          1         1 <NA>        (1 (ye~ 9.5 e-1 Inst~ Conn 
## 10 Regional~ RTAs            1         1 <NA>        Number~ 4.38e+1 Inst~ Conn 
## # ... with 39 more rows, and 1 more variable: Agg3 <chr>

COINr’s denominate() function knows where to look for each of these ingredients, so we can simply call:

ASEM <- denominate(ASEM, dset = "Raw")

which will add a new data set to the COIN at .$Data$Denominated. To return the data set directly, rather than outputting an updated COIN, you can also set out2 = "df" (this is a common argument to many functions, which can be useful if you want to examine the result directly).

You can also change which indicators are denominated, and by what.

# Get denominator specification from metadata
denomby_meta <- ASEM$Input$IndMeta$Denominator
# Example: we want to change the denominator of flights from population to GDP
denomby_meta[ASEM$Input$IndMeta$IndCode == "Flights"] <- "Den_GDP"
# Now re-denominate. This returns an updated COIN
ASEM <- denominate(ASEM, dset = "Raw", specby = "user", denomby = denomby_meta)

Here we have changed the denominator of one of the indicators, “Flights”, to GDP. This is done by creating a character vector denomby_meta (copied from the original denominator specification) which has an entry for each indicator, specifying which denominator to use, if any. We then changed the entry corresponding to “Flights”. This overwrites any previous denomination. If you want to keep and compare alternative specifications, see the chapter on Adjustments and comparisons.

Let’s compare the raw Flights data with the Flights per GDP data:

# plot raw flights against denominated
iplotIndDist2(ASEM, dsets = c("Raw", "Denominated"), icodes = "Flights")

Clearly, the denominated and raw indicators are very different from one another, reflecting the completely different meaning.

7.2.2 On data frames

If you are just working with data frames, you need to supply the three ingredients directly to the function. Here we will take some of the ASEM data for illustration (recalling that both indicator and denominator data are found in ASEMIndData).

# a small data frame of indicator data
IndData <- ASEMIndData[c("UnitCode", "Goods", "Services", "FDI")]
IndData
## # A tibble: 51 x 4
##    UnitCode  Goods Services    FDI
##    <chr>     <dbl>    <dbl>  <dbl>
##  1 AUT      278.      108.   5    
##  2 BEL      598.      216.   5.71 
##  3 BGR       42.8      13.0  1.35 
##  4 HRV       28.4      17.4  0.387
##  5 CYP        8.77     15.2  1.23 
##  6 CZE      274.       43.5  3.88 
##  7 DNK      147.      114.   9.1  
##  8 EST       28.2      10.2  0.58 
##  9 FIN      102.       53.8  6.03 
## 10 FRA      849.      471.  30.9  
## # ... with 41 more rows
# two selected denominators
Denoms <- ASEMIndData[c("UnitCode", "Den_Pop", "Den_GDP")]

# denominate the data
IndDataDenom <- denominate(IndData, denomby = c("Den_GDP", NA, "Den_Pop"), denominators = Denoms)
IndDataDenom
## # A tibble: 51 x 4
##    UnitCode Goods Services       FDI
##    <chr>    <dbl>    <dbl>     <dbl>
##  1 AUT      0.712    108.  0.000572 
##  2 BEL      1.28     216.  0.000500 
##  3 BGR      0.804     13.0 0.000191 
##  4 HRV      0.554     17.4 0.0000924
##  5 CYP      0.437     15.2 0.00104  
##  6 CZE      1.40      43.5 0.000365 
##  7 DNK      0.478    114.  0.00159  
##  8 EST      1.21      10.2 0.000443 
##  9 FIN      0.426     53.8 0.00109  
## 10 FRA      0.344    471.  0.000476 
## # ... with 41 more rows

Since the input is recognised as a data frame, you don’t need to specify any other arguments, and the output is automatically a data frame. Note how denomby works: it has one entry per indicator column, in order. Here it specifies that “Goods” should be denominated by “Den_GDP”, that “Services” should not be denominated, and that “FDI” should be denominated by “Den_Pop”.
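To show how the three ingredients fit together, here is a minimal base-R sketch of the same logic. The function denominate_sketch() is a hypothetical helper written for this illustration and is not the COINr implementation; the two-unit data set is made up. It assumes that every column other than the identifier is an indicator, and that denomby has one entry per indicator, with NA meaning "do not denominate".

```r
# Hypothetical helper, NOT the COINr implementation
denominate_sketch <- function(inddata, denominators, denomby, id = "UnitCode") {
  icodes <- setdiff(names(inddata), id)
  stopifnot(length(denomby) == length(icodes))
  # align denominator rows to indicator rows by unit code
  idx <- match(inddata[[id]], denominators[[id]])
  stopifnot(!anyNA(idx))
  for (i in seq_along(icodes)) {
    if (!is.na(denomby[i])) {
      # divide the indicator by its mapped denominator
      inddata[[icodes[i]]] <- inddata[[icodes[i]]] / denominators[[denomby[i]]][idx]
    }
  }
  inddata
}

# made-up values for two units; note the different row order in den
ind <- data.frame(UnitCode = c("A", "B"), Goods = c(10, 30),
                  Services = c(5, 6), FDI = c(2, 8))
den <- data.frame(UnitCode = c("B", "A"), Den_Pop = c(4, 1),
                  Den_GDP = c(60, 5))

out <- denominate_sketch(ind, den, denomby = c("Den_GDP", NA, "Den_Pop"))
out
```

In the result, Goods is divided by GDP, Services is left unchanged, and FDI is divided by population, with rows matched by unit code rather than by position.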

7.3 When to denominate, and by what?

Denomination is mathematically very simple, but from a conceptual point of view it needs to be handled with care. As we have shown, denominating an indicator will usually completely change it, and will have a corresponding impact on the results.

There are two ways of looking at the problem. The first is from the conceptual point of view. Consider each indicator and whether it fits with the aim of your index. Some indicators are already intensive, such as the percentage of tertiary graduates. Others, such as trade, will be strongly linked to the size of the country. In those cases, consider whether countries with high trade values should have higher scores in your index or not. Or should it be trade as a percentage of GDP? Or trade per capita? Each of these will have a different meaning.

Sometimes extensive variables will anyway be the right choice. The Lowy Asia Power Index measures the power of each country in an absolute sense: in this case, the size of the country is all-important and e.g. trade or military capabilities per capita would not make much sense.

The second (complementary) way to approach denomination is from a statistical point of view. We can check which indicators are strongly related to the size of a country by correlating them with some typical denominators, such as the ones used here. The getStats() function does just this, checking the correlations between each indicator and each denominator, and flagging any possible high correlations.

# get statistics on raw data set
ASEM <- getStats(ASEM, dset = "Raw", out2 = "COIN", t_denom = 0.8)
## Number of collinear indicators =  3
## Number of signficant negative indicator correlations =  322
## Number of indicators with high denominator correlations =  4
# get the result
ctable <- ASEM$Analysis$Raw$DenomCorrelations
# remove first column
ctable <- ctable[-1]
# return only columns that have entries with correlation above 0.8
ctable[,colSums((abs(ctable) > 0.8))>0]
##       Goods       FDI     StMob  CultGood
## 1 0.2473558 0.3668625 0.5228032 0.2696675
## 2 0.6276219 0.7405170 0.7345030 0.7120445
## 3 0.8026770 0.8482453 0.8379575 0.8510374
## 4 0.4158811 0.6529376 0.5523545 0.5028121

The matrix that is displayed above only includes columns where there is a correlation value (between an indicator and any denominator) of greater than 0.8. The results show some interesting patterns:

Many indicators have a high positive correlation with GDP, including flights, trade, foreign direct investment (very high), personal remittances, research, stock of migrants, and so on.
Bigger countries, in terms of land area, tend to have fewer trade agreements (RTAs) and a more restrictive visa policy
Larger populations are associated with higher levels of poverty

The purpose here is not to draw causal links between these quantities, although they might reveal interesting patterns. Rather, these might suggest which quantities to denominate by, if the choices also work on a conceptual level.
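The statistical check described in this section can also be sketched in base R, without COINr. The data below are made up for illustration: "Trade" is constructed to grow with GDP, while "Tertiary" is not related to either denominator. The 0.8 threshold mirrors the t_denom value used above.

```r
# Hypothetical toy data (made-up values): two indicators, two denominators
ind <- data.frame(
  Trade    = c(10, 22, 29, 41, 52),  # grows with Den_GDP below
  Tertiary = c(30, 25, 40, 35, 20)   # a percentage, unrelated to size
)
den <- data.frame(
  Den_GDP = c(100, 200, 300, 400, 500),
  Den_Pop = c(5, 50, 10, 40, 20)
)

# correlation of every denominator (rows) with every indicator (columns)
ctab <- cor(den, ind)

# flag indicators whose absolute correlation with any denominator exceeds 0.8
threshold <- 0.8
flagged <- colnames(ctab)[colSums(abs(ctab) > threshold) > 0]
flagged
```

Here only "Trade" is flagged, through its correlation with Den_GDP. As in the text, a flag like this is a prompt to consider denomination, not a decision in itself.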