Chapter 4 Initial visualisation and analysis
One of the first things to do with indicator data is to look at it, in as many ways as possible. This helps to get a feel for how the data is distributed between units/countries, how it may be spatially distributed, and how indicators relate to one another. COINr includes various tools for visualising and analysing indicator data and the index structure.
The plotting tools here can be useful at any stage of building a composite indicator or scoreboard, from initial visualisation of the data, to checking the effects of data treatment, to visualising final index results.
Independently of the indicator data, a good way to begin is to check the structure of the index. This can be done visually with the plotframework() function, which generates a sunburst plot of the index structure.
library(COINr)
# assemble ASEM COIN
ASEM <- assemble(ASEMIndData, ASEMIndMeta, ASEMAggMeta) |>
  suppressMessages()
# plot framework
plotframework(ASEM)
The sunburst plot is useful for a few things. First, it shows the structure that COINr has understood. If you get an error here, it is probably an indication that something has gone wrong in the input of the structure, so go back to the input data and check. If it does successfully display the sunburst plot, you can check whether the structure agrees with your expectations.
Second, it shows the effective weight of each indicator (the value is visible by hovering over each segment). Effective weights are the final weights of each indicator in the index, as a result of the indicator weights, the parent aggregate weights, and the structure of the index. This can reveal which indicators are implicitly weighted more than others, e.g. because they have more or fewer indicators in the same aggregation group.
Finally, it can be a good way to communicate your index structure to other people.
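The idea of effective weights can be illustrated with a small base R sketch. It assumes that weights are normalised to sum to one within each aggregation group and then multiplied down the hierarchy; this is an assumption made for illustration, not a description of COINr's internals. The indicator and pillar names are hypothetical.

```r
# Hypothetical two-level structure: four indicators in two pillars,
# all with equal nominal weights.
ind <- data.frame(
  code   = c("Ind1", "Ind2", "Ind3", "Ind4"),
  parent = c("PillarA", "PillarA", "PillarA", "PillarB"),
  weight = c(1, 1, 1, 1),
  stringsAsFactors = FALSE
)
pillar_weight <- c(PillarA = 1, PillarB = 1)

# Assumption: normalise indicator weights within each pillar
ind$w_norm <- ind$weight / ave(ind$weight, ind$parent, FUN = sum)
# Assumption: normalise pillar weights across the index
p_norm <- pillar_weight / sum(pillar_weight)
# effective weight = normalised indicator weight x normalised pillar weight
ind$eff_weight <- ind$w_norm * unname(p_norm[ind$parent])

print(ind[, c("code", "parent", "eff_weight")])
```

Although all nominal weights are equal here, Ind4 ends up with three times the effective weight of each of the other indicators, simply because it is alone in its pillar.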
Individual indicator distributions can be visualised in several different ways. For static plots, the main tool is plotIndDist(), which generates histograms, boxplots, violin plots, dot plots and violin-dot plots. This is powered by ggplot2, and if you want to customise plots, you should use that directly. However, COINr plotting functions are intended as quick tools to visualise data, with easy access to the hierarchical data set. You can plot individual indicators:
plotIndDist(ASEM, type = "Histogram", icodes = "LPI")
And you can also plot groups of indicators by calling aggregate names (notice that when multiple indicators are plotted, the indicator codes are used to label each plot, rather than indicator names, to save space):
plotIndDist(ASEM, type = "Violindot", icodes = "Physical")
The plotIndDist() function has several arguments. First, any indicator or aggregate (pillar, dimension, index, etc.) can be plotted from any data set by using the dset argument. If you have only just assembled the COIN, you will only have the "Raw" data set, but any other data set can be accessed once it exists, e.g. the treated data set, the aggregated data set, and so on. You can also target different levels using the aglev argument - for more details see the chapter on Helper functions.
Stand-alone data frames are also supported by plotIndDist() (this can also be achieved directly with ggplot2 without too much effort):
df <- as.data.frame(matrix(rnorm(90), nrow = 30, ncol = 3))
colnames(df) <- c("Dogs", "Cats", "Rabbits")
plotIndDist(df, type = "Box")
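For comparison, the remark about doing this directly in ggplot2 can be sketched as follows. This is one possible approach rather than what plotIndDist() does internally: it reshapes the data frame to long format with base R's stack() and draws one box per column. The seed, axis labels and title are choices made for this illustration.

```r
library(ggplot2)

set.seed(1)  # make the random data reproducible
df <- as.data.frame(matrix(rnorm(90), nrow = 30, ncol = 3))
colnames(df) <- c("Dogs", "Cats", "Rabbits")

# reshape to long format: stack() returns the columns "values" and "ind"
df_long <- stack(df)

# one box per original column
p <- ggplot(df_long, aes(x = ind, y = values)) +
  geom_boxplot() +
  labs(x = NULL, y = "Value", title = "Box plots of a stand-alone data frame")
print(p)
```

Because p is an ordinary ggplot2 object, further layers, themes or labels can be added to it in the usual way.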
COINr also includes some interactive plots, which are embedded into apps (see later), but can be used for your own purposes, such as embedding in HTML documents or websites.
iplotIndDist(ASEM, "Raw", "Renew", ptype = "Violin")
Since all the plotting functions output plot objects (plotly objects for iplotIndDist(), and ggplot2 plot objects for plotIndDist()), you can also modify them if you want to customise the plots. This might be a helpful workflow - to use COINr's default options and then tweak the plot to your liking. In a very simple example, here we just change the title.
iplotIndDist(ASEM, "Raw", "Flights", ptype = "Histogram") |>
  plotly::layout(title = "Customised plot")
If you are purely interested in exploring the data, rather than presenting it to someone else, the plots here are also embedded into a Shiny app which lets you quickly explore and compare indicator distributions - see Data Treatment for more details on this.
4.3 Ranks and Maps
While the previous functions concerned plotting the statistical distributions of each indicator, functions are also available for plotting the indicator values in order or on a map.
iplotBar(ASEM, dset = "Raw", isel = "Embs", usel = "SGP", aglev = 1)
Here, a single indicator is plotted in order as a bar chart. One or more units can optionally be highlighted using the usel argument. This function also allows you to select groups of units using the "Group_*" columns of the input indicator data:
# plot only units from Asia group
iplotBar(ASEM, dset = "Raw", isel = "Goods", aglev = 1, from_group = list(Group_EurAsia = "Asia"))
From a different perspective, we can plot the same data on a map:
iplotMap(ASEM, dset = "Raw", isel = "Embs")
Note that this only works if IndData$UnitCode corresponds to ISO alpha-3 country codes. If you want to do more sophisticated mapping, Plotly has many mapping options, and R in general has a wealth of mapping packages. COINr uses Plotly maps to keep things simple and to avoid depending on too many packages.
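Before mapping, it can help to check that the unit codes at least have the shape of ISO alpha-3 codes. The sketch below is a format check only, using hypothetical codes: it confirms that each code is exactly three upper-case letters, not that it is an assigned ISO 3166-1 code.

```r
# Hypothetical unit codes; in practice these would come from the UnitCode column
unit_codes <- c("GBR", "ESP", "AUS", "SG", "sgp")

# Format check only: exactly three upper-case letters.
# This does not confirm that a code is an assigned ISO 3166-1 alpha-3 code.
looks_like_iso3 <- grepl("^[[:upper:]]{3}$", unit_codes)

# codes that would need attention before mapping
bad_codes <- unit_codes[!looks_like_iso3]
print(bad_codes)
```

A full validity check would compare the codes against an authoritative list of ISO 3166-1 alpha-3 codes.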
Another useful plotting function is plotIndDot(), which simply plots a single indicator using dots, with options to highlight particular units.
plotIndDot(ASEM, dset = "Raw", icode = "CO2", usel = c("GBR", "ESP", "AUS"), add_stat = "median")
This also gives the option to add a statistic, either mean or median, or to pass your own value to mark. This is similar to a dot plot using e.g. plotIndDist(), but is more suited to showing where a particular unit falls in an ordering.
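The information such a plot conveys (the position of selected units in an ordering, relative to a reference statistic) can also be tabulated in base R. The values below are invented purely for illustration and are not taken from the ASEM data set.

```r
# Hypothetical indicator values for five units (invented for illustration)
x <- c(GBR = 5.6, ESP = 5.0, AUS = 15.4, SGP = 8.6, FRA = 4.6)
usel <- c("GBR", "AUS")  # units to highlight

# rank 1 = highest value
rnk <- rank(-x, ties.method = "min")

# units in descending order, flagged against selection and median
x_sorted <- sort(x, decreasing = TRUE)
tab <- data.frame(
  unit  = names(x_sorted),
  value = unname(x_sorted),
  rank  = unname(rnk[names(x_sorted)]),
  stringsAsFactors = FALSE
)
tab$highlight    <- tab$unit %in% usel
tab$above_median <- tab$value > median(x)
print(tab)
```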
COINr has yet more tools to plot data, but let’s leave it at that for the moment. Other tools will be introduced in other chapters.
4.4 Statistics and analysis
Aside from plots, COINr gives a fairly detailed statistical analysis of initial indicator data. The function getStats() returns a series of statistics which can be aimed at any of the data sets in the .$Data folder. You can also specify whether the output should be returned to the COIN or to a separate list.
# get stats
ASEM <- getStats(ASEM, dset = "Raw", out2 = "COIN")
## Number of collinear indicators = 3
## Number of signficant negative indicator correlations = 322
## Number of indicators with high denominator correlations = 7
# display in table using Reactable
# (note the use of helper function roundDF() to round the values to a sensible number of decimals)
ASEM$Analysis$Raw$StatTable |>
  roundDF() |>
  reactable::reactable(resizable = TRUE, bordered = TRUE, highlight = TRUE, defaultPageSize = 10)
The columns of this table give all kinds of information, from the maximum, minimum and standard deviation to the presence of outliers and the amount of missing data.
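To make the content of such a table concrete, here is a minimal base R sketch that computes a few of the same kinds of statistics for each column of a data frame. It is not the getStats() implementation; the data and column names are hypothetical.

```r
# Hypothetical data: three indicators for six units, with one missing value
dat <- data.frame(
  IndA = c(1, 2, 3, 4, 5, NA),
  IndB = c(10, 12, 11, 13, 12, 50),
  IndC = c(0.2, 0.4, 0.1, 0.3, 0.5, 0.6)
)

# one row of summary statistics per indicator
stat_table <- data.frame(
  Indicator = names(dat),
  Min       = sapply(dat, min, na.rm = TRUE),
  Max       = sapply(dat, max, na.rm = TRUE),
  Mean      = sapply(dat, mean, na.rm = TRUE),
  SD        = sapply(dat, sd, na.rm = TRUE),
  N_missing = sapply(dat, function(x) sum(is.na(x))),
  row.names = NULL
)
print(stat_table)
```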
Apart from the overall statistics for each indicator, getStats() also returns a few other things:
.$Outliers, which flags individual outlying points using the relation to the interquartile range
.$Correlations, which gives a correlation matrix between all indicators in the data set
.$DenomCorrelations, which gives the correlations between indicators and any denominators
Each of these aspects will be explained in more detail in the chapter on Multivariate analysis, so for the moment it is enough to mention that they exist.
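As a rough illustration of the first two items, the base R sketch below flags points using the interquartile range and computes a correlation matrix. The multiplier k = 1.5 is the common boxplot convention and is an assumption here, since the threshold used by COINr is not stated in this chapter. The data and the helper flag_outliers() are hypothetical.

```r
# Hypothetical data: three indicators for six units
dat <- data.frame(
  IndA = c(1, 2, 3, 4, 5, 6),
  IndB = c(10, 12, 11, 13, 12, 50),
  IndC = c(6, 5, 4, 3, 2, 1)
)

# Flag points lying more than k * IQR beyond the first or third quartile.
# Assumption: k = 1.5 (boxplot convention), not necessarily COINr's value.
flag_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]
  x < q[[1]] - k * iqr | x > q[[2]] + k * iqr
}

# logical matrix: TRUE where a point is flagged as outlying
outlier_flags <- sapply(dat, flag_outliers)
print(outlier_flags)

# correlation matrix between all indicators
cor_mat <- cor(dat, use = "pairwise.complete.obs")
print(round(cor_mat, 2))
```

In this example only the last value of IndB is flagged, and IndA and IndC are perfectly negatively correlated by construction.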