Performs Principal Component Analysis (PCA) on a specified data set and subset of indicators or aggregation groups.
This function has two main outputs: the output(s) of `stats::prcomp()`, and optionally the weights resulting from
the PCA. It can therefore be used as an analysis tool, a weighting tool, or both. For the weighting aspect, please
see the Details below.

## Usage

```r
get_PCA(
  coin,
  dset = "Raw",
  iCodes = NULL,
  Level = NULL,
  by_groups = TRUE,
  nowarnings = FALSE,
  weights_to = NULL,
  out2 = "list"
)
```

## Arguments

- coin
  A coin object.

- dset
  The name of the data set in `.$Data` to use.

- iCodes
  An optional character vector of indicator codes to subset the indicator data, passed to `get_data()`.

- Level
  The aggregation level to take indicator data from. Integer from 1 (indicator level) to N (top aggregation level, typically the index).

- by_groups
  If `TRUE` (default), performs PCA separately within each aggregation group in the specified level. If `FALSE`, performs a single PCA over all indicators/aggregates in the specified level.

- nowarnings
  If `FALSE` (default), gives warnings where missing data are found. Set to `TRUE` to suppress these warnings.

- weights_to
  A string to name the resulting set of weights. If this is specified and `out2 = "coin"`, a new set of "PCA weights" is written to the `.$Meta$Weights` list. This is experimental; see Details. If `NULL` (default), no weights are written.

- out2
  If the input is a coin object, this controls where to send the output. If `"coin"`, the results are sent to the coin object; if `"list"` (default), they are output to a separate list.

## Value

If `out2 = "coin"`, results are appended to the coin object. Specifically:

- A list is added to `.$Analysis` containing the PCA weights (loadings) of the first principal component, and the output of `stats::prcomp()`, for each aggregation group found in the targeted level.
- If `weights_to` is specified, a new set of PCA weights is added to `.$Meta$Weights`.

If `out2 = "list"`, the same outputs are contained in a list.

## Details

PCA must be approached with care and with an understanding of what is going on. First, consider PCA without the weighting component. PCA takes a data set consisting of variables (indicators) and observations. It then rotates the coordinate system so that the first axis of the new system (called the first principal component, or PC) aligns with the direction of maximum variance in the data. Each subsequent PC is orthogonal to the previous ones and captures as much of the remaining variance as possible. The amount of variance explained by the first PC, and by the next few PCs, helps to show whether the data can be explained by a simpler set of variables. This is why PCA is often used for dimensionality reduction in modelling, for example.
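As a minimal sketch of the rotation described above, the following base R code runs `stats::prcomp()` on simulated data. The data and the column names `ind1` to `ind4` are invented for illustration, and this is not how `get_PCA()` is implemented internally. It shows that the PC variances decrease from the first PC onwards, and that the data expressed in the new coordinate system (the PC scores) are uncorrelated.

```r
# Simulated data (hypothetical): three indicators driven by one latent factor,
# plus one indicator that is unrelated to it
set.seed(1)
n <- 50
latent <- rnorm(n)
X <- data.frame(
  ind1 = latent + rnorm(n, sd = 0.3),
  ind2 = latent + rnorm(n, sd = 0.5),
  ind3 = latent + rnorm(n, sd = 0.7),
  ind4 = rnorm(n)
)

# Standardise the variables and rotate onto the principal components
pca <- stats::prcomp(X, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each PC (decreasing order)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# The PC scores (data in the rotated coordinate system) are uncorrelated
print(round(cor(pca$x), 10))
```

The same proportions appear in the "Proportion of Variance" row of `summary(pca)`.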

In the context of composite indicators, PCA can first be used as an analysis tool. For example, we can check whether, within an aggregation group, the indicators can mostly be explained by one PC. If so, this gives a little extra justification for aggregating the indicators, because less information is lost in the aggregation. We can also run this check over the entire set of indicators.

The complication is that in a composite indicator, the indicators are grouped and arranged into a hierarchy. This means
that when performing a PCA, we have to decide which level to perform it at, and which groupings to use, if any. The `get_PCA()` function, via the `by_groups` argument, can automatically apply PCA by group if this is required.
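To give a rough idea of what `by_groups` controls, the base R sketch below runs one PCA per group and one PCA over all indicators. The data, the indicator names and the `groups` mapping (with hypothetical group names `Environ` and `Social`) are assumptions for illustration only, whereas `get_PCA()` uses the aggregation groups defined in the coin.

```r
# Hypothetical data: 40 units, 6 indicators
set.seed(2)
n <- 40
X <- as.data.frame(matrix(rnorm(n * 6), nrow = n,
                          dimnames = list(NULL, paste0("ind", 1:6))))

# Hypothetical mapping of each indicator to its aggregation group
groups <- c(ind1 = "Environ", ind2 = "Environ", ind3 = "Environ",
            ind4 = "Social",  ind5 = "Social",  ind6 = "Social")

# Analogue of by_groups = TRUE: one PCA per group
pca_by_group <- lapply(split(names(groups), groups), function(cols) {
  stats::prcomp(X[, cols], center = TRUE, scale. = TRUE)
})

# Analogue of by_groups = FALSE: a single PCA over all indicators
pca_all <- stats::prcomp(X, center = TRUE, scale. = TRUE)

# Share of variance captured by the first PC in each case
pc1_share <- function(p) p$sdev[1]^2 / sum(p$sdev^2)
print(sapply(pca_by_group, pc1_share))
print(pc1_share(pca_all))
```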

The output of `get_PCA()` is a PCA object for each of the specified groups, which can then be examined using existing
tools in R; see `vignette("analysis")`.

The other output of `get_PCA()` is a set of "PCA weights", if the `weights_to` argument is specified. Here some
words of caution are needed. First, what constitutes "PCA weights" in composite indicators is not very well defined.
In COINr, a simple option is adopted: the loadings of the first principal component are taken as the weights.
The logic is that these loadings define the direction of maximum variance, the implication being that if we use
them as weights in an aggregation, we should maximise the explained variance and hence the information passed from
the indicators to the aggregate value. This is a desirable property in a composite indicator, where one of the aims is to
represent many indicators by a single composite. See doi:10.1016/j.envsoft.2021.105208 for a discussion of this.
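The weighting idea above can be sketched in base R as follows, assuming standardised indicators and unit-length weight vectors; COINr's own scaling of the weights may differ. The first-PC loadings are used as weights, and the variance of the resulting composite is compared with that of an equally weighted composite of the same length. The simulated data and the names `ind1` to `ind3` are hypothetical.

```r
# Hypothetical data: three indicators sharing one latent factor
set.seed(3)
n <- 60
latent <- rnorm(n)
X <- cbind(ind1 = latent + rnorm(n, sd = 0.4),
           ind2 = latent + rnorm(n, sd = 0.6),
           ind3 = latent + rnorm(n, sd = 0.8))

Z <- scale(X)  # standardise the indicators

pca <- stats::prcomp(Z)
w_pca <- pca$rotation[, 1]  # first-PC loadings (unit length, sign arbitrary)

# Equal weights rescaled to unit length, for a like-for-like comparison
w_eq <- rep(1, ncol(Z)) / sqrt(ncol(Z))

var_pca <- var(as.vector(Z %*% w_pca))
var_eq  <- var(as.vector(Z %*% w_eq))

print(w_pca)
print(c(pca = var_pca, equal = var_eq))
```

The composite built from the first-PC loadings has variance equal to the first eigenvalue (`pca$sdev[1]^2`), which no other unit-length weight vector can exceed.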

However, the weights that result from PCA have a number of downsides. First, they often include negative weights, which can be hard to justify. PCA may also arbitrarily flip the sign of the axes (since, from a variance point of view, the direction is not important). In the quest for maximum variance, PCA will also assign the highest weights to the most strongly correlated indicators, which means that other indicators may be neglected. In short, it often results in a very unbalanced set of weights. Moreover, PCA can only be performed on one level at a time.
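The sign ambiguity and the negative-weight problem can be seen in a small base R sketch with simulated data, where one hypothetical indicator (`ind3`) moves against the others. Flipping the sign of the loadings leaves the variance unchanged, so the orientation is arbitrary. The orientation rule used here (make the loadings sum to a positive number, then rescale them to sum to 1) is only one possible convention assumed for illustration, not necessarily what COINr does; even after applying it, `ind3` keeps a negative weight.

```r
# Hypothetical data: ind3 is negatively related to the latent factor
set.seed(4)
n <- 60
latent <- rnorm(n)
X <- cbind(ind1 = latent + rnorm(n, sd = 0.5),
           ind2 = latent + rnorm(n, sd = 0.5),
           ind3 = -latent + rnorm(n, sd = 0.5))

Z <- scale(X)
v <- stats::prcomp(Z)$rotation[, 1]  # first-PC loadings

# Flipping the sign of the loadings leaves the explained variance unchanged
var_v <- var(as.vector(Z %*% v))
var_flipped <- var(as.vector(Z %*% (-v)))
print(c(var_v, var_flipped))

# Hypothetical convention: orient so the loadings sum to a positive number,
# then rescale to sum to 1. Negative weights can still remain.
v_oriented <- if (sum(v) < 0) -v else v
w <- v_oriented / sum(v_oriented)
print(round(w, 3))
```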

All of these considerations point to one conclusion: while PCA is well established as an analysis tool, PCA weights should be used with care and with an understanding of what is going on.

This function replaces the now-defunct `getPCA()` from COINr < v1.0.

## See also

- `stats::prcomp()`: Principal component analysis

## Examples

```r
library(COINr)
# build example coin
coin <- build_example_coin(up_to = "new_coin", quietly = TRUE)
# PCA on "Sust" group of indicators
l_pca <- get_PCA(coin, dset = "Raw", iCodes = "Sust",
                 out2 = "list", nowarnings = TRUE)
# Summary of results for one of the sub-groups
summary(l_pca$PCAresults$Social$PCAres)
#> Importance of components:
#>                           PC1    PC2    PC3     PC4     PC5     PC6     PC7
#> Standard deviation     2.2042 1.1256 0.9788 0.78834 0.77153 0.56836 0.42463
#> Proportion of Variance 0.5398 0.1408 0.1065 0.06905 0.06614 0.03589 0.02003
#> Cumulative Proportion  0.5398 0.6806 0.7871 0.85611 0.92225 0.95814 0.97817
#>                            PC8     PC9
#> Standard deviation     0.36068 0.25760
#> Proportion of Variance 0.01445 0.00737
#> Cumulative Proportion  0.99263 1.00000
```