Given a coin and a specified data set (dset), returns a table of statistics with entries for each column.
Usage
# S3 method for class 'coin'
get_stats(
  x,
  dset,
  t_skew = 2,
  t_kurt = 3.5,
  t_avail = 0.65,
  t_zero = 0.5,
  t_unq = 0.5,
  nsignif = 3,
  out2 = "df",
  ...
)
Arguments
- x
A coin
- dset
A data set present in .$Data
- t_skew
Absolute skewness threshold. See details.
- t_kurt
Kurtosis threshold. See details.
- t_avail
Data availability threshold. See details.
- t_zero
A threshold between 0 and 1 for flagging indicators with high proportion of zeroes. See details.
- t_unq
A threshold between 0 and 1 for flagging indicators with low proportion of unique values. See details.
- nsignif
Number of significant figures to round the output table to.
- out2
Either "df" (default) to output a data frame of indicator statistics, or "coin" to output an updated coin with the data frame attached under .$Analysis.
- ...
Arguments passed to or from other methods.
Details
The statistics (columns in the output table) are as follows (entries correspond to each column):
- Min: the minimum
- Max: the maximum
- Mean: the (arithmetic) mean
- Median: the median
- Std: the standard deviation
- Skew: the skewness
- Kurt: the kurtosis
- N.Avail: the number of non-NA values
- N.NonZero: the number of non-zero values
- N.Unique: the number of unique values
- Frc.Avail: the fraction of non-NA values
- Frc.NonZero: the fraction of non-zero values
- Frc.Unique: the fraction of unique values
- Flag.Avail: a data availability flag. Columns with Frc.Avail < t_avail are flagged as "LOW", otherwise "ok".
- Flag.NonZero: a flag for columns with a high proportion of zeros. Columns with Frc.NonZero < t_zero are flagged as "LOW", otherwise "ok".
- Flag.Unique: a unique value flag. Columns with Frc.Unique < t_unq are flagged as "LOW", otherwise "ok".
- Flag.SkewKurt: a skew and kurtosis flag, which is an indication of possible outliers. Columns with abs(Skew) > t_skew AND Kurt > t_kurt are flagged as "OUT", otherwise "ok".
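The flag rules above can be sketched in base R as follows. This is a minimal illustration, not the COINr implementation: flag_column() is a hypothetical helper, the skewness and kurtosis are passed in rather than computed (the estimators are not specified in this section), and taking the fractions of non-zero and unique values relative to the number of non-NA values is an assumption inferred from the example output further down.

```r
# Hypothetical helper (not part of COINr): applies the documented flag rules
# to a single numeric column. Assumes x has at least one non-NA value.
flag_column <- function(x, skew, kurt,
                        t_skew = 2, t_kurt = 3.5,
                        t_avail = 0.65, t_zero = 0.5, t_unq = 0.5) {
  x_avail <- x[!is.na(x)]
  n_avail <- length(x_avail)

  # Fractions used by the flags (denominators are an assumption, see above)
  frc_avail   <- n_avail / length(x)
  frc_nonzero <- sum(x_avail != 0) / n_avail
  frc_unique  <- length(unique(x_avail)) / n_avail

  c(
    Flag.Avail    = if (frc_avail < t_avail) "LOW" else "ok",
    Flag.NonZero  = if (frc_nonzero < t_zero) "LOW" else "ok",
    Flag.Unique   = if (frc_unique < t_unq) "LOW" else "ok",
    # Both conditions must hold for the outlier flag
    Flag.SkewKurt = if (abs(skew) > t_skew && kurt > t_kurt) "OUT" else "ok"
  )
}

# Illustrative column: 8 values, 2 missing, 4 zeros
x <- c(0, 0, 0, 1, 2, NA, NA, 0)

# The skew and kurt values here are made up for illustration only
flags <- flag_column(x, skew = 2.5, kurt = 6)
print(flags)
```

Here only two of the six available values are non-zero, so Flag.NonZero is "LOW"; the supplied skew and kurtosis both exceed their thresholds, so Flag.SkewKurt is "OUT".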
The aim of this table, among other things, is to check the basic statistics of each column/indicator and to identify any possible issues, such as low data availability, a high proportion of zeros, or a low proportion of unique values. Further, the combination of skew and kurtosis (i.e. the Flag.SkewKurt column) is a simple test for possible outliers, which may require treatment using Treat().
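As a rough sketch of how the table might be used for this kind of screening, the snippet below keeps only the rows where at least one flag is raised. To stay self-contained it rebuilds a few rows and columns by hand from the example output on this page (LPI, Flights, Tariff, TIRcon); in practice the data frame would come from get_stats() with out2 = "df". The variable names stats, flag_cols and flagged are illustrative.

```r
# A few rows copied from the example output on this page
stats <- data.frame(
  iCode         = c("LPI", "Flights", "Tariff", "TIRcon"),
  Skew          = c(-0.304, 2.10, 2.45, -1.16),
  Kurt          = c(-0.657, 4.51, 5.41, -0.685),
  Frc.Unique    = c(1, 1, 0.417, 0.0392),
  Flag.Avail    = c("ok", "ok", "ok", "ok"),
  Flag.NonZero  = c("ok", "ok", "ok", "ok"),
  Flag.Unique   = c("ok", "ok", "LOW", "LOW"),
  Flag.SkewKurt = c("ok", "OUT", "OUT", "ok"),
  stringsAsFactors = FALSE
)

# All flag columns share the "Flag." prefix
flag_cols <- grep("^Flag\\.", names(stats), value = TRUE)

# Keep indicators with at least one flag that is not "ok"
flagged <- stats[rowSums(stats[flag_cols] != "ok") > 0, ]
print(flagged[, c("iCode", "Flag.Unique", "Flag.SkewKurt")])
```

Selecting on the "Flag." prefix means the filter does not need to know which specific flags are present.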
The table can be returned either to the coin or as a standalone data frame; see out2.
See also vignette("analysis").
Examples
# build example coin
coin <- build_example_coin(up_to = "new_coin", quietly = TRUE)
# get table of indicator statistics for raw data set
get_stats(coin, dset = "Raw", out2 = "df")
#> iCode Min Max Mean Median Std Skew Kurt
#> 1 LPI 2.07000 4.23 3.41e+00 3.420 5.38e-01 -0.3040 -0.6570
#> 2 Flights 0.99000 211.00 3.86e+01 25.600 4.67e+01 2.1000 4.5100
#> 3 Ship 0.00000 21.20 1.20e+01 12.700 6.84e+00 -0.5760 -0.6810
#> 4 Bord 0.00000 122.00 2.36e+01 18.000 2.48e+01 2.1500 5.7900
#> 5 Elec 0.00000 110.00 1.62e+01 6.910 2.27e+01 2.2300 5.7900
#> 6 Gas 0.00841 94.80 1.00e+01 1.140 1.75e+01 2.8300 10.3000
#> 7 ConSpeed 5.50000 28.60 1.43e+01 14.600 5.11e+00 0.4620 0.1870
#> 8 Cov4G 0.00000 100.00 7.56e+01 91.000 3.21e+01 -1.3700 0.5420
#> 9 Goods 7.23000 1920.00 2.90e+02 143.000 3.88e+02 2.6500 8.2700
#> 10 Services 1.38000 657.00 1.29e+02 54.100 1.61e+02 1.7000 2.3800
#> 11 FDI 0.13000 75.60 1.25e+01 6.100 1.58e+01 2.1000 4.8900
#> 12 PRemit 0.16600 30.20 6.56e+00 4.690 7.37e+00 1.8100 2.9400
#> 13 ForPort 0.00189 10600.00 1.58e+03 259.000 2.59e+03 2.0200 3.3000
#> 14 Embs 28.00000 100.00 7.09e+01 75.000 2.01e+01 -0.3680 -0.9930
#> 15 IGOs 82.00000 329.00 1.98e+02 197.000 5.86e+01 -0.0635 -0.2490
#> 16 UNVote 35.80000 43.20 4.09e+01 42.500 2.38e+00 -0.6750 -1.2000
#> 17 CostImpEx 0.00000 992.00 1.25e+02 45.000 1.82e+02 2.6900 9.8400
#> 18 Tariff 0.00000 10.50 2.46e+00 1.600 2.38e+00 2.4500 5.4100
#> 19 TBTs 1.00000 1750.00 7.95e+02 1140.000 5.50e+02 -0.3960 -1.5500
#> 20 TIRcon 0.00000 1.00 7.45e-01 1.000 4.40e-01 -1.1600 -0.6850
#> 21 RTAs 1.00000 46.00 2.39e+01 30.000 9.38e+00 -0.6780 0.1330
#> 22 Visa 1.00000 92.00 6.92e+01 79.000 2.22e+01 -1.7600 2.1700
#> 23 StMob 1.77000 445.00 6.41e+01 33.200 8.66e+01 2.7300 8.3100
#> 24 Research 175.00000 96300.00 1.63e+04 7730.000 2.31e+04 2.2200 4.8100
#> 25 Pat 0.30000 2770.00 3.34e+02 113.000 5.27e+02 2.8500 10.1000
#> 26 CultServ 0.00266 9.57 1.45e+00 0.578 2.09e+00 2.4700 6.3900
#> 27 CultGood 0.04600 74.50 1.00e+01 3.300 1.50e+01 2.6100 7.7400
#> 28 Tourist 0.12500 82.60 1.51e+01 9.200 1.84e+01 2.1900 4.9000
#> 29 MigStock 0.08170 10.90 2.48e+00 1.400 2.53e+00 1.5400 2.0600
#> 30 Lang 0.01880 21.50 9.31e+00 9.100 6.35e+00 0.0934 -1.0800
#> 31 Renew 0.01490 64.90 2.27e+01 17.200 1.72e+01 0.8220 -0.0661
#> 32 PrimEner 53.50000 192.00 1.06e+02 96.600 3.54e+01 0.7500 -0.1470
#> 33 CO2 0.29700 22.10 6.71e+00 5.940 4.55e+00 1.1100 1.8300
#> 34 MatCon 2.58000 38.40 1.60e+01 15.000 8.21e+00 0.7590 0.4240
#> 35 Forest 0.32600 31.80 5.95e+00 4.980 5.63e+00 2.7100 9.3500
#> 36 Poverty 0.00000 22.70 2.44e+00 0.400 5.51e+00 2.9200 7.8800
#> 37 Palma 0.88000 2.62 1.34e+00 1.250 3.75e-01 1.3300 2.3000
#> 38 TertGrad 2.01000 37.60 2.19e+01 23.300 8.52e+00 -0.3830 -0.3030
#> 39 FreePress 8.00000 87.00 3.84e+01 28.000 2.44e+01 0.7540 -0.7800
#> 40 TolMin 1.10000 9.80 5.39e+00 5.100 2.15e+00 0.0633 -0.6600
#> 41 NGOs 0.00000 1820.00 2.20e+02 66.000 3.87e+02 2.7900 7.9200
#> 42 CPI 21.00000 89.00 5.74e+01 57.000 1.98e+01 0.0203 -1.2500
#> 43 FemLab 0.29700 1.03 8.18e-01 0.863 1.41e-01 -2.0600 5.0200
#> 44 WomParl 4.86000 43.60 2.44e+01 23.700 9.72e+00 0.0859 -0.7890
#> 45 PubDebt 2.81000 248.00 6.30e+01 52.100 4.37e+01 2.0000 5.9700
#> 46 PrivDebt 19.70000 421.00 1.40e+02 128.000 8.49e+01 0.9570 1.4300
#> 47 GDPGrow -0.67700 7.58 3.22e+00 3.530 2.04e+00 -0.0516 -0.9300
#> 48 RDExp 0.08470 4.23 1.50e+00 1.290 1.02e+00 0.6100 -0.2760
#> 49 NEET 0.60000 42.10 1.26e+01 10.900 7.74e+00 1.6000 3.6300
#> N.Avail N.NonZero N.Unique N.Same Frc.Avail Frc.NonZero Frc.Unique Frc.Same
#> 1 51 51 51 1 1.000 1.000 1.0000 0.0196
#> 2 51 51 51 1 1.000 1.000 1.0000 0.0196
#> 3 51 42 43 9 1.000 0.824 0.8430 0.1760
#> 4 51 45 30 6 1.000 0.882 0.5880 0.1180
#> 5 51 47 46 4 1.000 0.922 0.9020 0.0784
#> 6 51 51 51 1 1.000 1.000 1.0000 0.0196
#> 7 43 43 40 2 0.843 1.000 0.9300 0.0465
#> 8 51 49 37 4 1.000 0.961 0.7250 0.0784
#> 9 51 51 51 1 1.000 1.000 1.0000 0.0196
#> 10 51 51 51 1 1.000 1.000 1.0000 0.0196
#> 11 51 51 51 1 1.000 1.000 1.0000 0.0196
#> 12 51 51 51 1 1.000 1.000 1.0000 0.0196
#> 13 48 48 48 1 0.941 1.000 1.0000 0.0208
#> 14 51 51 33 4 1.000 1.000 0.6470 0.0784
#> 15 51 51 46 3 1.000 1.000 0.9020 0.0588
#> 16 51 51 49 3 1.000 1.000 0.9610 0.0588
#> 17 51 33 29 18 1.000 0.647 0.5690 0.3530
#> 18 48 46 20 28 0.941 0.958 0.4170 0.5830
#> 19 50 50 42 4 0.980 1.000 0.8400 0.0800
#> 20 51 38 2 38 1.000 0.745 0.0392 0.7450
#> 21 51 51 14 28 1.000 1.000 0.2750 0.5490
#> 22 51 51 22 11 1.000 1.000 0.4310 0.2160
#> 23 51 51 50 2 1.000 1.000 0.9800 0.0392
#> 24 51 51 51 1 1.000 1.000 1.0000 0.0196
#> 25 45 45 45 1 0.882 1.000 1.0000 0.0222
#> 26 47 47 47 1 0.922 1.000 1.0000 0.0213
#> 27 44 44 44 1 0.863 1.000 1.0000 0.0227
#> 28 51 51 51 1 1.000 1.000 1.0000 0.0196
#> 29 51 51 51 1 1.000 1.000 1.0000 0.0196
#> 30 51 51 51 1 1.000 1.000 1.0000 0.0196
#> 31 51 51 51 1 1.000 1.000 1.0000 0.0196
#> 32 50 50 50 1 0.980 1.000 1.0000 0.0200
#> 33 51 51 51 1 1.000 1.000 1.0000 0.0196
#> 34 51 51 51 1 1.000 1.000 1.0000 0.0196
#> 35 51 51 51 1 1.000 1.000 1.0000 0.0196
#> 36 44 30 17 14 0.863 0.682 0.3860 0.3180
#> 37 45 45 35 3 0.882 1.000 0.7780 0.0667
#> 38 44 44 44 1 0.863 1.000 1.0000 0.0227
#> 39 51 51 38 3 1.000 1.000 0.7450 0.0588
#> 40 51 51 42 3 1.000 1.000 0.8240 0.0588
#> 41 51 50 47 3 1.000 0.980 0.9220 0.0588
#> 42 51 51 39 3 1.000 1.000 0.7650 0.0588
#> 43 49 49 49 1 0.961 1.000 1.0000 0.0204
#> 44 51 51 49 2 1.000 1.000 0.9610 0.0392
#> 45 50 50 50 1 0.980 1.000 1.0000 0.0200
#> 46 50 50 50 1 0.980 1.000 1.0000 0.0200
#> 47 51 49 50 2 1.000 0.961 0.9800 0.0392
#> 48 47 47 47 1 0.922 1.000 1.0000 0.0213
#> 49 49 49 40 3 0.961 1.000 0.8160 0.0612
#> Flag.Avail Flag.NonZero Flag.Unique Flag.SkewKurt
#> 1 ok ok ok ok
#> 2 ok ok ok OUT
#> 3 ok ok ok ok
#> 4 ok ok ok OUT
#> 5 ok ok ok OUT
#> 6 ok ok ok OUT
#> 7 ok ok ok ok
#> 8 ok ok ok ok
#> 9 ok ok ok OUT
#> 10 ok ok ok ok
#> 11 ok ok ok OUT
#> 12 ok ok ok ok
#> 13 ok ok ok ok
#> 14 ok ok ok ok
#> 15 ok ok ok ok
#> 16 ok ok ok ok
#> 17 ok ok ok OUT
#> 18 ok ok LOW OUT
#> 19 ok ok ok ok
#> 20 ok ok LOW ok
#> 21 ok ok LOW ok
#> 22 ok ok LOW ok
#> 23 ok ok ok OUT
#> 24 ok ok ok OUT
#> 25 ok ok ok OUT
#> 26 ok ok ok OUT
#> 27 ok ok ok OUT
#> 28 ok ok ok OUT
#> 29 ok ok ok ok
#> 30 ok ok ok ok
#> 31 ok ok ok ok
#> 32 ok ok ok ok
#> 33 ok ok ok ok
#> 34 ok ok ok ok
#> 35 ok ok ok OUT
#> 36 ok ok LOW OUT
#> 37 ok ok ok ok
#> 38 ok ok ok ok
#> 39 ok ok ok ok
#> 40 ok ok ok ok
#> 41 ok ok ok OUT
#> 42 ok ok ok ok
#> 43 ok ok ok OUT
#> 44 ok ok ok ok
#> 45 ok ok ok ok
#> 46 ok ok ok ok
#> 47 ok ok ok ok
#> 48 ok ok ok ok
#> 49 ok ok ok ok