library(biplotEZ)
#> Welcome to biplotEZ! 
#> This package is used to construct biplots 
#> Run ?biplot or vignette() for more information
#> 
#> Attaching package: 'biplotEZ'
#> The following object is masked from 'package:stats':
#> 
#>     biplot

This vignette deals with biplots for separating classes. Topics discussed are

  • CVA (Canonical variate analysis) biplots
  • AoD (Analysis of Distance) biplots
  • Classification biplots

What is a CVA biplot

Consider a data matrix $\mathbf{X}: n \times p$ containing data on $n$ objects and $p$ variables. In addition, a vector $\mathbf{g}: n \times 1$ contains information on the class membership of each observation. Let $G$ indicate the total number of classes. CVA is closely related to linear discriminant analysis, in that the $p$ variables are transformed to $p$ new variables, called canonical variates, such that the classes are optimally separated in the canonical space. By optimally separated, we mean maximising the between class variance relative to the within class variance. This can be formulated as follows:

Let $\mathbf{G}: n \times G$ be an indicator matrix with $g_{ij} = 0$ unless observation $i$ belongs to class $j$, in which case $g_{ij} = 1$. The matrix $\mathbf{G'G}$ is a diagonal matrix containing the number of observations per class on the diagonal. We can form the matrix of class means $\bar{\mathbf{X}}: G \times p = (\mathbf{G'G})^{-1}\mathbf{G'X}$. As in the usual analysis of variance, the total variance can be decomposed into a between class variance and a within class variance:
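
To make the matrices above concrete, they can be computed directly in base R. This is a minimal sketch using the state.x77 data and state.region classes that appear in the examples below; the object names (X, G, Xbar) are illustrative only and are not part of the biplotEZ API.

# Minimal sketch (illustrative names): indicator matrix and class means
X <- scale(state.x77, scale = FALSE)       # centred data matrix (n x p)
grp <- state.region                        # class membership vector g
G <- model.matrix(~ grp - 1)               # n x G indicator matrix
Xbar <- solve(t(G) %*% G) %*% t(G) %*% X   # class means: (G'G)^{-1} G'X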

$$\mathbf{T} = \mathbf{B} + \mathbf{W}$$

$$\mathbf{X'X} = \bar{\mathbf{X}}'\mathbf{C}\bar{\mathbf{X}} + \mathbf{X'}[\mathbf{I} - \mathbf{G}(\mathbf{G'G})^{-1}\mathbf{C}(\mathbf{G'G})^{-1}\mathbf{G'}]\mathbf{X}$$

The default choice for the centring matrix, $\mathbf{C} = \mathbf{G'G}$, leads to the simplification

$$\mathbf{X'X} = \bar{\mathbf{X}}'\mathbf{G'G}\bar{\mathbf{X}} + \mathbf{X'}[\mathbf{I} - \mathbf{G}(\mathbf{G'G})^{-1}\mathbf{G'}]\mathbf{X}.$$

Other options are $\mathbf{C} = \mathbf{I}$ and $\mathbf{C} = (\mathbf{I}_G - \frac{1}{G}\mathbf{11'})$. To find the canonical variates we want to maximise the ratio

$$\frac{\mathbf{m'Bm}}{\mathbf{m'Wm}}$$

subject to $\mathbf{m'Wm} = 1$. It can be shown that this leads to the following equivalent eigen equations:

$$\mathbf{W}^{-1}\mathbf{B}\mathbf{M} = \mathbf{M}\mathbf{\Lambda} \tag{1}$$

$$\mathbf{B}\mathbf{M} = \mathbf{W}\mathbf{M}\mathbf{\Lambda}$$

$$(\mathbf{W}^{-\frac{1}{2}}\mathbf{B}\mathbf{W}^{-\frac{1}{2}})(\mathbf{W}^{\frac{1}{2}}\mathbf{M}) = (\mathbf{W}^{\frac{1}{2}}\mathbf{M})\mathbf{\Lambda}$$

with $\mathbf{M'BM} = \mathbf{\Lambda}$ and $\mathbf{M'WM} = \mathbf{I}$.
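
As a rough numerical illustration of these eigen equations, $\mathbf{M}$ and $\mathbf{\Lambda}$ can be obtained from the symmetric eigen problem in base R. The sketch assumes the X, G and Xbar objects from the snippet above; it is not the internal computation used by CVA().

# Sketch of the eigen equations (assumed objects X, G, Xbar from above)
W <- t(X) %*% X - t(Xbar) %*% (t(G) %*% G) %*% Xbar      # within class matrix (C = G'G)
B <- t(Xbar) %*% (t(G) %*% G) %*% Xbar                   # between class matrix
Ws <- svd(W)
W_half_inv <- Ws$u %*% diag(1 / sqrt(Ws$d)) %*% t(Ws$u)  # W^(-1/2)
eig <- eigen(W_half_inv %*% B %*% W_half_inv, symmetric = TRUE)
M <- W_half_inv %*% eig$vectors                          # gives M'WM = I and M'BM = Lambda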

Since the matrix $\mathbf{W}^{-\frac{1}{2}}\mathbf{B}\mathbf{W}^{-\frac{1}{2}}$ is symmetric and positive semi-definite, the eigenvalues in the matrix $\mathbf{\Lambda}$ are non-negative and ordered. The rank of $\mathbf{B}$ is $\min(p, G-1)$, so that only the first $rank(\mathbf{B})$ eigenvalues are non-zero. We form the canonical variates with the transformation

$$\bar{\mathbf{Y}} = \bar{\mathbf{X}}\mathbf{M}.$$

To construct a 2D biplot, we plot the first two canonical variates $\bar{\mathbf{Z}} = \bar{\mathbf{X}}\mathbf{M}\mathbf{J}_2$ where $\mathbf{J}_2' = \begin{bmatrix} \mathbf{I}_2 & \mathbf{0} \end{bmatrix}$. We add the individual sample points with the same transformation

$$\mathbf{Z} = \mathbf{X}\mathbf{M}\mathbf{J}_2 \quad \text{where} \quad \mathbf{J}_2 = \begin{bmatrix} \mathbf{I}_2 \\ \mathbf{0} \end{bmatrix}.$$

Interpolation of a new sample $\mathbf{x}^*: p \times 1$ follows as ${\mathbf{z}^*}' = {\mathbf{x}^*}'\mathbf{M}\mathbf{J}_2$ with $\mathbf{z}^*: 2 \times 1$. Using the inverse transformation $\mathbf{x}' = \mathbf{y}'\mathbf{M}^{-1}$, all the points that will predict $\mu$ for variable $j$ will have the form

$$\mu = \mathbf{y}'\mathbf{M}^{-1}\mathbf{e}_j$$

where $\mathbf{e}_j$ is a vector of zeros with a one in position $j$. All the points in the 2D biplot that predict the value $\mu$ will have

$$\mu = \begin{bmatrix} z_1 & z_2 & 0 & \dots & 0 \end{bmatrix}\mathbf{M}^{-1}\mathbf{e}_j$$

defining the prediction line as

$$\mu = \mathbf{z}_{\mu}'\mathbf{J}_2'\mathbf{M}^{-1}\mathbf{e}_j.$$

Writing $\mathbf{h}_{(j)} = \mathbf{J}_2'\mathbf{M}^{-1}\mathbf{e}_j$, the construction of biplot axes is similar to the discussion in the biplotEZ vignette for PCA biplots. The direction of the axis is given by $\mathbf{h}_{(j)}$. To find the intersection of the prediction line with $\mathbf{h}_{(j)}$ we note that

$$\mathbf{z}_{(\mu)}'\mathbf{h}_{(j)} = \|\mathbf{z}_{(\mu)}\| \|\mathbf{h}_{(j)}\| \cos(\mathbf{z}_{(\mu)}, \mathbf{h}_{(j)}) = \|\mathbf{p}\| \|\mathbf{h}_{(j)}\|$$

where $\mathbf{p}$ is the orthogonal projection of $\mathbf{z}_{(\mu)}$ on $\mathbf{h}_{(j)}$.

Since $\mathbf{p}$ lies along $\mathbf{h}_{(j)}$ we can write $\mathbf{p} = c\mathbf{h}_{(j)}$, and all points on the prediction line $\mu = \mathbf{z}_{\mu}'\mathbf{h}_{(j)}$ project on the same point $c_{\mu}\mathbf{h}_{(j)}$. We solve for $c_{\mu}$ from

$$\mu = \mathbf{z}_{\mu}'\mathbf{h}_{(j)} = \|\mathbf{p}\| \|\mathbf{h}_{(j)}\| = \|c_{\mu}\mathbf{h}_{(j)}\| \|\mathbf{h}_{(j)}\| = c_{\mu}\mathbf{h}_{(j)}'\mathbf{h}_{(j)}$$

giving

$$c_{\mu} = \frac{\mu}{\mathbf{h}_{(j)}'\mathbf{h}_{(j)}}.$$

If we select 'nice' scale markers $\tau_1, \tau_2, \cdots, \tau_k$ for variable $j$, then $\tau_h - \bar{x}_j = \mu_h$ and the positions of these scale markers on $\mathbf{h}_{(j)}$ are given by $\mathbf{p}_{\mu_1}, \mathbf{p}_{\mu_2}, \cdots, \mathbf{p}_{\mu_k}$ with

$$\mathbf{p}_{\mu_h} = c_{\mu_h}\mathbf{h}_{(j)} = \frac{\mu_h}{\mathbf{h}_{(j)}'\mathbf{h}_{(j)}}\mathbf{h}_{(j)} = \frac{\mu_h}{\mathbf{e}_j'(\mathbf{M}')^{-1}\mathbf{J}\mathbf{M}^{-1}\mathbf{e}_j}\,\mathbf{J}_2'\mathbf{M}^{-1}\mathbf{e}_j$$

with

$$\mathbf{J} = \begin{bmatrix} \mathbf{I}_2 & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix}.$$
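
The calibration formulas can also be sketched in R, assuming the M matrix from the snippet above and taking variable j = 1; this only traces the algebra and is not how biplotEZ builds its axes.

# Sketch of calibrating the biplot axis for variable j (assumed M from above)
j <- 1
Minv <- solve(M)
h_j <- Minv[1:2, j]                      # h_(j) = J_2' M^{-1} e_j: axis direction
mu <- pretty(X[, j])                     # 'nice' scale markers on the centred scale
markers <- outer(mu / sum(h_j^2), h_j)   # rows are the positions mu_h / (h'h) * h_(j)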

The function CVA()

To obtain a CVA biplot of the state.x77 data set, optimally separating the classes according to state.region we call

biplot(state.x77) |> 
  CVA(classes = state.region) |> 
  plot()

Fitting $\alpha$-bags to the classes makes it easier to compare class overlap and separation. For a detailed discussion on $\alpha$-bags, see the biplotEZ vignette.

biplot(state.x77) |> CVA(classes = state.region) |> alpha.bags() |> 
  legend.type (bags = TRUE) |> 
  plot()
#> Computing 0.95 -bag for Northeast 
#> Computing 0.95 -bag for South 
#> Computing 0.95 -bag for North Central 
#> Computing 0.95 -bag for West

The function means()

This function controls the aesthetics of the class means in the biplot. The function accepts as first argument an object of class biplot where the aesthetics should be applied. Let us first construct a CVA biplot of the state.x77 data with samples optimally separated according to state.division.

biplot(state.x77, scaled = TRUE) |> 
  CVA(classes = state.division) |> 
  legend.type(means = TRUE) |> 
  plot()

Instead of adding a legend, we can choose to label the class means. The colour of each class mean defaults to the colour of the samples; here we select a different colour and plotting character for the class means.

biplot(state.x77, scaled = TRUE) |> 
  CVA(classes = state.division) |> 
  means(label = TRUE, col = "olivedrab", pch = 15) |> 
  plot()

To show only the class means for the central states, the argument which is used, either by indicating the number(s) in the sequence of levels (which = 4:7) or, as shown below, by giving the levels themselves:

biplot(state.x77, scaled = TRUE) |> 
  CVA(classes = state.division) |> 
  means (which = c("West North Central", "West South Central", "East South Central", 
                     "East North Central"), label = TRUE) |>
  plot()

The size of the labels is controlled with label.cex, which can be specified either as a single value (for all class means) or as a vector of size values for the individual class means. The colour of the labels defaults to the colour(s) of the class means. However, individual label colours can be specified with label.col, similar to label.cex, as either a single value or a vector of length equal to the number of classes.

biplot(state.x77, scaled = TRUE) |> 
  CVA(classes = state.division) |> 
  means (col = "olivedrab", pch = 15, cex = 1.5,
         label = TRUE, label.col = c("blue","green","gold","cyan","magenta",
                                     "black","red","grey","purple")) |>
  plot()

We can also make use of the functionality of the ggrepel package to place the labels.

biplot(state.x77, scaled = TRUE) |> 
  CVA(classes = state.division) |> 
  samples (label = "ggrepel", label.cex=0.65) |> 
  means (label = "ggrepel", label.cex=0.8) |> plot()

The function classify()

Classification regions can be added to the CVA biplot with the function classify(). The argument classify.regions must be set to TRUE to render the regions in the plot. Other arguments such as col, opacity and borders allow the aesthetics of the regions to be changed.

biplot(state.x77, scaled = TRUE) |> 
  CVA(classes = state.division) |>
  classify(classify.regions = TRUE, opacity = 0.2) |> 
  plot()

The functions fit.measures() and summary()

There are a number of fit measures that are specific to CVA biplots. The measures are computed with the function fit.measures() and the results are displayed by the function summary().

Canonical variate analysis can be considered as a transformation of the original variables to the canonical space, followed by constructing a PCA biplot of the canonical variables. The matrix of class means $\bar{\mathbf{X}} = (\mathbf{G'G})^{-1}\mathbf{G'X}$ is transformed to $\bar{\mathbf{X}}\mathbf{L}$ where $\mathbf{L}$ is a non-singular matrix such that $\mathbf{LL'} = \mathbf{W}^{-1}$. Principal component analysis finds the orthogonal matrix $\mathbf{V}$ such that

$$(\mathbf{L}'\bar{\mathbf{X}}'\mathbf{C}\bar{\mathbf{X}}\mathbf{L})\mathbf{V} = \mathbf{V}\mathbf{\Lambda}$$

where $\mathbf{M} = \mathbf{LV}$ as defined above. The predicted values for the class means are given by

$$\hat{\bar{\mathbf{X}}} = \bar{\mathbf{X}}\mathbf{M}\mathbf{J}\mathbf{M}^{-1}.$$
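
A compact sketch of this two-step construction, reusing the W, G and Xbar objects from the earlier snippets; the Cholesky-based choice of L below is only one valid option satisfying $\mathbf{LL'} = \mathbf{W}^{-1}$ and is an assumption, not necessarily the factorisation used inside the package.

# Sketch of the two-step construction and the predicted class means (assumed objects)
C <- t(G) %*% G                            # default centring matrix
L <- solve(chol(W))                        # one choice of L with LL' = W^{-1}
V <- eigen(t(L) %*% t(Xbar) %*% C %*% Xbar %*% L, symmetric = TRUE)$vectors
M <- L %*% V                               # the M of the eigen equations (up to signs)
J <- diag(c(1, 1, rep(0, ncol(X) - 2)))    # retain the first two dimensions
Xbar_hat <- Xbar %*% M %*% J %*% solve(M)  # predicted class means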

Overall quality of fit

Based on the two-step process described above, there are two measures of quality of fit. The quality of the approximation of the canonical variables $\bar{\mathbf{X}}\mathbf{L}$ in the 2-dimensional display is given by

$$Quality\ (canonical\ variables) = \frac{tr(\mathbf{\Lambda}\mathbf{J})}{tr(\mathbf{\Lambda})}$$

and the quality of the approximation of the original variables $\bar{\mathbf{X}}$ in the 2D CVA biplot is given by

$$Quality\ (original\ variables) = \frac{tr(\hat{\bar{\mathbf{X}}}'\mathbf{C}\hat{\bar{\mathbf{X}}})}{tr(\bar{\mathbf{X}}'\mathbf{C}\bar{\mathbf{X}})}.$$
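
By hand, these two ratios can be computed from the quantities in the sketches above (Lambda from the eigen() call, and Xbar_hat, C, Xbar from the two-step sketch); treat this as an illustration to compare against summary(), not as the package's internal code.

# Sketch of the two overall quality measures in two dimensions (assumed objects)
Lambda <- eig$values
quality.canonical <- sum(Lambda[1:2]) / sum(Lambda)
quality.original <- sum(diag(t(Xbar_hat) %*% C %*% Xbar_hat)) /
  sum(diag(t(Xbar) %*% C %*% Xbar))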

Adequacy of representation of variables

The adequacy with which each of the variables is represented in the biplot is given by the elementwise ratios

$$Adequacy = \frac{diag(\mathbf{M}\mathbf{J}\mathbf{M}')}{diag(\mathbf{M}\mathbf{M}')}.$$
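
In R this elementwise ratio is a one-liner, again assuming the M and J matrices from the sketches above.

# Sketch of the adequacy ratios (assumed M and J from above)
adequacy <- diag(M %*% J %*% t(M)) / diag(M %*% t(M))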

Predictivity

Between class predictivity

The axis and class mean predictivities are defined in terms of the weighted class means.

Axis predictivity

The elementwise ratios for the predictivity of each of the axes are given by

$$axis\ predictivity = \frac{diag(\hat{\bar{\mathbf{X}}}'\mathbf{C}\hat{\bar{\mathbf{X}}})}{diag(\bar{\mathbf{X}}'\mathbf{C}\bar{\mathbf{X}})}.$$
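
A by-hand version of this ratio, assuming the Xbar, Xbar_hat and C objects from the sketches above; it should agree with the axis predictivity reported by summary() for the same class variable.

# Sketch of axis predictivity in two dimensions (assumed objects from above)
axis.pred <- diag(t(Xbar_hat) %*% C %*% Xbar_hat) / diag(t(Xbar) %*% C %*% Xbar)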

Class predictivity

Similarly for each of the class means the elementwise ratio is computed from

$$class\ predictivity = \frac{diag(\mathbf{C}^{\frac{1}{2}}\hat{\bar{\mathbf{X}}}\mathbf{W}^{-1}\hat{\bar{\mathbf{X}}}'\mathbf{C}^{\frac{1}{2}})}{diag(\mathbf{C}^{\frac{1}{2}}\bar{\mathbf{X}}\mathbf{W}^{-1}\bar{\mathbf{X}}'\mathbf{C}^{\frac{1}{2}})}.$$

Within class predictivity

We define the matrix of samples as deviations from their class means as

$$(\mathbf{I} - \mathbf{H})\mathbf{X} = (\mathbf{I}_n - \mathbf{G}(\mathbf{G'G})^{-1}\mathbf{G}')\mathbf{X}$$

where $\mathbf{H} = \mathbf{G}(\mathbf{G'G})^{-1}\mathbf{G}'$.

Within class axis predictivity

The within class axis predictivity, with $\hat{\mathbf{X}} = \mathbf{X}\mathbf{M}\mathbf{J}\mathbf{M}^{-1}$ the matrix of predicted samples, is computed as the elementwise ratios

$$within\ class\ axis\ predictivity = \frac{diag(\hat{\mathbf{X}}'(\mathbf{I} - \mathbf{H})\hat{\mathbf{X}})}{diag(\mathbf{X}'(\mathbf{I} - \mathbf{H})\mathbf{X})}.$$
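
A sketch of this within-class ratio in R, assuming the X, G, M and J objects from the snippets above.

# Sketch of within class axis predictivity (assumed objects from above)
H <- G %*% solve(t(G) %*% G) %*% t(G)     # projection onto the class indicator space
Xhat <- X %*% M %*% J %*% solve(M)        # predicted samples
I_n <- diag(nrow(X))
within.axis.pred <- diag(t(Xhat) %*% (I_n - H) %*% Xhat) /
  diag(t(X) %*% (I_n - H) %*% X)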

Within class sample predictivity

Unlike PCA biplots, sample predictivity for CVA biplots is computed for the observations expressed as deviations from their class means. The elementwise ratios are obtained from

$$within\ class\ sample\ predictivity = \frac{diag((\mathbf{I} - \mathbf{H})\hat{\mathbf{X}}\mathbf{W}^{-1}\hat{\mathbf{X}}'(\mathbf{I} - \mathbf{H}))}{diag((\mathbf{I} - \mathbf{H})\mathbf{X}\mathbf{W}^{-1}\mathbf{X}'(\mathbf{I} - \mathbf{H}))}.$$

To display the fit measures, we create a biplot object with the measures added by the function fit.measures() and call summary().

obj <- biplot(state.x77, scaled = TRUE) |> 
       CVA(classes = state.division) |> 
       fit.measures() |>
       plot()

summary (obj)
#> Object of class biplot, based on 50 samples and 8 variables.
#> 8 numeric variables.
#> 9 classes: New England Middle Atlantic South Atlantic East South Central West South Central East North Central West North Central Mountain Pacific 
#> 
#> Quality of fit of canonical variables in 2 dimension(s) = 70.7% 
#> Quality of fit of original variables in 2 dimension(s) = 70.5% 
#> Adequacy of variables in 2 dimension(s):
#> Population     Income Illiteracy   Life Exp     Murder    HS Grad      Frost 
#> 0.41716176 0.15621549 0.16136381 0.09759664 0.19426796 0.55332679 0.50497634 
#>       Area 
#> 0.40661470 
#> Axis predictivity in 2 dimension(s):
#> Population     Income Illiteracy   Life Exp     Murder    HS Grad      Frost 
#>  0.1859124  0.4019427  0.8195756  0.6925389  0.7685373  0.9506355  0.7819324 
#>       Area 
#>  0.8458143 
#> Class predictivity in 2 dimension(s):
#>        New England    Middle Atlantic     South Atlantic East South Central 
#>          0.7922047          0.6570417          0.8191791          0.8777759 
#> West South Central East North Central West North Central           Mountain 
#>          0.7416085          0.6370315          0.3265978          0.6825966 
#>            Pacific 
#>          0.6700194 
#> Within class axis predictivity in 2 dimension(s):
#> Population     Income Illiteracy   Life Exp     Murder    HS Grad      Frost 
#> 0.04212318 0.09357501 0.25675620 0.19900223 0.29474972 0.75215233 0.31027358 
#>       Area 
#> 0.12741853 
#> Within class sample predictivity in 2 dimension(s):
#>        Alabama         Alaska        Arizona       Arkansas     California 
#>    0.722548912    0.163442379    0.333341120    0.268976273    0.229139828 
#>       Colorado    Connecticut       Delaware        Florida        Georgia 
#>    0.264963758    0.082284385    0.593415987    0.461070888    0.636531435 
#>         Hawaii          Idaho       Illinois        Indiana           Iowa 
#>    0.015640188    0.113711473    0.338612599    0.389208196    0.507060148 
#>         Kansas       Kentucky      Louisiana          Maine       Maryland 
#>    0.784831952    0.314119027    0.078465054    0.008388471    0.306141816 
#>  Massachusetts       Michigan      Minnesota    Mississippi       Missouri 
#>    0.076563044    0.218470793    0.645446212    0.046129058    0.710971640 
#>        Montana       Nebraska         Nevada  New Hampshire     New Jersey 
#>    0.086279776    0.810374638    0.090490164    0.298187909    0.003496353 
#>     New Mexico       New York North Carolina   North Dakota           Ohio 
#>    0.007134343    0.024268121    0.422776032    0.446240464    0.277262145 
#>       Oklahoma         Oregon   Pennsylvania   Rhode Island South Carolina 
#>    0.450104680    0.108636860    0.033945796    0.415029328    0.261568299 
#>   South Dakota      Tennessee          Texas           Utah        Vermont 
#>    0.134881180    0.247921823    0.110537439    0.500454605    0.159941068 
#>       Virginia     Washington  West Virginia      Wisconsin        Wyoming 
#>    0.310439564    0.030877305    0.066303623    0.295499472    0.474458397

The calls to biplot(), CVA() and fit.measures() are required to (a) create an object of class biplot, (b) extend the object to class CVA and (c) compute the fit measures. The call to the function plot() is optional. It is further possible to select which fit measures to display in the summary() function, where all measures default to TRUE.

obj <- biplot(state.x77, scaled = TRUE) |> 
       CVA(classes = state.region) |> 
       fit.measures()
summary (obj, adequacy = FALSE, within.class.axis.predictivity = FALSE,
         within.class.sample.predictivity = FALSE)
#> Object of class biplot, based on 50 samples and 8 variables.
#> 8 numeric variables.
#> 4 classes: Northeast South North Central West 
#> 
#> Quality of fit of canonical variables in 2 dimension(s) = 91.9% 
#> Quality of fit of original variables in 2 dimension(s) = 95.3% 
#> Axis predictivity in 2 dimension(s):
#> Population     Income Illiteracy   Life Exp     Murder    HS Grad      Frost 
#>  0.9873763  0.9848608  0.8757913  0.9050208  0.9955088  0.9970346  0.9558192 
#>       Area 
#>  0.9344651 
#> Class predictivity in 2 dimension(s):
#>     Northeast         South North Central          West 
#>     0.8031465     0.9985089     0.6449906     0.9988469

Additional CVA dimensions

It was mentioned that the eigen equation (1) has $\min(p, G-1)$ non-zero eigenvalues. This implies that the CVA biplot for $G = 2$ groups reduces to a single dimension. If we write

$$\mathbf{M} = \begin{bmatrix} \mathbf{m}_1 & \mathbf{M}^* \end{bmatrix}$$

the columns of $\mathbf{M}^*$ form a basis for the orthogonal complement of the canonical space defined by $\mathbf{m}_1$. The argument low.dim determines how to uniquely define the second and third dimensions. By default low.dim = "sample.opt", which selects the dimensions by minimising the total squared reconstruction error for the samples.

The representation of the canonical variates $\bar{\mathbf{Z}} = \bar{\mathbf{X}}\mathbf{m}_1$ is exact in the first dimension, but not the representation of the individual samples $\mathbf{Z} = \mathbf{X}\mathbf{m}_1$. If we define $\hat{\mathbf{X}} = \mathbf{X}\mathbf{M}\mathbf{J}\mathbf{M}^{-1}$ with $\mathbf{J}$ a square matrix of zeros except for a $1$ in the first diagonal position, then the total squared reconstruction error for the samples is given by

$$TSRES = tr\{(\mathbf{X} - \hat{\mathbf{X}})'(\mathbf{X} - \hat{\mathbf{X}})\}.$$

Define

$$\mathbf{M}^{-1} = \begin{bmatrix} \mathbf{M}^{(1)}: (G-1) \times p \\ \mathbf{M}^{(2)}: (p-G+1) \times p \end{bmatrix}$$

then $TSRES$ is minimised when

$$\mathbf{M}^{opt} = \begin{bmatrix} \mathbf{m}_1 & \mathbf{M}^*\mathbf{V} \end{bmatrix}$$

where $\mathbf{V}$ is the matrix of right singular vectors of $\mathbf{M}^{(2)}\mathbf{M}^{(2)'}$.

state.2group <- ifelse(state.division == "New England" | 
                       state.division == "Middle Atlantic"  |
                       state.division == "South Atlantic" |
                       state.division == "Pacific",
                       "Coastal", "Central")
biplot (state.x77) |> CVA (state.2group) |> legend.type(means=TRUE) |> plot()
#> Warning in CVA.biplot(biplot(state.x77), state.2group): The dimension of the
#> canonical space < dim.biplot sample.opt method used for additional
#> dimension(s).

le Roux and Gardner-Lubbe (2024) discuss an alternative method for obtaining additional dimensions. When assuming underlying normal distributions, the Bhattacharyya distance can be optimised. This method is specific to the two class case and cannot be utilised to find a third dimension in a 3D CVA biplot with three classes.

biplot (state.x77) |> CVA (state.2group, low.dim="Bha") |> legend.type(means=TRUE) |> plot()
#> Warning in CVA.biplot(biplot(state.x77), state.2group, low.dim = "Bha"): The
#> dimension of the canonical space < dim.biplot Bhattacharyya.dist method used
#> for additional dimension(s).

Analysis of Distance (AoD)

Similar to the variance decomposition in CVA, analysis of distance decomposes the total sum of squared distances into a component for the squared distances between class means and a component for the squared distances within classes.

Consider any Euclidean embeddable distance metric $\psi_{ij} = \psi(\mathbf{x}_i, \mathbf{x}_j)$. For a Euclidean embeddable metric it is possible to find high dimensional coordinates $\mathbf{y}_i$ and $\mathbf{y}_j$ such that the Euclidean distance between $\mathbf{y}_i$ and $\mathbf{y}_j$ is equal to $\psi_{ij}$. Let the matrix $\tilde{\mathbf{\Psi}}$ contain the values $-\frac{1}{2}\psi_{ij}^2$ and similarly $\tilde{\mathbf{\Delta}}$ the values $-\frac{1}{2}\delta_{hk}^2$, where $\delta_{hk}$ represents the distance between class means $h$ and $k$.

$$\mathbf{T} = \mathbf{B} + \mathbf{W}$$

$$\mathbf{1}'\tilde{\mathbf{\Psi}}\mathbf{1} = \mathbf{n}'\tilde{\mathbf{\Delta}}\mathbf{n} + \sum_{k=1}^{G} \frac{n}{n_k}\mathbf{g}_k'\tilde{\mathbf{\Psi}}\mathbf{g}_k$$

where $\mathbf{n} = (\mathbf{G'G})\mathbf{1}$. Thus, AoD differs from CVA in allowing any Euclidean embeddable measure of inter-class distance. As with CVA, these distances may be represented in maps with points representing the class means, supplemented by additional points representing the within-group variation. Principal coordinate analysis is performed only on the $G \times G$ matrix $\tilde{\mathbf{\Delta}}$.

biplot(state.x77, scaled = TRUE) |> AoD(classes = state.region) |> plot()

By default linear regression biplot axes are fitted to the plot. Alternatively, spline axes can be constructed.

biplot(state.x77, scaled = TRUE) |> AoD(classes = state.region, axes = "splines") |> plot()

#> Calculating spline axis for variable 1 
#> Calculating spline axis for variable 2 
#> Calculating spline axis for variable 3 
#> Calculating spline axis for variable 4 
#> Calculating spline axis for variable 5 
#> Calculating spline axis for variable 6 
#> Calculating spline axis for variable 7 
#> Calculating spline axis for variable 8

As an illustration of a Euclidean embeddable distance metric other than Euclidean distance itself, we can construct an AoD biplot with the square root of the Manhattan distance; a small by-hand sketch of this construction follows the output below.

biplot(state.x77, scaled = TRUE) |> 
  AoD(classes = state.region, axes = "splines", dist.func=sqrtManhattan) |> plot()

#> Calculating spline axis for variable 1 
#> Calculating spline axis for variable 2 
#> Calculating spline axis for variable 3 
#> Calculating spline axis for variable 4 
#> Calculating spline axis for variable 5 
#> Calculating spline axis for variable 6 
#> Calculating spline axis for variable 7 
#> Calculating spline axis for variable 8
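
By hand, the principal coordinate step on the class means under this metric can be sketched with base R. The unweighted class means and the cmdscale() call below are illustrative assumptions and not the AoD routine used by biplotEZ.

# Sketch: principal coordinates of the class means under sqrt(Manhattan) distances
Xs <- scale(state.x77)                                    # scaled data, as in the examples
grp <- state.region
Xbar.s <- apply(Xs, 2, function(v) tapply(v, grp, mean))  # class means (unweighted sketch)
delta <- sqrt(as.matrix(dist(Xbar.s, method = "manhattan")))  # Euclidean embeddable metric
Zbar <- cmdscale(delta, k = 2)                            # 2D principal coordinate map of means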

References

le Roux, N. J., and S. Gardner-Lubbe. 2024. "A Two-Group Canonical Variate Analysis Biplot for an Optimal Display of Both Means and Cases." Advances in Data Analysis and Classification.