  • Fisher's linear discriminant (Linear discriminant analysis, LDA)
    MLAI/DimensionalityReduction 2019. 10. 5. 22:25

    1. Overview

    Linear discriminant analysis (LDA), normal discriminant analysis (NDA), or discriminant function analysis is a generalization of Fisher's linear discriminant, a method used in statistics, pattern recognition, and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting combination may be used as a linear classifier, or, more commonly, for dimensionality reduction before later classification.
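
    As a concrete illustration of both uses, here is a minimal sketch assuming scikit-learn's `LinearDiscriminantAnalysis` and synthetic two-class data (both are my own additions, not from the original post):

    ```python
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal([0, 0], 1.0, (50, 2)),    # class 0 samples
                   rng.normal([3, 3], 1.0, (50, 2))])   # class 1 samples
    y = np.repeat([0, 1], 50)

    lda = LinearDiscriminantAnalysis(n_components=1)
    Z = lda.fit_transform(X, y)          # dimensionality reduction: 2-D -> 1-D
    print(Z.shape)                       # (100, 1)
    print(lda.predict([[0.5, 0.5]]))     # or use it directly as a linear classifier
    ```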

    2. Description

    2.1 between-class scatter

    $$m_{i}=\frac{1}{N_{i}}\sum_{\mathbf{x}\in \omega _{i}}\mathbf{x}$$

     

    $$\bar{m}_{i}=\frac{1}{N_{i}}\sum_{\mathbf{y}\in \omega _{i}}\mathbf{y}=\frac{1}{N_{i}}\sum_{\mathbf{x}\in \omega _{i}}\mathbf{w^{T}x}=\mathbf{w^{T}m_{i}}$$

     

    between-class scatter = $\left | \bar{m_{1}} - \bar{m_{2}} \right |=\left | \boldsymbol{w^{T}m_{1}} - \boldsymbol{w^{T}m_{2}} \right |=\left | \mathbf{w^{T}(m_{1}-m_{2})} \right |$
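
    A minimal NumPy sketch of these quantities on synthetic two-class data, with an arbitrary direction $\mathbf{w}$ (the data and variable names are my own, for illustration only):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    X1 = rng.normal([0, 0], 1.0, (40, 2))    # samples of class omega_1
    X2 = rng.normal([4, 1], 1.0, (40, 2))    # samples of class omega_2
    w = np.array([1.0, 0.5])                 # an arbitrary projection direction

    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)   # class means m_i
    m1_bar, m2_bar = w @ m1, w @ m2             # projected means w^T m_i
    # |m1_bar - m2_bar| equals |w^T (m1 - m2)|, as derived above
    print(abs(m1_bar - m2_bar), abs(w @ (m1 - m2)))
    ```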

    2.2 within-class scatter

    $$\bar{s_{i}}^{2}=\sum_{y\in \omega _{i}}(y-\bar{m}_{i})^{2}$$

    within-class scatter = $\bar{s_{1}}^{2}+\bar{s_{2}}^{2}$
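
    The within-class scatter of the projected samples can be computed the same way (a self-contained sketch with the same kind of toy data; note that the mean of the projected samples of class $i$ is exactly $\bar{m}_{i}$):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    X1 = rng.normal([0, 0], 1.0, (40, 2))    # samples of class omega_1
    X2 = rng.normal([4, 1], 1.0, (40, 2))    # samples of class omega_2
    w = np.array([1.0, 0.5])                 # projection direction

    y1, y2 = X1 @ w, X2 @ w                  # projected samples y = w^T x
    s1_sq = np.sum((y1 - y1.mean()) ** 2)    # scatter of class 1 after projection
    s2_sq = np.sum((y2 - y2.mean()) ** 2)    # scatter of class 2 after projection
    within = s1_sq + s2_sq                   # within-class scatter s1^2 + s2^2
    print(within)
    ```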

     

    2.3 Objective function $J(\mathbf{w})$

    $$J(\mathbf{w})=\frac{\left | \bar{m}_{1} - \bar{m}_{2} \right |^{2}}{\bar{s}_{1}^{2}+\bar{s}_{2}^{2}}$$

    $$J(\mathbf{w})=\frac{\mathbf{w^{T}S_{B}w}}{\mathbf{w^{T}S_{W}w}}$$

    where $S_{B}=\sum_{c}(\mu_{c}-\bar{x})(\mu_{c}-\bar{x})^{T}$, $S_{W}=\sum_{c}\sum_{i\in c}(x_{i}-\mu_{c})(x_{i}-\mu_{c})^{T}$

    • Generalizes to multi-class cases
    • Maximizes the ratio of between-class variance to within-class variance of the projected data
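
    For the two-class case, the objective can be evaluated directly from the projected quantities above. A small sketch (the function name `fisher_criterion` and the toy data are mine):

    ```python
    import numpy as np

    def fisher_criterion(w, X1, X2):
        """J(w) = (m1_bar - m2_bar)^2 / (s1^2 + s2^2) for the projection y = w^T x."""
        y1, y2 = X1 @ w, X2 @ w
        between = (y1.mean() - y2.mean()) ** 2
        within = np.sum((y1 - y1.mean()) ** 2) + np.sum((y2 - y2.mean()) ** 2)
        return between / within

    rng = np.random.default_rng(0)
    X1 = rng.normal([0, 0], 1.0, (40, 2))
    X2 = rng.normal([4, 1], 1.0, (40, 2))
    print(fisher_criterion(np.array([1.0, 0.5]), X1, X2))
    ```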

    2.4 between-class scatter matrix

    $$S_{B}=\mathbf{(m_{1}-m_{2})(m_{1}-m_{2})^{T}}$$

    2.5 within-class scatter matrix

    $$S_{i}=\sum_{x\in \omega _{i}}\mathbf{(x-m_{i})(x-m_{i})^T}$$

    $$S_{W}=S_{1}+S_{2}$$

    where $S_{i}$ is the scatter matrix (unnormalized covariance matrix) of class $\omega _{i}$, and $S_{W}$ is called the within-class scatter matrix.
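
    The quadratic forms $\mathbf{w^{T}S_{B}w}$ and $\mathbf{w^{T}S_{W}w}$ reproduce exactly the projected scatters from Sections 2.1 and 2.2, which is what makes the matrix form of $J(\mathbf{w})$ equivalent to the scalar form. A sketch of this check (toy data and names are mine):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    X1 = rng.normal([0, 0], 1.0, (40, 2))
    X2 = rng.normal([4, 1], 1.0, (40, 2))
    w = np.array([1.0, 0.5])

    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S_B = np.outer(m1 - m2, m1 - m2)                          # (m1 - m2)(m1 - m2)^T
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)   # S_1 + S_2

    y1, y2 = X1 @ w, X2 @ w
    print(w @ S_B @ w, (y1.mean() - y2.mean()) ** 2)          # equal: between-class term
    print(w @ S_W @ w,                                        # equal: within-class term
          np.sum((y1 - y1.mean()) ** 2) + np.sum((y2 - y2.mean()) ** 2))
    ```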

    2.6 linear discriminant

    To find the maximum of $J(w)$, we differentiate and equate to zero.

    $$\frac{\partial J(\mathbf{w})}{\partial \mathbf{w}}=\frac{\partial }{\partial \mathbf{w}}\left ( \frac{\mathbf{w^{T}S_{B}w}}{\mathbf{w^{T}S_{W}w}} \right )=0$$

    $$(\mathbf{w^{T}S_{W}w})\mathbf{S_{B}w}=(\mathbf{w^{T}S_{B}w})\mathbf{S_{W}w}$$

    where $\mathbf{S_{B}w}=\mathbf{(m_{1}-m_{2})(m_{1}-m_{2})^{T}w}=\alpha_{1}\mathbf{(m_{1}-m_{2})}$ has the same direction as $\mathbf{m_{1}-m_{2}}$ (with the scalar $\alpha_{1}=\mathbf{(m_{1}-m_{2})^{T}w}$), and both $\mathbf{w^{T}S_{W}w}$ and $\mathbf{w^{T}S_{B}w}$ are scalars. Thus,

    $$(\mathbf{w^{T}S_{W}w})\mathbf{S_{B}w}=(\mathbf{w^{T}S_{B}w})\mathbf{S_{W}w}$$

    $$(\mathbf{w^{T}S_{W}w})\alpha_{1}\mathbf{(m_{1}-m_{2})}=(\mathbf{w^{T}S_{B}w})\mathbf{S_{W}w}$$

    $$\mathbf{S_{W}w}=\frac{(\mathbf{w^{T}S_{W}w})}{(\mathbf{w^{T}S_{B}w})}\alpha_{1}\mathbf{(m_{1}-m_{2})}$$

    $$\mathbf{S_{W}w}=\alpha_{2}\alpha_{1}\mathbf{(m_{1}-m_{2})}$$

    $$\mathbf{w}=\alpha\mathbf{S_{W}^{-1}(m_{1}-m_{2})}$$
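
    Numerically, the closed-form direction can be checked by comparing $J(\mathbf{w})$ at $\mathbf{w}=\mathbf{S_{W}^{-1}(m_{1}-m_{2})}$ against random directions; no random direction should do better. A sketch of this check (toy data and names are mine):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    X1 = rng.normal([0, 0], 1.0, (40, 2))
    X2 = rng.normal([4, 1], 1.0, (40, 2))

    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    w_opt = np.linalg.solve(S_W, m1 - m2)      # w = S_W^{-1} (m1 - m2), up to scale

    def J(w):
        y1, y2 = X1 @ w, X2 @ w
        return (y1.mean() - y2.mean()) ** 2 / (
            np.sum((y1 - y1.mean()) ** 2) + np.sum((y2 - y2.mean()) ** 2))

    # J at the closed-form direction dominates J at many random directions
    print(J(w_opt), max(J(rng.normal(size=2)) for _ in range(1000)))
    ```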

     

    3. Procedure

    • Compute the d-dimensional mean vectors for the different classes from the dataset
    • Compute the scatter matrices (in-between-class and within-class scatter matrix)
    • Compute the eigenvectors and corresponding eigenvalues of $S_{W}^{-1}S_{B}$
    • Sort the eigenvectors by decreasing eigenvalue and choose the $k$ eigenvectors with the largest eigenvalues to form a $d\times k$ matrix $W$ whose columns are those eigenvectors
    • Use this $d\times k$ eigenvector matrix to transform the samples onto the new subspace. This can be summarized by the matrix multiplication $Y=X\times W$, where $X$ is the $n\times d$ matrix of the $n$ samples and $Y$ is the transformed $n\times k$ matrix of samples in the new subspace (a NumPy sketch of these steps follows)
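
    A compact NumPy sketch of this procedure, using the eigendecomposition of $S_{W}^{-1}S_{B}$ with the scatter matrices as defined in Section 2.3 (the helper name `lda_fit_transform` and the toy data are mine):

    ```python
    import numpy as np

    def lda_fit_transform(X, labels, k):
        """Project the n x d data X onto its k leading discriminant directions."""
        d = X.shape[1]
        x_bar = X.mean(axis=0)                    # overall mean
        S_W = np.zeros((d, d))
        S_B = np.zeros((d, d))
        for c in np.unique(labels):
            Xc = X[labels == c]
            mu_c = Xc.mean(axis=0)
            S_W += (Xc - mu_c).T @ (Xc - mu_c)              # within-class scatter
            S_B += np.outer(mu_c - x_bar, mu_c - x_bar)     # between-class scatter (Sec. 2.3)
        # eigenvectors of S_W^{-1} S_B, sorted by decreasing eigenvalue
        eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
        order = np.argsort(eigvals.real)[::-1]
        W = eigvecs[:, order[:k]].real            # d x k projection matrix
        return X @ W                              # Y = X W  (n x k)

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(mu, 1.0, (30, 3))
                   for mu in ([0, 0, 0], [4, 0, 1], [0, 4, 2])])
    labels = np.repeat([0, 1, 2], 30)
    print(lda_fit_transform(X, labels, k=2).shape)   # (90, 2)
    ```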

    4. Difference between PCA and LDA

    The goal of LDA is to project a feature space onto a smaller subspace while maintaining the class-discriminatory information. Both PCA and LDA are linear transformation techniques used for dimensionality reduction, but PCA is an unsupervised algorithm, whereas LDA is supervised because it uses the class labels (the dependent variable).
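
    A small illustration of this difference, assuming scikit-learn and toy data where the class-separating direction is not the highest-variance direction (both assumptions are mine):

    ```python
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(0)
    # Classes differ along the second axis, but the first axis has the larger variance.
    X = np.vstack([rng.normal([0, 0], [3.0, 0.3], (50, 2)),
                   rng.normal([0, 2], [3.0, 0.3], (50, 2))])
    y = np.repeat([0, 1], 50)

    Z_pca = PCA(n_components=1).fit_transform(X)                            # ignores y
    Z_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y)  # uses y

    for name, Z in [("PCA", Z_pca), ("LDA", Z_lda)]:
        gap = abs(Z[y == 0].mean() - Z[y == 1].mean()) / Z.std()
        print(name, round(gap, 2))   # LDA's 1-D projection separates the classes far better
    ```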

    5. References

    https://pdfs.semanticscholar.org/d690/3041ea762bad6ff62cc2ec6fba2eb802634a.pdf

    https://en.wikipedia.org/wiki/Linear_discriminant_analysis

    https://sebastianraschka.com/Articles/2014_python_lda.html

    https://sthalles.github.io/fisher-linear-discriminant/

    http://www.sci.utah.edu/~shireen/pdfs/tutorials/Elhabian_LDA09.pdf

    https://www.cs.cmu.edu/~tom/10701_sp11/recitations/Recitation_11.pdf
