  • Fisher's linear discriminant (Linear discriminant analysis, LDA)
    MLAI/DimensionalityReduction 2019. 10. 5. 22:25

    1. Overview

    Linear discriminant analysis (LDA), normal discriminant analysis (NDA), or discriminant function analysis is a generalization of Fisher's linear discriminant, a method used in statistics, pattern recognition, and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting combination may be used as a linear classifier, or, more commonly, for dimensionality reduction before later classification.
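
    As a concrete illustration of both uses, here is a minimal sketch assuming scikit-learn's `LinearDiscriminantAnalysis` and synthetic two-class data (both are my own additions, not from the original post):

    ```python
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal([0, 0], 1.0, (50, 2)),    # class 0 samples
                   rng.normal([3, 3], 1.0, (50, 2))])   # class 1 samples
    y = np.repeat([0, 1], 50)

    lda = LinearDiscriminantAnalysis(n_components=1)
    Z = lda.fit_transform(X, y)          # dimensionality reduction: 2-D -> 1-D
    print(Z.shape)                       # (100, 1)
    print(lda.predict([[0.5, 0.5]]))     # or use it directly as a linear classifier
    ```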

    2. Description

    2.1 between-class scatter

    $$m_{i}=\frac{1}{N_{i}}\sum_{\mathbf{x}\in \omega _{i}}\mathbf{x}$$

     

    $$\bar{m}_{i}=\frac{1}{N_{i}}\sum_{\mathbf{y}\in \omega _{i}}\mathbf{y}=\frac{1}{N_{i}}\sum_{\mathbf{x}\in \omega _{i}}\mathbf{w^{T}x}=\mathbf{w^{T}m_{i}}$$

     

    between-class scatter = $\left | \bar{m_{1}} - \bar{m_{2}} \right |=\left | \boldsymbol{w^{T}m_{1}} - \boldsymbol{w^{T}m_{2}} \right |=\left | \mathbf{w^{T}(m_{1}-m_{2})} \right |$
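
    A minimal NumPy sketch of these quantities on synthetic two-class data, with an arbitrary direction $\mathbf{w}$ (the data and variable names are my own, for illustration only):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    X1 = rng.normal([0, 0], 1.0, (40, 2))    # samples of class omega_1
    X2 = rng.normal([4, 1], 1.0, (40, 2))    # samples of class omega_2
    w = np.array([1.0, 0.5])                 # an arbitrary projection direction

    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)   # class means m_i
    m1_bar, m2_bar = w @ m1, w @ m2             # projected means w^T m_i
    # |m1_bar - m2_bar| equals |w^T (m1 - m2)|, as derived above
    print(abs(m1_bar - m2_bar), abs(w @ (m1 - m2)))
    ```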

    2.2 within-class scatter

    $$\bar{s_{i}}^{2}=\sum_{y\in \omega _{i}}(y-\bar{m}_{i})^{2}$$

    within-class scatter = $\bar{s_{1}}^{2}+\bar{s_{2}}^{2}$
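
    The within-class scatter of the projected samples can be computed the same way (a self-contained sketch with the same kind of toy data; note that the mean of the projected samples of class $i$ is exactly $\bar{m}_{i}$):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    X1 = rng.normal([0, 0], 1.0, (40, 2))    # samples of class omega_1
    X2 = rng.normal([4, 1], 1.0, (40, 2))    # samples of class omega_2
    w = np.array([1.0, 0.5])                 # projection direction

    y1, y2 = X1 @ w, X2 @ w                  # projected samples y = w^T x
    s1_sq = np.sum((y1 - y1.mean()) ** 2)    # scatter of class 1 after projection
    s2_sq = np.sum((y2 - y2.mean()) ** 2)    # scatter of class 2 after projection
    within = s1_sq + s2_sq                   # within-class scatter s1^2 + s2^2
    print(within)
    ```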

     

    2.3 Objective function $J(\mathbf{w})$

    $$J(\mathbf{w})=\frac{\left | \bar{m}_{1} - \bar{m}_{2} \right |^{2}}{\bar{s}_{1}^{2}+\bar{s}_{2}^{2}}$$

    $$J(\mathbf{w})=\frac{\mathbf{w^{T}S_{B}w}}{\mathbf{w^{T}S_{W}w}}$$

    where $S_{B}=\sum_{c}(\mu_{c}-\bar{x})(\mu_{c}-\bar{x})^{T}$, $S_{W}=\sum_{c}\sum_{i\in c}(x_{i}-\mu_{c})(x_{i}-\mu_{c})^{T}$

    • Generalizes to multi-class cases
    • Maximizes the ratio of between-class variance to within-class variance of the projected data
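
    For the two-class case, the objective can be evaluated directly from the projected quantities above. A small sketch (the function name `fisher_criterion` and the toy data are mine):

    ```python
    import numpy as np

    def fisher_criterion(w, X1, X2):
        """J(w) = (m1_bar - m2_bar)^2 / (s1^2 + s2^2) for the projection y = w^T x."""
        y1, y2 = X1 @ w, X2 @ w
        between = (y1.mean() - y2.mean()) ** 2
        within = np.sum((y1 - y1.mean()) ** 2) + np.sum((y2 - y2.mean()) ** 2)
        return between / within

    rng = np.random.default_rng(0)
    X1 = rng.normal([0, 0], 1.0, (40, 2))
    X2 = rng.normal([4, 1], 1.0, (40, 2))
    print(fisher_criterion(np.array([1.0, 0.5]), X1, X2))
    ```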

    2.4 between-class scatter matrix

    $$S_{B}=\mathbf{(m_{1}-m_{2})(m_{1}-m_{2})^{T}}$$

    2.5 within-class scatter matrix

    $$S_{i}=\sum_{x\in \omega _{i}}\mathbf{(x-m_{i})(x-m_{i})^T}$$

    $$S_{W}=S_{1}+S_{2}$$

    where $S_{i}$ is the scatter matrix (unnormalized covariance matrix) of class $\omega _{i}$, and $S_{W}$ is called the within-class scatter matrix.
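
    The quadratic forms $\mathbf{w^{T}S_{B}w}$ and $\mathbf{w^{T}S_{W}w}$ reproduce exactly the projected scatters from Sections 2.1 and 2.2, which is what makes the matrix form of $J(\mathbf{w})$ equivalent to the scalar form. A sketch of this check (toy data and names are mine):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    X1 = rng.normal([0, 0], 1.0, (40, 2))
    X2 = rng.normal([4, 1], 1.0, (40, 2))
    w = np.array([1.0, 0.5])

    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S_B = np.outer(m1 - m2, m1 - m2)                          # (m1 - m2)(m1 - m2)^T
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)   # S_1 + S_2

    y1, y2 = X1 @ w, X2 @ w
    print(w @ S_B @ w, (y1.mean() - y2.mean()) ** 2)          # equal: between-class term
    print(w @ S_W @ w,                                        # equal: within-class term
          np.sum((y1 - y1.mean()) ** 2) + np.sum((y2 - y2.mean()) ** 2))
    ```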

    2.6 linear discriminant

    To find the maximum of $J(w)$, we differentiate and equate to zero.

    $$\frac{\partial J(\mathbf{w})}{\partial \mathbf{w}}=\frac{\partial }{\partial \mathbf{w}}\left ( \frac{\mathbf{w^{T}S_{B}w}}{\mathbf{w^{T}S_{W}w}} \right )=0$$

    $$(\mathbf{w^{T}S_{W}w})\mathbf{S_{B}w}=(\mathbf{w^{T}S_{B}w})\mathbf{S_{W}w}$$

    where $\mathbf{S_{B}w}=\mathbf{(m_{1}-m_{2})(m_{1}-m_{2})^{T}w}=\alpha_{1}\mathbf{(m_{1}-m_{2})}$ has the same direction as $\mathbf{m_{1}-m_{2}}$ (with the scalar $\alpha_{1}=\mathbf{(m_{1}-m_{2})^{T}w}$), and both $\mathbf{w^{T}S_{W}w}$ and $\mathbf{w^{T}S_{B}w}$ are scalars. Thus,

    $$(\mathbf{w^{T}S_{W}w})\mathbf{S_{B}w}=(\mathbf{w^{T}S_{B}w})\mathbf{S_{W}w}$$

    $$(\mathbf{w^{T}S_{W}w})\alpha_{1}\mathbf{(m_{1}-m_{2})}=(\mathbf{w^{T}S_{B}w})\mathbf{S_{W}w}$$

    $$\mathbf{S_{W}w}=\frac{(\mathbf{w^{T}S_{W}w})}{(\mathbf{w^{T}S_{B}w})}\alpha_{1}\mathbf{(m_{1}-m_{2})}$$

    $$\mathbf{S_{W}w}=\alpha_{2}\alpha_{1}\mathbf{(m_{1}-m_{2})}$$

    $$\mathbf{w}=\alpha\mathbf{S_{W}^{-1}(m_{1}-m_{2})}$$
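
    Numerically, the closed-form direction can be checked by comparing $J(\mathbf{w})$ at $\mathbf{w}=\mathbf{S_{W}^{-1}(m_{1}-m_{2})}$ against random directions; no random direction should do better. A sketch of this check (toy data and names are mine):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    X1 = rng.normal([0, 0], 1.0, (40, 2))
    X2 = rng.normal([4, 1], 1.0, (40, 2))

    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    w_opt = np.linalg.solve(S_W, m1 - m2)      # w = S_W^{-1} (m1 - m2), up to scale

    def J(w):
        y1, y2 = X1 @ w, X2 @ w
        return (y1.mean() - y2.mean()) ** 2 / (
            np.sum((y1 - y1.mean()) ** 2) + np.sum((y2 - y2.mean()) ** 2))

    # J at the closed-form direction dominates J at many random directions
    print(J(w_opt), max(J(rng.normal(size=2)) for _ in range(1000)))
    ```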

     

    3. Procedure

    • Compute the d-dimensional mean vectors for the different classes from the dataset
    • Compute the scatter matrices (in-between-class and within-class scatter matrix)
    • Compute the eigenvectors and corresponding eigenvalues of $S_{W}^{-1}S_{B}$
    • Sort the eigenvectors by decreasing eigenvalue and choose the $k$ eigenvectors with the largest eigenvalues to form a $d\times k$ matrix $W$ whose columns are those eigenvectors
    • Use this $d\times k$ eigenvector matrix to transform the samples onto the new subspace. This can be summarized by the matrix multiplication $Y=X\times W$, where $X$ is the $n\times d$ matrix of the $n$ samples and $Y$ is the transformed $n\times k$ matrix of samples in the new subspace (a NumPy sketch of these steps follows)
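
    A compact NumPy sketch of this procedure, using the eigendecomposition of $S_{W}^{-1}S_{B}$ with the scatter matrices as defined in Section 2.3 (the helper name `lda_fit_transform` and the toy data are mine):

    ```python
    import numpy as np

    def lda_fit_transform(X, labels, k):
        """Project the n x d data X onto its k leading discriminant directions."""
        d = X.shape[1]
        x_bar = X.mean(axis=0)                    # overall mean
        S_W = np.zeros((d, d))
        S_B = np.zeros((d, d))
        for c in np.unique(labels):
            Xc = X[labels == c]
            mu_c = Xc.mean(axis=0)
            S_W += (Xc - mu_c).T @ (Xc - mu_c)              # within-class scatter
            S_B += np.outer(mu_c - x_bar, mu_c - x_bar)     # between-class scatter (Sec. 2.3)
        # eigenvectors of S_W^{-1} S_B, sorted by decreasing eigenvalue
        eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
        order = np.argsort(eigvals.real)[::-1]
        W = eigvecs[:, order[:k]].real            # d x k projection matrix
        return X @ W                              # Y = X W  (n x k)

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(mu, 1.0, (30, 3))
                   for mu in ([0, 0, 0], [4, 0, 1], [0, 4, 2])])
    labels = np.repeat([0, 1, 2], 30)
    print(lda_fit_transform(X, labels, k=2).shape)   # (90, 2)
    ```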

    4. Difference between PCA and LDA

    The goal of LDA is to project a feature space onto a smaller subspace while maintaining the class-discriminatory information. Both PCA and LDA are linear transformation techniques used for dimensionality reduction, but PCA is an unsupervised algorithm, whereas LDA is supervised because it uses the class labels (the dependent variable).
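
    A small illustration of this difference, assuming scikit-learn and toy data where the class-separating direction is not the highest-variance direction (both assumptions are mine):

    ```python
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(0)
    # Classes differ along the second axis, but the first axis has the larger variance.
    X = np.vstack([rng.normal([0, 0], [3.0, 0.3], (50, 2)),
                   rng.normal([0, 2], [3.0, 0.3], (50, 2))])
    y = np.repeat([0, 1], 50)

    Z_pca = PCA(n_components=1).fit_transform(X)                            # ignores y
    Z_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y)  # uses y

    for name, Z in [("PCA", Z_pca), ("LDA", Z_lda)]:
        gap = abs(Z[y == 0].mean() - Z[y == 1].mean()) / Z.std()
        print(name, round(gap, 2))   # LDA's 1-D projection separates the classes far better
    ```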

    5. References

    https://pdfs.semanticscholar.org/d690/3041ea762bad6ff62cc2ec6fba2eb802634a.pdf

    https://en.wikipedia.org/wiki/Linear_discriminant_analysis

    https://sebastianraschka.com/Articles/2014_python_lda.html

    https://sthalles.github.io/fisher-linear-discriminant/

    http://www.sci.utah.edu/~shireen/pdfs/tutorials/Elhabian_LDA09.pdf

    https://www.cs.cmu.edu/~tom/10701_sp11/recitations/Recitation_11.pdf
