Feature selection
1. Overview
In machine learning and statistics, feature selection, also known as variable selection, attribute selection or variable subset selection, is the process of selecting a subset of relevant features (variables, predictors) for use in model construction. Feature selection techniques are used for several reasons:
- simplification of models to make them easier for researchers/users to interpret,
- shorter training times,
- avoidance of the curse of dimensionality,
- enhanced generalization by reducing overfitting (formally, reduction of variance).
2. Description
2.1 Discriminatory power or Class separation
2.1.1 Within-class and Between-class variance
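As a rough illustration of this criterion (not from the original post): a feature separates classes well when its between-class variance is large relative to its within-class variance. A minimal NumPy sketch, with illustrative function and variable names:

```python
import numpy as np

def class_variances(x, y):
    """Within-class and between-class variance of one feature x given labels y.
    A large between/within ratio suggests good class separation (Fisher-style criterion)."""
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / len(y)
    overall_mean = x.mean()
    within = sum(p * x[y == c].var() for c, p in zip(classes, priors))
    between = sum(p * (x[y == c].mean() - overall_mean) ** 2
                  for c, p in zip(classes, priors))
    return within, between
```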
2.1.2 Linearly separable, Multiple modes, and variance
2.2 Metrics for discriminatory power
2.2.1 Kullback-Leibler divergence
$$d_{ij}=KL(p(\boldsymbol{x}\, |\, \omega _{i}),p(\boldsymbol{x}\, |\, \omega _{j}))+KL(p(\boldsymbol{x}\, |\, \omega _{j}),p(\boldsymbol{x}\, |\, \omega _{i}))$$
$$d=\sum_{i=1}^{M}\sum_{j=1}^{M}P(\omega _{i})P(\omega _{j})d_{ij}$$
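A minimal sketch of these two formulas, assuming the class-conditional densities $p(\boldsymbol{x}\,|\,\omega_i)$ are given as discrete (e.g., histogram-estimated) probability vectors; the function names are illustrative, not from the post:

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL(p || q) for discrete probability vectors; eps avoids log(0)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    return float(np.sum(p * np.log(p / q)))

def kl_discriminatory_power(class_dists, priors):
    """d = sum_i sum_j P(w_i) P(w_j) d_ij, with d_ij the symmetrized KL divergence."""
    M = len(class_dists)
    d = 0.0
    for i in range(M):
        for j in range(M):
            d_ij = kl(class_dists[i], class_dists[j]) + kl(class_dists[j], class_dists[i])
            d += priors[i] * priors[j] * d_ij
    return d
```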
2.2.2 Euclidean and Mahalanobis distance of samples $\boldsymbol{x_{i}^{k}}$
$$d_{ij}=\frac{1}{N_{i}N_{j}}\sum_{k=1}^{N_{i}}\sum_{m=1}^{N_{j}}dist(x_{i}^{k},x_{j}^{m})$$
$$d=\sum_{i=1}^{M}\sum_{j=1}^{M}P(\omega _{i})P(\omega _{j})d_{ij}$$
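A corresponding sketch using the Euclidean distance between samples; swapping in the covariance-weighted Mahalanobis distance follows the same pattern. Names are illustrative:

```python
import numpy as np

def mean_pair_distance(Xi, Xj):
    """d_ij: average Euclidean distance between every sample pair drawn from two classes."""
    diffs = Xi[:, None, :] - Xj[None, :, :]               # shape (N_i, N_j, n_features)
    return float(np.linalg.norm(diffs, axis=-1).mean())

def distance_discriminatory_power(X_by_class, priors):
    """d = sum_i sum_j P(w_i) P(w_j) d_ij over samples restricted to the chosen feature subset."""
    M = len(X_by_class)
    return sum(priors[i] * priors[j] * mean_pair_distance(X_by_class[i], X_by_class[j])
               for i in range(M) for j in range(M))
```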
2.2.3 Classification performance metrics
2.3 Feature selection and Feature subset
2.3.1 Combinatorial optimization and dimensionality
$J(S)$: the discriminatory power (criterion value) of a candidate feature subset $S$
$$\boldsymbol{x}=\left \{ x_{1},x_{2},\cdots, x_{d} \right \}^{T}$$
$$\boldsymbol{F}=\left \{ x_{1},x_{2},\cdots, x_{d} \right \}$$
$$\hat{d_{1}}\leq \left | S \right |\leq \hat{d_{2}}$$
where $\hat{d}< d$ and $\hat{d}$ is the size of the optimal feature subset $\hat{F}$ selected from the original feature set $F$ of size $d$
- Search space
The space of candidate feature subsets grows exponentially with $d$, so an unguided search over it (random or exhaustive) has worst-case time complexity
$$\Theta (2^{d})$$
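For a concrete feel of this growth (assuming, purely for illustration, $d = 20$ and a size constraint $5 \leq \left | S \right | \leq 10$):

```python
from math import comb

d = 20                                                # number of original features (example value)
all_subsets = 2 ** d                                  # every possible subset: 1,048,576
constrained = sum(comb(d, k) for k in range(5, 11))   # subsets with 5 <= |S| <= 10: 610,470
print(all_subsets, constrained)
```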
2.4 Search algorithms for the search space
2.4.1 Random search
2.4.2 Global search
- Exhaustive search
- Branch-and-bound
Monotonicity
$$S_{1}\subset S_{2}\;\Rightarrow\; J(S_{1})\leq J(S_{2})$$
Recursive procedure BranchBound(S): expand next_children(S) in depth-first order, using the monotonicity of $J$ to prune branches that cannot improve on the best subset found so far (see the sketch below).
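A minimal sketch of such a recursive, depth-first branch-and-bound; the pruning rule relies on the monotonicity property above, and all names are reconstructions rather than the post's code. $J$ could be either discriminatory-power measure sketched in section 2.2.

```python
def branch_and_bound(features, J, target_size):
    """Find the subset of `target_size` features maximizing a monotone criterion J
    (S1 subset of S2 implies J(S1) <= J(S2)) by removing features from the full set."""
    order = sorted(features)
    best = {"score": float("-inf"), "subset": None}

    def recurse(subset, last_removed):
        score = J(subset)
        if score <= best["score"]:
            # Monotonicity: every descendant removes more features, so its score
            # can only be lower; the whole branch can be pruned.
            return
        if len(subset) == target_size:
            best["score"], best["subset"] = score, set(subset)
            return
        # next_children(S): remove one feature at a time in a fixed order
        # (depth-first), so each subset is generated at most once.
        for idx in range(last_removed + 1, len(order)):
            recurse(subset - {order[idx]}, idx)

    recurse(set(features), -1)
    return best["subset"], best["score"]
```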
2.4.3 Sequential search
- Sequential forward search (SFS)
Each step adds a single feature, so successive subsets differ by a Hamming distance of 1 (a minimal sketch follows this list).
- Plus-p-take-away-q (PTA)
- Sequential forward floating search
- Sequential backward floating search
Limitation
Sequential searches are greedy algorithms: earlier choices are never revisited, so the search can get stuck in a locally optimal subset.
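A minimal sketch of SFS as a greedy loop (names are illustrative); PTA and the floating variants extend it by also removing features after additions, which partially mitigates the limitation above:

```python
def sequential_forward_search(features, J, target_size):
    """Greedy SFS: start empty and repeatedly add the single feature that gives
    the best criterion value; each step moves Hamming distance 1 in subset space.
    Choices are never undone, so the result may be only locally optimal."""
    selected = set()
    while len(selected) < target_size:
        best_f = max((f for f in features if f not in selected),
                     key=lambda f: J(selected | {f}))
        selected.add(best_f)
    return selected
```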
2.5 Stochastic search
2.5.1 Simulated annealing
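A minimal simulated-annealing sketch for subset search (all parameter values and names are assumptions): flip one feature per step, accept worse subsets with a probability that shrinks as the temperature cools, which lets the search escape local optima that trap greedy methods.

```python
import math, random

def sa_feature_selection(n_features, J, steps=2000, T0=1.0, cooling=0.995):
    """Simulated annealing over feature subsets: a random single-feature flip is
    always accepted if it improves J, otherwise accepted with probability exp(delta/T)."""
    current = {i for i in range(n_features) if random.random() < 0.5} or {0}
    best, T = set(current), T0
    for _ in range(steps):
        candidate = set(current) ^ {random.randrange(n_features)}  # flip one feature in/out
        if candidate:                                              # keep at least one feature
            delta = J(candidate) - J(current)
            if delta > 0 or random.random() < math.exp(delta / T):
                current = candidate
                if J(current) > J(best):
                    best = set(current)
        T *= cooling                                               # geometric cooling schedule
    return best
```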
2.5.2 Genetic algorithm
- Population
- Crossover
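A minimal genetic-algorithm sketch illustrating the population and crossover bullets above (plus bit-flip mutation); the bit-mask encoding, parameter values, and selection scheme are all assumptions for illustration:

```python
import random

def ga_feature_selection(n_features, J, pop_size=20, generations=50,
                         crossover_rate=0.9, mutation_rate=0.02):
    """Subsets are encoded as bit masks; the fitter half of each population
    produces children via single-point crossover and bit-flip mutation."""
    def fitness(mask):
        subset = {i for i, bit in enumerate(mask) if bit}
        return J(subset) if subset else float("-inf")

    population = [[random.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        next_gen = [row[:] for row in population[:2]]                # elitism: keep the two best
        while len(next_gen) < pop_size:
            p1, p2 = random.sample(population[:pop_size // 2], 2)    # parents from the fitter half
            if random.random() < crossover_rate:
                cut = random.randrange(1, n_features)                # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            child = [bit ^ (random.random() < mutation_rate) for bit in child]  # bit-flip mutation
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    return {i for i, bit in enumerate(best) if bit}
```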
3. References
http://www.causality.inf.ethz.ch/ciml/FeatureExtractionManuscript.pdf
https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence
https://en.wikipedia.org/wiki/Euclidean_distance
https://www.cs.upc.edu/~belanche/research/R02-62.pdf
https://www.slideshare.net/parindarajapaksha/analysis-of-feature-selection-algorithms
https://www.ic.unicamp.br/~wainer/cursos/1s2012/mc906/FeatureSelection.pdf
https://ieeexplore.ieee.org/abstract/document/1335448
https://peachturnspro.wordpress.com/2016/04/19/stuck-in-a-local-optima/