  • Cluster Analysis
    MLAI/Regression 2020. 1. 20. 14:12

    1. Overview

    Technically speaking, cluster analysis is a multivariate statistical technique. Intuitively speaking, the observations in a data set can be divided into different groups, and sometimes this is very useful.

    Both results are perfectly logical, but in different ways. In the first case we were differentiating the clusters by geographic proximity, while in the second we were differentiating them by language. Geographic proximity and language are two different features by which we can cluster the observations.

    The goal of clustering is to maximize the similarity of observations within a cluster and maximize the dissimilarity between clusters. That, of course, is done with respect to some feature or features.
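    One common way to put a number on this goal is the within-cluster sum of squares (WCSS): the lower it is, the more similar the members of each cluster are. The sketch below is only an illustration with made-up points; the helper names (centroid, sq_dist, wcss) are assumptions and not part of the original post. It compares a sensible grouping with a deliberately mixed one.

```python
# Hypothetical 2-D observations, invented for illustration only.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]

def centroid(cluster):
    """Mean point of a cluster of (x, y) tuples."""
    n = len(cluster)
    return (sum(p[0] for p in cluster) / n, sum(p[1] for p in cluster) / n)

def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def wcss(clusters):
    """Within-cluster sum of squares: lower means more similar members."""
    total = 0.0
    for cluster in clusters:
        c = centroid(cluster)
        total += sum(sq_dist(p, c) for p in cluster)
    return total

good = [points[:3], points[3:]]      # nearby points grouped together
bad = [points[0::2], points[1::2]]   # distant points mixed together

wcss_good = wcss(good)
wcss_bad = wcss(bad)

# Separation between clusters, measured between the two centroids.
sep_good = sq_dist(centroid(good[0]), centroid(good[1]))
sep_bad = sq_dist(centroid(bad[0]), centroid(bad[1]))

print("WCSS:", wcss_good, "vs", wcss_bad)
print("centroid separation:", sep_good, "vs", sep_bad)
```

    The sensible grouping has both a lower WCSS (more similarity within clusters) and a larger distance between centroids (more dissimilarity between clusters).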

    2. Application

    2.1 Market segmentation

    The firm gives you all the data they've gathered and gives you the green light to create their next marketing campaign. You have no idea who buys the product, so you decide to create a scatterplot of all customers by the amount of money they spend and their age.

    Moreover, most of the data points are middle-aged people who spend a lot. That's most probably who we should aim the marketing at.
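    As a rough sketch of how such a segmentation could be done programmatically, the block below runs a small k-means (Lloyd's algorithm) on hypothetical (age, spend) pairs and reports the largest segment. The customer values, the kmeans helper, and its farthest-point initialization are all assumptions made for illustration; the post does not say which algorithm or data were used.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=10):
    """Minimal k-means sketch with deterministic farthest-point initialization."""
    # Assumption: start from the first point, then repeatedly add the point
    # farthest from the centers chosen so far (libraries often use k-means++).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels

# Hypothetical customers as (age, amount spent); values are invented.
# Note: the two features are on different scales, so spend dominates the distance here.
customers = [
    (22, 30), (25, 40), (27, 35),                           # young, low spend
    (42, 210), (45, 230), (48, 220), (50, 240), (44, 225),  # middle-aged, high spend
    (66, 50), (70, 45),                                     # older, low spend
]

centers, labels = kmeans(customers, k=3)
sizes = [labels.count(j) for j in range(3)]
largest = max(range(3), key=lambda j: sizes[j])

print("largest segment center (age, spend):", centers[largest])
print("segment sizes:", sizes)
```

    With this made-up data the largest segment is the middle-aged, high-spending group, which matches the reading of the scatterplot described above.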

    2.2 Image segmentation

    Cluster analysis is a very useful technique for exploring and identifying patterns in the data. Data scientists often turn to it when they have no idea where to start or what to expect.

    This photo is very cool for image segmentation precisely because there are some elements in different colors that can be segmented. Each color in the photo is a different cluster.

    In the first photo, we have three clusters: the white one, the kind of beige one, and the dark one. That's why it is so vague.

    In the second photo, we have 10 clusters, so 10 colors. There is already enough detail to see that it is actually a dog lying on the ground. Moreover, the color of the bandana formed a big enough cluster to preserve its blue color as a separate cluster.

    In the third photo, we have 30 clusters. Although it seems like a small improvement, there are three times as many colors as in the second one, and we can already see details like an ear and the different colors of the fur.

    In the RGB color model, there are 16,777,216 possible colors (256 levels for each of the three channels). So to reproduce a whole image exactly we would need up to that many clusters, and by the way, that does not mean that the colors will be perfect.

    Now we have just turned a photo with about 17 million possible colors into one with 3, 10, or 30 colors. Such simplicity implies a smaller size, and in fact we have compressed the photo. That's one of the uses of clustering.
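    The idea can be sketched on a tiny scale: cluster the pixel colors and replace each pixel with the center of its cluster, so the image uses only as many colors as there are clusters. The pixel values and the kmeans helper below are hypothetical and kept deliberately small; a real photo would be loaded with an image library, which this sketch does not attempt.

```python
# 24-bit RGB: 256 levels per channel, three channels.
total_colors = 256 ** 3  # 16,777,216

def sq_dist(a, b):
    """Squared Euclidean distance between two RGB tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=10):
    """Minimal k-means sketch with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each pixel to its nearest center, then recompute the centers.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels

# Hypothetical "image": a flat list of RGB pixels, invented for illustration.
pixels = [
    (250, 250, 250), (245, 248, 252), (255, 255, 255),  # near-white
    (200, 180, 150), (205, 185, 155), (195, 175, 145),  # beige
    (20, 20, 25), (30, 25, 20), (25, 30, 30),           # dark
]

centers, labels = kmeans(pixels, k=3)

# The palette is the set of cluster centers, rounded to valid integer channels.
palette = [tuple(round(v) for v in c) for c in centers]

# Each pixel is replaced by the palette color of its cluster.
quantized = [palette[lab] for lab in labels]

print(len(set(pixels)), "distinct colors ->", len(set(quantized)))
```

    Storing a short palette plus one small index per pixel takes less space than storing a full RGB triple per pixel, which is where the compression comes from.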

    3. Classification and Clustering

    3.1 Classification

    Predicting an output category, given input data

    3.2 Clustering

    Grouping data points together based on similarities among them and differences from others
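    The contrast can be illustrated with a minimal sketch on made-up one-dimensional data. The classifier uses known labels to predict a category for a new input, while the clustering step sees no labels and only groups similar values. The nearest-class-mean rule and the two_means_1d helper are assumptions chosen for brevity, not the only way to do either task.

```python
# Classification: labels are given, and we predict the category of a new input.
labeled = [(1.0, "small"), (1.2, "small"), (0.8, "small"),
           (5.0, "large"), (5.3, "large"), (4.9, "large")]

def class_means(data):
    """Average input value per known category."""
    groups = {}
    for x, y in data:
        groups.setdefault(y, []).append(x)
    return {y: sum(xs) / len(xs) for y, xs in groups.items()}

def classify(x, means):
    """Nearest-class-mean classifier (one simple choice among many)."""
    return min(means, key=lambda y: abs(x - means[y]))

means = class_means(labeled)
predicted = classify(4.5, means)  # returns a category name

# Clustering: the same values without labels; we only group similar ones.
unlabeled = [x for x, _ in labeled]

def two_means_1d(values, iterations=10):
    """Two-cluster k-means in 1-D; assumes at least two distinct values."""
    lo, hi = min(values), max(values)  # start the centers at the extremes
    for _ in range(iterations):
        a = [v for v in values if abs(v - lo) <= abs(v - hi)]
        b = [v for v in values if abs(v - lo) > abs(v - hi)]
        lo, hi = sum(a) / len(a), sum(b) / len(b)
    return a, b

groups = two_means_1d(unlabeled)

print("predicted category:", predicted)
print("unnamed groups:", groups)
```

    Note that the clustering output has no category names: it only says which values belong together, and interpreting the groups is left to the analyst.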
