  • User-User Collaborative Filtering
    MLAI/RecommendSystem 2022. 7. 7. 17:41
    |       | Batman | X-Men | Star Wars | The Notebook | Bridget Jones' Diary |
    |-------|--------|-------|-----------|--------------|----------------------|
    | Alice | 5      | 4.5   | 5         | 2            | 1                    |
    | Bob   | 4.5    | 4     | ?         | 2            | 2                    |
    | Carol | 2      | 3     | 1         | 5            | 5                    |

    Intuitively, we see that Bob’s ratings are similar to Alice’s, thus he is likely to also like Star Wars. In math-speak, Alice’s and Bob’s ratings are highly correlated.
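    A quick numerical check of that claim (a minimal numpy sketch, using only the four movies that both Alice and Carol have in common with Bob):

    ```python
    import numpy as np

    # Ratings on the four movies everyone rated alongside Bob:
    # Batman, X-Men, The Notebook, Bridget Jones' Diary.
    alice = np.array([5.0, 4.5, 2.0, 1.0])
    bob   = np.array([4.5, 4.0, 2.0, 2.0])
    carol = np.array([2.0, 3.0, 5.0, 5.0])

    print(np.corrcoef(alice, bob)[0, 1])  # ~0.98: Bob agrees with Alice
    print(np.corrcoef(carol, bob)[0, 1])  # ~-0.99: Bob disagrees with Carol
    ```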

    Average Rating

    The simplest score: predict user i's rating of movie j as the average of all known ratings of j, where $\Omega_{j}$ is the set of users who have rated movie j:

    $$s(i,j)=\frac{\sum_{i'\in\Omega_{j}}r_{i'j}}{|\Omega_{j}|}$$

    Limitations

    • It treats everyone’s rating of the movie equally.
    • Bob’s s(i, j) depends equally on Alice’s rating and Carol’s rating, even though he doesn’t agree with Carol.
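    A minimal sketch of this plain-average score on the toy table above (the nested-dict layout is just for illustration); note how it exhibits exactly the limitation described:

    ```python
    # Toy ratings from the table above, stored as {user: {movie: rating}}.
    ratings = {
        "Alice": {"Batman": 5.0, "X-Men": 4.5, "Star Wars": 5.0,
                  "The Notebook": 2.0, "Bridget Jones' Diary": 1.0},
        "Bob":   {"Batman": 4.5, "X-Men": 4.0,
                  "The Notebook": 2.0, "Bridget Jones' Diary": 2.0},
        "Carol": {"Batman": 2.0, "X-Men": 3.0, "Star Wars": 1.0,
                  "The Notebook": 5.0, "Bridget Jones' Diary": 5.0},
    }

    def average_score(movie):
        """s(i, j): mean of all known ratings of movie j (same for every user i)."""
        known = [r[movie] for r in ratings.values() if movie in r]
        return sum(known) / len(known)

    print(average_score("Star Wars"))  # (5 + 1) / 2 = 3.0, ignoring that Bob agrees with Alice
    ```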

    One Solution: Weighting

    Intuitively, I want the weight $w_{ii'}$ to be small for users I don't agree with, large for users I do agree with.

    $$s(i,j)=\frac{\sum_{i'\in \Omega_{j}} w_{ii'}r_{i'j}}{\sum_{i'\in \Omega_{j}}w_{ii'}}$$
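    A sketch of the weighted score, building on the dictionary above. The weights here are invented for illustration, and in this naive form they should be nonnegative; negative weights are only handled later, by the absolute value in the final formula.

    ```python
    # Hypothetical agreement weights for Bob (illustrative values only).
    weights = {"Bob": {"Alice": 0.95, "Carol": 0.05}}

    def weighted_score(user, movie):
        num = den = 0.0
        for other, r in ratings.items():
            if other != user and movie in r:
                w = weights[user].get(other, 0.0)
                num += w * r[movie]
                den += w
        return num / den

    print(weighted_score("Bob", "Star Wars"))  # 0.95*5 + 0.05*1 = 4.8, much closer to Alice
    ```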

    Another issue with average rating: Bias

    • Your interpretation of a rating is different from mine
    • Users can be biased to be optimistic or pessimistic
    • E.g. optimistic: most movies are a 5, a bad movie is a 3
    • E.g. pessimistic: most movies are 1 or 2, a good movie is a 4

    Deviation (without weighting)

    We don’t care about your absolute rating; we care how much it deviates from your own average.

    • If your average is 2.5, but you rate something a 5, it must be really great
    • If you rate everything a 5, it’s difficult to know how those items compare

    $$dev(i,j)=r(i,j)-\bar{r}_{i}, \ \text{for a known rating}$$

    The deviation is your rating minus your average rating; in other words, rating minus bias.

    My predicted deviation is the average deviation for a movie

    Then my predicted rating is my own average + predicted deviation

    $$\hat{dev}(i,j)=\frac{\sum_{i'\in\Omega_{j}}\left(r(i',j)-\bar{r}_{i'}\right)}{|\Omega_{j}|}, \ \text{a prediction from known ratings}$$

    $$s(i,j)=\bar{r}_{i}+\frac{\sum_{i'\in\Omega_{j}}\left(r(i',j)-\bar{r}_{i'}\right)}{|\Omega_{j}|}=\bar{r}_{i}+\hat{dev}(i,j)$$
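    A sketch of deviation-based prediction without weighting, continuing the toy example (user_mean is a small helper of mine, not from the original post):

    ```python
    def user_mean(user):
        # The user's own average rating, r-bar_i.
        r = ratings[user]
        return sum(r.values()) / len(r)

    def dev_score(user, movie):
        # s(i, j) = own average + average deviation of movie j among its raters.
        devs = [r[movie] - user_mean(other)
                for other, r in ratings.items()
                if other != user and movie in r]
        return user_mean(user) + sum(devs) / len(devs)

    print(dev_score("Bob", "Star Wars"))  # ~2.78
    ```

    The prediction is still dragged down by Carol's low rating, which is exactly what adding the weights is meant to fix.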

    Combine (Weighted Ratings + Deviations)

    • Combine the idea of deviations with the idea of weighting to get our final formula
    • Note: absolute value since weights can be negative
    • Looking at this equation, we can see that it's just linear regression.

    Intuitively, I want the weight $w_{ii'}$ to be small for users I don’t agree with, large for users I do agree with.

    $$s(i,j)=\bar{r}_{i}+\frac{\sum_{i'\in\Omega_{j}}{w_{ii'}(r(i',j)-\bar{r}_{i'})}}{\sum_{i'\in\Omega_{j}}|w_{ii'}|}$$
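    Putting both ideas together, a sketch of the final score (reusing ratings and user_mean from the snippets above; the weight values are again illustrative):

    ```python
    def predict(user, movie, weights):
        num = den = 0.0
        for other, r in ratings.items():
            if other != user and movie in r:
                w = weights[user][other]
                num += w * (r[movie] - user_mean(other))
                den += abs(w)  # absolute value, since weights can be negative
        return user_mean(user) + num / den

    # With Pearson-like weights (Alice ~ +1, Carol ~ -1), Bob's prediction for
    # Star Wars follows Alice's deviation and inverts Carol's: ~4.98.
    print(predict("Bob", "Star Wars", {"Bob": {"Alice": 0.98, "Carol": -0.99}}))
    ```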

    How to calculate the weights? Pearson Correlation Coefficient

    $$\rho_{xy}=\frac{\sum^N_{i=1}(x_{i}-\bar{x})(y_{i}-\bar{y})}{\sqrt{\sum^{N}_{i=1}(x_{i}-\bar{x})^2}\sqrt{\sum^{N}_{i=1}(y_{i}-\bar{y})^2}}$$

    Problem: Our matrix is sparse

    Solution: Just use the data we have

    $$w_{ii'}=\frac{\sum_{j\in\psi_{ii'}}(r_{ij}-\bar{r}_{i})(r_{i'j}-\bar{r}_{i'})}{\sqrt{\sum_{j\in\psi_{ii'}}(r_{ij}-\bar{r}_{i})^2}\sqrt{\sum_{j\in\psi_{ii'}}(r_{i'j}-\bar{r}_{i'})^2}}$$

    $\psi_{i}=\text{set of movies that user } i \text{ has rated}$

    $\psi_{ii'}=\text{set of movies that both user } i \text{ and user } i' \text{ have rated}$

    $\psi_{ii'}=\psi_{i}\cap\psi_{i'}$

    • What if 2 users have zero movies in common, or just a few?
    • If zero, don’t consider i’ in user i’s calculation - it can’t be calculated
    • If few (e.g. fewer than 5), don’t use the weight, since there is not enough data to be accurate (see the sketch below)
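    A sketch of this weight computation over the common movies, reusing ratings and user_mean from earlier; the MIN_COMMON cutoff and the zero-variance guard are my additions:

    ```python
    import math

    MIN_COMMON = 5  # fewer common movies than this: too noisy to trust

    def pearson_weight(u, v):
        common = set(ratings[u]) & set(ratings[v])  # psi_{uv}
        if len(common) < MIN_COMMON:
            return None  # can't (or shouldn't) be calculated
        mu, mv = user_mean(u), user_mean(v)
        num = sum((ratings[u][j] - mu) * (ratings[v][j] - mv) for j in common)
        du = math.sqrt(sum((ratings[u][j] - mu) ** 2 for j in common))
        dv = math.sqrt(sum((ratings[v][j] - mv) ** 2 for j in common))
        if du == 0 or dv == 0:
            return None  # a user who rates everything identically gives no signal
        return num / (du * dv)
    ```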

    Cosine Similarity

    • Pearson correlation is “old-school statistics”
    • These days, when you want to compare two vectors, it seems more fashionable to use the cosine similarity

    $$\cos\theta=\frac{x^Ty}{\lVert x\rVert \, \lVert y\rVert}=\frac{\sum^N_{i=1}x_iy_i}{\sqrt{\sum^N_{i=1}x^2_i}\sqrt{\sum^N_{i=1}y^2_i}}$$

    Cosine Similarity vs. Pearson Correlation

    • They are the same, except Pearson is centered
    • We want to center them anyway, because we’re working with deviations, not absolute ratings
    • If we work with deviations rather than raw ratings, cosine similarity is equivalent to the Pearson correlation, as the check below demonstrates
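    A quick numpy check of that equivalence:

    ```python
    import numpy as np

    def cosine_similarity(x, y):
        return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

    x = np.array([5.0, 4.5, 2.0, 1.0])
    y = np.array([4.5, 4.0, 2.0, 2.0])

    # Cosine similarity of the *centered* vectors equals the Pearson correlation.
    print(cosine_similarity(x - x.mean(), y - y.mean()))  # ~0.976
    print(np.corrcoef(x, y)[0, 1])                        # ~0.976, same value
    ```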


    Neighborhood

    • In practice, don’t sum over all users who rated movie j (takes too long)
    • It can help to precompute weights beforehand
    • Non-trivial: instead of summing over all users, take the ones with the highest weight (called neighbors)
      • E.g. use the K nearest neighbors, with K somewhere from 25 up to 50

    Is it useful to keep a negative weight? Yes

    • A strong correlation can be very negative; it is zero correlation that carries no information
    • Thus, you might want to keep neighbors sorted by absolute correlation (as in the sketch below)
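    A sketch of neighbor selection under that rule (weights laid out as in the earlier snippets, with None for pairs that couldn't be computed):

    ```python
    K = 25  # typical values run from 25 up to 50

    def top_k_neighbors(user, weights):
        # Rank the other users by |correlation| so that strongly negative
        # neighbors are kept too, then keep only the K strongest.
        candidates = [(other, w) for other, w in weights[user].items()
                      if w is not None]
        candidates.sort(key=lambda pair: abs(pair[1]), reverse=True)
        return candidates[:K]
    ```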

    Big Data

    • Suppose we have 100k = 100,000 users

    • Then we need $100{,}000^{2}=10^{10}$ weights (10 billion)
    • 32-bit float = 4 bytes
    • ~ 40 GB
      • Most modern computers have 4~32 GB RAM, 128 GB on the high end
      • We are reaching the limit of capacity
    • Not just size, but the time required to calculate: $O(N^{2})$ is slow

    Possible Issues

    • With N users and M items, the rating matrix R is NxM
    • How long does it take to calculate the similarity between 2 users, w(i, i')? O(M)
    • For a single rating prediction, we have to find the weights against all other users
    • Even if we just keep the top K neighbors, we still need to calculate them all, since they must be sorted to find the top K
    • O(N) users, O(M) calculations each time -> O(NM)
    • If you're a company building a recommendation engine, you need to make recommendations for all N users
    • We need w(i,i') for i = 1...N and i' = 1...N -> $O(N^{2})$
    • Including the time to calculate each correlation -> $O(N^{2}M)$

    Computation Complexity Scenario

    Work on a subset of data

    • Take top n users and top m movies (n < N, m < M)
    • Top user = user who has rated the most movies
    • Top movie = movie that has been rated the most times
    • This yields a denser matrix
    • Experiment to determine workable values for n and m (see the sketch below)
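    One way to carve out that subset (a sketch; the dict layout matches the earlier snippets):

    ```python
    from collections import Counter

    def top_subset(ratings, n, m):
        # Keep the n users with the most ratings and the m most-rated movies,
        # yielding a denser sub-matrix of the original NxM ratings.
        user_counts = Counter({u: len(r) for u, r in ratings.items()})
        movie_counts = Counter(j for r in ratings.values() for j in r)
        keep_users = {u for u, _ in user_counts.most_common(n)}
        keep_movies = {j for j, _ in movie_counts.most_common(m)}
        return {u: {j: x for j, x in r.items() if j in keep_movies}
                for u, r in ratings.items() if u in keep_users}
    ```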

    Recommendation for a single user

    • Calculate weights between this user and all others: O(MN)
    • Calculate scores s(i, j) for all items: also O(MN)
      • M items
      • Sum over N terms for each item
    • Sort the scores: O(MlogM)
    • Total: O(MN) + O(MlogM)
    • This is theoretical - practically, you can make things faster
      • Precomputing user-user weights, but not real-time
      • For the score, only sum using users similar to me - if top K, then O(MK)
      • Sorting - if keep only L items, then O(MlogL)

    Precomputing

    • Precompute recommendations using (possibly) big data technologies as offline jobs
    • You don’t want to do an O(MlogM) sort in real-time - even O(M) would be bad
    • Use a “cronjob” to run your task every day at some time, e.g. 3 am
    • Your API can simply retrieve items already stored in a database

    Desired Output

    • Since we are doing regression, we want MSE
    • Outline
      • Split data into train and test
      • Calculate weights using a train set
      • Make a predict function, e.g. score ← predict(i, j)
      • Output MSE for train and test sets (see the sketch below)
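    A sketch of that outline; the split operates on (user, movie, rating) triples, and the details (test fraction, seed) are my own choices:

    ```python
    import random

    def split_ratings(ratings, test_frac=0.2, seed=42):
        # Hold out a random fraction of (user, movie, rating) triples for testing.
        triples = [(u, j, x) for u, r in ratings.items() for j, x in r.items()]
        random.Random(seed).shuffle(triples)
        cut = int(len(triples) * (1 - test_frac))
        return triples[:cut], triples[cut:]

    def mse(triples, predict_fn):
        # Mean squared error of predictions over a set of held-out triples.
        return sum((predict_fn(u, j) - x) ** 2 for u, j, x in triples) / len(triples)

    # train, test = split_ratings(ratings)
    # ...compute weights from train only, then:
    # print(mse(train, lambda u, j: predict(u, j, weights)))
    # print(mse(test,  lambda u, j: predict(u, j, weights)))
    ```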

    Other posts in 'MLAI > RecommendSystem'

    Matrix Factorization  2022.07.12
    Item-Item Collaborative Filtering  2022.07.12
    AWS Personalize  2022.07.07
    Association Analysis  2022.07.07
