User-User Collaborative Filtering

MLAI/RecommendSystem 2022. 7. 7. 17:41

              Batman   X-Men   Star Wars   The Notebook   Bridget Jones' Diary
    Alice       5       4.5        5            2                   1
    Bob         4.5     4          ?            2                   2
    Carol       2       3          1            5                   5

    (? marks the missing rating we want to predict.)

    Intuitively, we see that Bob’s ratings are similar to Alice’s, thus he is likely to also like Star Wars. In math-speak, Alice’s and Bob’s ratings are highly correlated.

    Average Rating

    Limitations

    • It treats everyone's rating of the movie equally.
    • Bob's s(i, j) depends equally on Alice's rating and Carol's rating, even though he doesn't agree with Carol.

    s(i,j) = \frac{\sum_{i' \in \Omega_j} r_{i'j}}{|\Omega_j|}

    where Ω_j is the set of users who have rated movie j.
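A minimal sketch of this average-rating baseline (the dict layout and names are illustrative assumptions, not from the original post):

```python
import numpy as np

# Known ratings as a dict mapping (user, movie) -> rating
ratings = {
    ("Alice", "Star Wars"): 5.0,
    ("Carol", "Star Wars"): 1.0,
    # ...the rest of the known ratings
}

def average_rating_score(movie):
    """s(i, j): the mean of all known ratings of movie j (same for every user i)."""
    known = [r for (user, m), r in ratings.items() if m == movie]
    return float(np.mean(known))

# Bob's prediction for Star Wars is (5 + 1) / 2 = 3.0, even though his
# tastes match Alice's -- Carol's rating counts just as much.
print(average_rating_score("Star Wars"))
```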

    One of the Solutions

    Intuitively, I want the weight to be small for users I don't agree with and large for users I do agree with.

    s(i,j) = \frac{\sum_{i' \in \Omega_j} w_{ii'} \, r_{i'j}}{\sum_{i' \in \Omega_j} w_{ii'}}

    Another issue with average rating: Bias

    • Your interpretation of a rating is different from mine
    • Users can be biased to be optimistic or pessimistic
    • E.g. optimistic: most movies are a 5, a bad movie is a 3
    • E.g. pessimistic: most movies are 1 or 2, a good movie is a 4

    Deviation (without weighting)

    We don't care about your absolute rating; we care how much it deviates from your own average.

    • If your average is 2.5, but you rate something a 5, it must be really great
    • If you rate everything a 5, it’s difficult to know how those items compare

    dev(i,j) = r(i,j) - \bar{r}_i, \quad \text{for a known rating}

    The deviation is your rating minus your average rating; in other words, rating minus bias.

    My predicted deviation is the average deviation for a movie

    Then my predicted rating is my own average + predicted deviation

    \widehat{dev}(i,j) = \frac{1}{|\Omega_j|} \sum_{i' \in \Omega_j} \left( r(i',j) - \bar{r}_{i'} \right), \quad \text{for a prediction from known ratings}

    s(i,j) = \bar{r}_i + \frac{1}{|\Omega_j|} \sum_{i' \in \Omega_j} \left( r(i',j) - \bar{r}_{i'} \right) = \bar{r}_i + \widehat{dev}(i,j)
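A small worked sketch of this prediction on the table above, treating Bob's missing Star Wars rating as the target (the data layout is an assumption):

```python
import numpy as np

user_ratings = {
    "Alice": {"Batman": 5, "X-Men": 4.5, "Star Wars": 5, "The Notebook": 2, "Bridget Jones' Diary": 1},
    "Bob":   {"Batman": 4.5, "X-Men": 4, "The Notebook": 2, "Bridget Jones' Diary": 2},
    "Carol": {"Batman": 2, "X-Men": 3, "Star Wars": 1, "The Notebook": 5, "Bridget Jones' Diary": 5},
}
# Each user's own average rating (their "bias")
means = {u: np.mean(list(r.values())) for u, r in user_ratings.items()}

def predict_unweighted(i, j):
    """s(i, j) = own average + mean deviation of movie j over users who rated it."""
    devs = [r[j] - means[u] for u, r in user_ratings.items() if u != i and j in r]
    return means[i] + np.mean(devs)

# Alice's deviation: 5 - 3.5 = +1.5; Carol's: 1 - 3.2 = -2.2.
# Bob gets 3.125 + (1.5 - 2.2) / 2 = 2.775 -- Carol drags the estimate
# down, which is exactly the problem weighting will fix.
print(predict_unweighted("Bob", "Star Wars"))
```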

    Combine (Weighting Rating + Deviation)

    • Combine the idea of deviations with the idea of weighting to get our final formula
    • Note: absolute value since weights can be negative
    • Looking at this equation, we can see that it's just linear regression.

    Intuitively, I want w_{ii'} to be small for users I don't agree with and large for users I do agree with.

    s(i,j) = \bar{r}_i + \frac{\sum_{i' \in \Omega_j} w_{ii'} \left( r(i',j) - \bar{r}_{i'} \right)}{\sum_{i' \in \Omega_j} |w_{ii'}|}
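A sketch of this scoring rule, assuming a dict w of user-user weights keyed by (i, i') (computed in the next section) plus the user_ratings and means from the earlier sketch:

```python
def predict(i, j, user_ratings, means, w):
    """s(i, j) with weighted deviations; falls back to the user's own mean."""
    num = den = 0.0
    for i2, r in user_ratings.items():
        if i2 == i or j not in r or (i, i2) not in w:
            continue
        num += w[(i, i2)] * (r[j] - means[i2])
        den += abs(w[(i, i2)])  # absolute value, since weights can be negative
    return means[i] if den == 0 else means[i] + num / den
```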

    How to calculate the weights? Pearson Correlation Coefficient

    \rho_{xy} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2}}

    Problem: Our matrix is sparse

    Solution: Just use the data we have

    w_{ii'} = \frac{\sum_{j \in \Psi_{ii'}} (r_{ij} - \bar{r}_i)(r_{i'j} - \bar{r}_{i'})}{\sqrt{\sum_{j \in \Psi_{ii'}} (r_{ij} - \bar{r}_i)^2} \sqrt{\sum_{j \in \Psi_{ii'}} (r_{i'j} - \bar{r}_{i'})^2}}

    \Psi_i = \text{set of movies that user } i \text{ has rated}

    \Psi_{ii'} = \text{set of movies both user } i \text{ and } i' \text{ have rated}

    \Psi_{ii'} = \Psi_i \cap \Psi_{i'}

    • What if 2 users have zero movies in common, or just a few?
    • If zero, don't consider i' in user i's calculation, since the weight can't be calculated
    • If few (e.g. < 5), don't use the weight, since there is not enough data for it to be accurate (see the sketch below)
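A minimal sketch of the sparse Pearson weight, including the "too few common movies" cutoff from the bullets above (min_common = 5 is an illustrative choice):

```python
import numpy as np

def pearson_weight(i, i2, user_ratings, means, min_common=5):
    """w_{ii'} computed only over Psi_{ii'}, the movies both users rated."""
    common = set(user_ratings[i]) & set(user_ratings[i2])  # Psi_{ii'}
    if len(common) < min_common:
        return None  # too little overlap for an accurate weight
    x = np.array([user_ratings[i][j] - means[i] for j in common])
    y = np.array([user_ratings[i2][j] - means[i2] for j in common])
    denom = np.sqrt((x * x).sum()) * np.sqrt((y * y).sum())
    return None if denom == 0 else float((x * y).sum() / denom)
```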

    Cosine Similarity

    • Pearson correlation is “old-school statistics”
    • These days, when you want to compare two vectors, it seems more fashionable to use the cosine similarity

    \cos\theta = \frac{x^T y}{\lVert x \rVert \, \lVert y \rVert} = \frac{\sum_{i=1}^{N} x_i y_i}{\sqrt{\sum_{i=1}^{N} x_i^2} \sqrt{\sum_{i=1}^{N} y_i^2}}

    Cosine Similarity vs. Pearson Correlation

    • They are the same except Pearson is centered
    • We want to center them anyway because we’re working with deviations not absolute ratings
    • If we work with deviations (mean-centered ratings) instead of absolute ratings, the cosine similarity is equivalent to the Pearson correlation, as the check below shows
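A quick numerical check of that equivalence (the vectors are illustrative):

```python
import numpy as np

x = np.array([5.0, 4.5, 5.0, 2.0, 1.0])  # e.g. Alice's ratings
y = np.array([2.0, 3.0, 1.0, 5.0, 5.0])  # e.g. Carol's ratings

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(x - x.mean(), y - y.mean()))  # cosine of the centered vectors
print(np.corrcoef(x, y)[0, 1])             # Pearson correlation: same value
```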


    Neighborhood

    • In practice, don't sum over all users who rated movie j (it takes too long)
    • It can help to precompute the weights beforehand
    • Instead of summing over all users, take only the ones with the highest weight (called neighbors)
      • E.g. use the K nearest neighbors, with K = 25 up to 50

    Is it useful to keep a negative weight? Yes

    • A strong negative correlation is still a strong correlation; it is zero correlation that carries no information
    • Thus, you might want to keep neighbors sorted by absolute correlation, as in the sketch below
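A sketch of the neighbor selection, assuming a weight_fn like the pearson_weight above (K = 25 follows the suggestion earlier):

```python
def top_k_neighbors(i, all_users, weight_fn, K=25):
    """Keep the K users with the largest |w(i, i')|."""
    ws = []
    for i2 in all_users:
        if i2 == i:
            continue
        w = weight_fn(i, i2)  # e.g. pearson_weight; None if not computable
        if w is not None:
            ws.append((i2, w))
    # Sort on absolute correlation so strong negative weights are kept too
    ws.sort(key=lambda t: abs(t[1]), reverse=True)
    return ws[:K]
```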

    Big Data

    • Suppose we have 100k = 100,000 users

    Then we need N² = 10^10 weights

    • 10 billion
    • 32-bit float = 4 bytes
    • ~ 40 GB
      • Most modern computers have 4~32 GB RAM, 128 GB on the high end
      • We are reaching the limit of capacity
    • Not just size, but the time required to calculate

    O(N²) is slow
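The back-of-the-envelope arithmetic behind those numbers:

```python
N = 100_000
n_weights = N * N                 # 10**10 = 10 billion pairwise weights
gigabytes = n_weights * 4 / 1e9   # 32-bit float = 4 bytes each
print(gigabytes)                  # 40.0 GB
```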

    Possible Issues

    • With N users and M items, the rating matrix R is N×M
    • How long does it take to calculate the similarity between 2 users, w(i, i')? O(M)
    • For a single rating prediction, we have to find the weights against all other users
    • Even if we just keep the top K neighbors, we still need to calculate them all, since they must be sorted to find the top K
    • O(N) users, O(M) calculations each time -> O(NM)
    • If you're a company building a recommendation engine, you need to make recommendations for all N users
    • We need w(i, i') for i = 1...N and i' = 1...N -> O(N²)
    • Including the time to calculate each correlation -> O(N²M)

    Computation Complexity Scenario

    Work on a subset of data

    • Take top n users and top m movies (n < N, m < M)
    • Top user = user who has rated the most movies
    • Top movie = movie that has been rated the most times
    • This yields a denser matrix
    • Experiment to determine a workable value for n and m

    Recommendation for a single user

    • Calculate weights between this user and all others: O(MN)
    • Calculate scores s(i, j) for all items: also O(MN)
      • M items
      • Sum over N terms for each item
    • Sort the scores: O(M log M)
    • Total: O(MN) + O(M log M)
    • This is theoretical - practically, you can make things faster
      • Precompute the user-user weights offline, not in real time
      • For the score, only sum over users similar to me: with the top K neighbors, O(MK)
      • Sorting: if we keep only the top L items, O(M log L)

    Precomputing

    • Precompute recommendations using (possibly) big data technologies as offline jobs
    • You don't want to do an O(M log M) sort in real time - even O(M) would be bad
    • Use a “cronjob” to run your task every day at some time, e.g. 3 am
    • Your API can simply retrieve items already stored in a database
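A minimal sketch of that offline/online split, assuming a predict(i, j) wrapper around the earlier scoring code, a hypothetical candidate_items(i), and a plain dict standing in for the database:

```python
db = {}  # stand-in for a real key-value store or database table

def offline_job(all_users, candidate_items, predict, L=20):
    """Run nightly (e.g. via cron at 3 am): precompute each user's top-L items."""
    for i in all_users:
        scores = [(j, predict(i, j)) for j in candidate_items(i)]
        scores.sort(key=lambda t: t[1], reverse=True)  # the O(M log M) sort, done offline
        db[i] = [j for j, _ in scores[:L]]

def get_recommendations(i):
    """What the real-time API does: a plain dictionary/DB lookup."""
    return db.get(i, [])
```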

    Desired Output

    • Since we are doing regression, we want MSE
    • Outline
      • Split data into train and test
      • Calculate weights using a train set
      • Make a predict function, e.g. score ← predict(i, j)
      • Output MSE for train and test sets
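A sketch of that outline, assuming ratings are (user, movie, rating) triples and predict(i, j) was fit using the train set only (split is a hypothetical helper):

```python
import numpy as np

def mse(triples, predict):
    """Mean squared error over (user, movie, rating) triples."""
    return float(np.mean([(predict(i, j) - r) ** 2 for i, j, r in triples]))

# train, test = split(ratings)   # e.g. a random 80/20 split
# ...calculate the user-user weights from `train` only, then:
# print("train MSE:", mse(train, predict))
# print("test MSE:",  mse(test, predict))
```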

    Other posts in the 'MLAI > RecommendSystem' category

    Matrix Factorization  2022.07.12
    Item-Item Collaborative Filtering  2022.07.12
    AWS Personalize  2022.07.07
    Association Analysis  2022.07.07

