
ABOUT ME

For organizing technical knowledge and experience. demyank88@gmail.com

  • Item-Item Collaborative Filtering
    MLAI/RecommendSystem 2022. 7. 12. 14:49

    Intuition

    • Two items are similar when the correlation between their column vectors in the ratings matrix is high
    • If you like Power Rangers, you'll probably also like Transformers, because users give the two similar ratings

                Power Rangers   Transformers   Ninja Turtles
    User 1      4.5             5              4
    User 2      5               5              4.5
    User 3      1               2              0.5
    User 4      2               2              0.5
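
    As a quick check of this intuition, the table above can be loaded into an array and the columns correlated. This is only a sketch using NumPy's `np.corrcoef`; the variable names are made up for illustration.

```python
import numpy as np

# Ratings from the table above: rows are users 1-4, columns are items.
items = ["Power Rangers", "Transformers", "Ninja Turtles"]
R = np.array([
    [4.5, 5.0, 4.0],
    [5.0, 5.0, 4.5],
    [1.0, 2.0, 0.5],
    [2.0, 2.0, 0.5],
])

# rowvar=False treats each column (item) as a variable,
# so the result is a 3 x 3 item-item correlation matrix.
item_corr = np.corrcoef(R, rowvar=False)

for a in range(len(items)):
    for b in range(a + 1, len(items)):
        print(f"{items[a]} vs {items[b]}: {item_corr[a, b]:.3f}")
```

    All three pairs come out strongly positively correlated, which matches the reading of the table: the same users rate all three shows high or low together.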

    User-User Collaborative Filtering

    • For user-user CF, I want to find "users like me"
    • The movies that those users have seen, that I haven't seen, become my recommendations
    • It's intuitive that if they are "like me", I would like movies they've rated highly
    • Looks row-wise
    • Each row is a vector
    • 2 users are similar if their row vectors have a small distance between them

    Item-Item Collaborative Filtering

    • What if we looked column-wise instead?
    • Let's find 2 products that are similar
    • They are similar if their column vectors' distance is small

    Item Correlation

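    $$w_{jj'} = \frac{\sum_{i \in \Omega_{jj'}} (r_{ij} - \bar{r}_j)(r_{ij'} - \bar{r}_{j'})}{\sqrt{\sum_{i \in \Omega_{jj'}} (r_{ij} - \bar{r}_j)^2}\,\sqrt{\sum_{i \in \Omega_{jj'}} (r_{ij'} - \bar{r}_{j'})^2}}$$

    $\Omega_j$ = users who rated item $j$

    $\Omega_{jj'}$ = users who rated both item $j$ and item $j'$

    $\bar{r}_j$ = average rating for item $j$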
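
    The weight between two items can be sketched as follows when some ratings are missing. The function name `item_weight`, the use of `np.nan` for "not rated", and the rule of returning 0 when fewer than two users rated both items are assumptions made for this example, not part of the original notes.

```python
import numpy as np

def item_weight(R, j, k):
    """Pearson-style weight between items j and k.

    R is a users x items array where np.nan marks a missing rating.
    """
    rated = ~np.isnan(R)
    common = rated[:, j] & rated[:, k]      # users who rated both items
    if common.sum() < 2:
        return 0.0                          # assumption: too little overlap
    mean_j = np.nanmean(R[:, j])            # average over users who rated j
    mean_k = np.nanmean(R[:, k])            # average over users who rated k
    dj = R[common, j] - mean_j
    dk = R[common, k] - mean_k
    denom = np.sqrt((dj ** 2).sum()) * np.sqrt((dk ** 2).sum())
    return float((dj * dk).sum() / denom) if denom > 0 else 0.0

# Toy data with two missing ratings.
R_demo = np.array([
    [4.5, 5.0,    4.0],
    [5.0, np.nan, 4.5],
    [1.0, 2.0,    0.5],
    [2.0, 2.0,    np.nan],
])
w01 = item_weight(R_demo, 0, 1)
print(f"weight(item 0, item 1) = {w01:.3f}")
```

    Note that the item averages are taken over everyone who rated each item, while the sums run only over users who rated both.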

    Item Score

    $$s(i,j) = \bar{r}_j + \frac{\sum_{j' \in \Psi_i} w_{jj'}\,(r_{ij'} - \bar{r}_{j'})}{\sum_{j' \in \Psi_i} |w_{jj'}|}$$

    $\Psi_i$ = items user $i$ has rated

    • Deviation $r_{ij'} - \bar{r}_{j'}$: how much user $i$ likes item $j'$ compared with how much everyone else likes $j'$ (IMO, not as intuitive as user-user CF)
    • If user $i$ really likes $j'$ (more than other users do) and $j$ is similar to $j'$ ($w_{jj'}$ is high), then user $i$ probably likes $j$ too
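
    A minimal sketch of the item score, assuming a small dense array with `np.nan` for missing ratings. The helper names `item_weights` and `score`, the fallback to the item average when no weights are available, and the exclusion of the target item itself from the sum are choices made for this example.

```python
import numpy as np

def item_weights(R):
    """Return (W, mu): all item-item weights and per-item average ratings."""
    mu = np.nanmean(R, axis=0)
    D = R - mu                               # deviations, nan where missing
    M = R.shape[1]
    W = np.zeros((M, M))
    for j in range(M):
        for k in range(M):
            common = ~np.isnan(D[:, j]) & ~np.isnan(D[:, k])
            if common.sum() < 2:
                continue                     # assumption: weight 0 if little overlap
            num = (D[common, j] * D[common, k]).sum()
            den = np.sqrt((D[common, j] ** 2).sum() * (D[common, k] ** 2).sum())
            if den > 0:
                W[j, k] = num / den
    return W, mu

def score(R, W, mu, i, j):
    """Predicted rating of user i for item j."""
    rated = np.flatnonzero(~np.isnan(R[i]))  # items user i has rated
    rated = rated[rated != j]                # do not use the target item itself
    w = W[j, rated]
    denom = np.abs(w).sum()
    if denom == 0:
        return float(mu[j])                  # assumption: fall back to item average
    return float(mu[j] + (w * (R[i, rated] - mu[rated])).sum() / denom)

# User 0 has not rated item 1 yet.
R = np.array([
    [4.5, np.nan, 4.0],
    [5.0, 5.0,    4.5],
    [1.0, 2.0,    0.5],
    [2.0, 2.0,    0.5],
])
W, mu = item_weights(R)
pred = score(R, W, mu, i=0, j=1)
print(f"item average {mu[1]:.2f}, predicted for user 0: {pred:.2f}")
```

    User 0 rates the two other items above their averages, and both are positively correlated with item 1, so the prediction lands above item 1's average rating.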

    Comparison

    • User-User CF: choose items for a user, because those items have been liked by similar users
    • Item-Item CF: choose items for a user, because this user has liked similar items in the past
    • By transposing the ratings matrix (swapping rows and columns), we can convert the user-user CF algorithm into an item-item CF algorithm
    • User-based and item-based CF have the same mathematical form; only the roles of users and items are swapped
    • In practice, item-based CF tends to be more accurate because each weight is computed from more data
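
    The "flip the matrix" idea can be illustrated with a single similarity routine applied to the matrix and to its transpose. This is a sketch on a dense toy matrix; `row_similarity` is a hypothetical helper that simply wraps `np.corrcoef`.

```python
import numpy as np

def row_similarity(X):
    """Pearson correlation between every pair of rows of a dense matrix."""
    return np.corrcoef(X)

# Rows are users, columns are items.
R = np.array([
    [4.5, 5.0, 4.0],
    [5.0, 5.0, 4.5],
    [1.0, 2.0, 0.5],
    [2.0, 2.0, 0.5],
])

user_sim = row_similarity(R)     # user-user: compare rows of R
item_sim = row_similarity(R.T)   # item-item: same code, transposed input

print(user_sim.shape, item_sim.shape)
```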

    Practical differences

    • When comparing 2 items, you have a lot more data than when comparing 2 users
      • Each user: up to ~ 20k items to look at
      • Each item: up to 100k users to look at
      • Thus for item-based CF, weights are calculated based on more data
    • Item-based CF is faster (with $N$ users and $M$ items)
      • Computing the item-item weights used to score items: $O(M^{2}N)$
        • There are $M^2$ item-item weights, and each item vector has length $N$
      • For user-based CF we saw $O(N^{2}M)$
      • $N \gg M$, so the $N^2$ term in user-based CF is far more expensive than the $M^2$ term in item-based CF
    • Item-based CF tends to be more accurate
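
    To put rough numbers on the comparison, the two operation counts can be evaluated with the upper figures quoted above (100k users, 20k items). This is back-of-the-envelope arithmetic only: constants are ignored and real datasets are sparse.

```python
N = 100_000   # users (upper figure quoted above)
M = 20_000    # items (upper figure quoted above)

item_based = M ** 2 * N   # O(M^2 N)
user_based = N ** 2 * M   # O(N^2 M)
ratio = user_based / item_based   # simplifies to N / M

print(f"item-based: {item_based:.1e} operations")
print(f"user-based: {user_based:.1e} operations")
print(f"user-based / item-based = {ratio}")
```

    The ratio reduces to $N / M$, so the gap grows as the user base outpaces the catalogue.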

    Limitation

    • Item-based CF may be too accurate
    • It's always suggesting similar products
    • This leads to a lack of diversity in recommendations - the YouTube problem
    • A somewhat worse MSE may be preferable if it buys more diverse recommendations

    The Cold-Start Problem

    • We know that if we don't have enough data, we can't calculate correlations
    • What if we don't have any data at all?
      • Add a prior to the average
      • The score can be a weighted sum of prediction + prior average
      • No data at all -> rely solely on prior
      • How to get a prior? For example, scrape ratings from the web or use some other external source
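
    One possible way to combine a prediction with a prior average is sketched below. The notes only say "weighted sum", so the specific weighting `n / (n + prior_strength)`, the function name `blended_score`, and the default `prior_strength` are assumptions for illustration.

```python
def blended_score(prediction, n_ratings, prior_mean, prior_strength=5.0):
    """Blend a CF prediction with a prior average.

    n_ratings      : how many ratings the prediction is based on
    prior_strength : hypothetical knob; larger values trust the prior longer
    """
    if prediction is None or n_ratings == 0:
        return prior_mean                       # no data: rely solely on the prior
    w = n_ratings / (n_ratings + prior_strength)
    return w * prediction + (1.0 - w) * prior_mean

print(blended_score(None, 0, prior_mean=3.5))      # cold start
print(blended_score(5.0, 5, prior_mean=3.0))       # equal trust in both
print(blended_score(5.0, 500, prior_mean=3.0))     # mostly the prediction
```

    With this scheme the prior dominates when there are few ratings and fades out as data accumulates.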

    Not necessarily movies/ratings

    • The user-item matrix doesn't have to contain ratings at all
    • Explicit feedback (ratings) is sparse, so implicit signals can fill the matrix instead:
      • Number of times the user viewed a product
      • Did they purchase?
      • Did they hit like?
      • Did they share on social media?
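
    A user-item matrix built from such signals might look like the sketch below. The event log, the event names, and the weights in `EVENT_WEIGHT` are all hypothetical values chosen for illustration; in practice they would need tuning.

```python
import numpy as np

# Hypothetical event log: (user index, item index, event type).
events = [
    (0, 0, "view"),
    (0, 0, "view"),
    (0, 1, "purchase"),
    (1, 1, "like"),
    (2, 0, "share"),
]

# Arbitrary example weights: stronger signals count for more.
EVENT_WEIGHT = {"view": 1.0, "like": 2.0, "share": 3.0, "purchase": 5.0}

def build_matrix(events, n_users, n_items):
    """Accumulate weighted events into a users x items matrix."""
    X = np.zeros((n_users, n_items))
    for user, item, kind in events:
        X[user, item] += EVENT_WEIGHT[kind]
    return X

X = build_matrix(events, n_users=3, n_items=2)
print(X)
```

    The resulting matrix can be fed to the same item-item similarity computation as a ratings matrix.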

    Reference

    https://www.cs.umd.edu/~samir/498/Amazon-Recommendations.pdf

    https://takuti.github.io/Recommendation.jl/latest/collaborative_filtering/

    https://docs.aws.amazon.com/personalize/latest/dg/native-recipe-sims.html
