User-User Collaborative Filtering
|       | Batman | X-Men | Star Wars | The Notebook | Bridget Jones' Diary |
|-------|--------|-------|-----------|--------------|----------------------|
| Alice | 5      | 4.5   | 5         | 2            | 1                    |
| Bob   | 4.5    | 4     | ?         | 2            | 2                    |
| Carol | 2      | 3     | 1         | 5            | 5                    |

Intuitively, we see that Bob's ratings are similar to Alice's, so he is likely to also like Star Wars. In math-speak, Alice's and Bob's ratings are highly correlated.
Average Rating
The simplest approach: predict user i's rating of movie j, s(i, j), as the average of all known ratings of movie j, where $\Omega_{j}$ is the set of users who have rated movie j.
Limitations
- It treats everyone's rating of the movie equally.
- Bob's s(i, j) depends equally on Alice's rating and Carol's rating, even though he doesn't agree with Carol.
$$s(i,j)=\frac{\sum_{i'\in\Omega_{j}}r_{i'j}}{|\Omega_{j}|}$$
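A minimal sketch of this naive average-rating predictor (the toy data and names below are illustrative, not from the post):

```python
# Illustrative toy data: ratings[(user, movie)] = rating
ratings = {
    ("Alice", "Star Wars"): 5, ("Carol", "Star Wars"): 1,
    ("Alice", "Batman"): 5, ("Bob", "Batman"): 4.5, ("Carol", "Batman"): 2,
}

def predict_average(user, movie):
    """s(i, j): average of all known ratings of `movie` by other users."""
    others = [r for (u, m), r in ratings.items() if m == movie and u != user]
    return sum(others) / len(others)

print(predict_average("Bob", "Star Wars"))  # (5 + 1) / 2 = 3.0
```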
One Solution: Weighted Ratings
Intuitively, I want the weight $w_{ii'}$ to be small for users I don't agree with and large for users I do agree with:
$$s(i,j)=\frac{\sum_{i'\in \Omega_{j}} w_{ii'}r_{i'j}}{\sum_{i'\in \Omega_{j}}w_{ii'}}$$
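A sketch of this weighted average; the weights $w_{ii'}$ are taken as given here (how to compute them comes later), and all names are illustrative:

```python
def predict_weighted(user, movie, ratings, weights):
    """Weighted average of other users' ratings of `movie`,
    weighted by how much `user` agrees with each rater."""
    num, den = 0.0, 0.0
    for (u, m), r in ratings.items():
        if m == movie and u != user:
            w = weights.get((user, u), 0.0)  # assumed dict: weights[(i, i')] = w_ii'
            num += w * r
            den += w
    return num / den if den != 0 else None
```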
Another issue with average rating: Bias
- Your interpretation of a rating is different from mine
- Users can be biased to be optimistic or pessimistic
- E.g. optimistic: most movies are a 5, a bad movie is a 3
- E.g. pessimistic: most movies are 1 or 2, a good movie is a 4
Deviation (without weighting)
We don't care about your absolute rating; we care how much it deviates from your own average.
- If your average is 2.5, but you rate something a 5, it must be really great
- If you rate everything a 5, it’s difficult to know how those items compare
$$\text{dev}(i,j)=r(i,j)-\bar{r}_{i}, \quad \text{for a known rating}$$
The deviation is your rating minus your average rating; in other words, rating minus bias.
My predicted deviation is the average deviation for a movie
Then my predicted rating is my own average + predicted deviation
$$\hat{\text{dev}}(i,j)=\frac{\sum_{i'\in\Omega_{j}}\left(r(i',j)-\bar{r}_{i'}\right)}{|\Omega_{j}|}, \quad \text{a prediction from known ratings}$$
$$s(i,j)=\bar{r}_{i}+\frac{\sum_{i'\in\Omega_{j}}\left(r(i',j)-\bar{r}_{i'}\right)}{|\Omega_{j}|}=\bar{r}_{i}+\hat{\text{dev}}(i,j)$$
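A sketch of the unweighted deviation approach on illustrative toy data (per-user means are computed once up front; names are not from the post):

```python
import numpy as np

# Illustrative toy data: user -> {movie: rating}
user_ratings = {
    "Alice": {"Batman": 5, "Star Wars": 5, "The Notebook": 2},
    "Carol": {"Batman": 2, "Star Wars": 1, "The Notebook": 5},
    "Bob":   {"Batman": 4.5, "The Notebook": 2},
}
user_mean = {u: np.mean(list(r.values())) for u, r in user_ratings.items()}

def predict_with_deviations(user, movie):
    """s(i, j) = user i's own mean + average deviation of movie j among users who rated it."""
    devs = [r[movie] - user_mean[u]
            for u, r in user_ratings.items() if u != user and movie in r]
    return user_mean[user] + np.mean(devs)

print(round(predict_with_deviations("Bob", "Star Wars"), 2))  # Bob's mean 3.25 + avg deviation
```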
Combine (Weighted Ratings + Deviations)
- Combine the idea of deviations with the idea of weighting to get our final formula
- Note: absolute values in the denominator, since the weights can be negative
- Looking at this equation, we can see that it's just linear regression.
Intuitively, I want $w_{ii'}$ to be small for users I don't agree with and large for users I do agree with:
$$s(i,j)=\bar{r}_{i}+\frac{\sum_{i'\in\Omega_{j}}{w_{ii'}(r(i',j)-\bar{r}_{i'})}}{\sum_{i'\in\Omega_{j}}|w_{ii'}|}$$
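A sketch of this final scoring function; `user_ratings`, `user_mean`, and `weights` are assumed to be precomputed dicts (illustrative names, with `weights[(i, i_prime)]` holding $w_{ii'}$):

```python
def predict(user, movie, user_ratings, user_mean, weights):
    """s(i, j) = r_bar_i + sum_i' w_ii' * dev(i', j) / sum_i' |w_ii'|,
    summing over users i' who rated movie j."""
    num, den = 0.0, 0.0
    for other, their_ratings in user_ratings.items():
        if other == user or movie not in their_ratings:
            continue
        w = weights.get((user, other), 0.0)
        num += w * (their_ratings[movie] - user_mean[other])
        den += abs(w)  # absolute value: weights can be negative
    if den == 0:
        return user_mean[user]  # no usable neighbors: fall back to the user's own mean
    return user_mean[user] + num / den
```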
How to calculate the weights? Pearson Correlation Coefficient
$$\rho_{xy}=\frac{\sum^N_{i=1}(x_{i}-\bar{x})(y_{i}-\bar{y})}{\sqrt{\sum^{N}_{i=1}(x_{i}-\bar{x})^2}\sqrt{\sum^{N}_{i=1}(y_{i}-\bar{y})^2}}$$
Problem: Our matrix is sparse
Solution: Just use the data we have
$$w_{ii'}=\frac{\sum_{j\in\psi_{ii'}}(r_{ij}-\bar{r}_{i})(r_{i'j}-\bar{r}_{i'})}{\sqrt{\sum_{j\in\psi_{ii'}}(r_{ij}-\bar{r}_{i})^2}\sqrt{\sum_{j\in\psi_{ii'}}(r_{i'j}-\bar{r}_{i'})^2}}$$
$\psi_{i} = \text{set of movies that user } i \text{ has rated}$
$\psi_{ii'} = \text{set of movies both user } i \text{ and } i' \text{ have rated}$
$\psi_{ii'}=\psi_{i}\cap\psi_{i'}$
- What if 2 users have zero movies in common, or just a few?
- If zero, don’t consider i’ in user i’s calculation - it can’t be calculated
- If few (e.g. < 5) then don’t use the weight, since not enough data to be accurate
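A sketch of computing $w_{ii'}$ over the commonly rated movies only, including the skip rules from the bullets above (the minimum overlap of 5 is just the example threshold mentioned; all names are illustrative):

```python
import numpy as np

def pearson_weight(i, i_prime, user_ratings, user_mean, min_common=5):
    """Pearson correlation between users i and i', restricted to psi_ii',
    the movies both have rated. Returns None if the pair should be skipped."""
    common = set(user_ratings[i]) & set(user_ratings[i_prime])  # psi_ii'
    if len(common) < min_common:
        return None  # zero or too few movies in common: not enough data
    dev_i = np.array([user_ratings[i][m] - user_mean[i] for m in common])
    dev_p = np.array([user_ratings[i_prime][m] - user_mean[i_prime] for m in common])
    denom = np.sqrt((dev_i ** 2).sum()) * np.sqrt((dev_p ** 2).sum())
    if denom == 0:
        return None  # one user rated every common movie the same
    return float((dev_i * dev_p).sum() / denom)
```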
Cosine Similarity
- Pearson correlation is “old-school statistics”
- These days, when you want to compare two vectors, it seems more fashionable to use the cosine similarity
$$\cos\theta=\frac{x^Ty}{\lVert x\rVert\,\lVert y\rVert}=\frac{\sum^N_{i=1}x_iy_i}{\sqrt{\sum^N_{i=1}x^2_i}\sqrt{\sum^N_{i=1}y^2_i}}$$
Cosine Similarity vs. Pearson Correlation
- They are the same except Pearson is centered
- We want to center them anyway, because we're working with deviations, not absolute ratings
- If we work with deviations (mean-centered ratings) rather than raw ratings, cosine similarity is equivalent to the Pearson correlation
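A quick numerical check of that equivalence on two illustrative rating vectors (`np.corrcoef` gives the Pearson correlation):

```python
import numpy as np

x = np.array([5.0, 4.0, 1.0, 2.0])
y = np.array([4.0, 5.0, 2.0, 1.0])

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

pearson = np.corrcoef(x, y)[0, 1]
centered_cosine = cosine(x - x.mean(), y - y.mean())
print(np.isclose(pearson, centered_cosine))  # True
```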
Neighborhood
- In practice, don’t sum over all users who rated movie j (takes too long)
- It can help to precompute weights beforehand
- A non-trivial optimization: instead of summing over all users, take only the ones with the highest weight (called neighbors)
- E.g. use K nearest neighbors, K = 25 up to 50
Is it useful to keep a negative weight? Yes
- A strongly negative weight still carries a lot of information; it is a weight of 0 that means no correlation at all
- Thus, you might want to keep neighbors sorted on absolute correlation
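A sketch of picking the K nearest neighbors, sorted on absolute correlation so strongly negative weights are kept; `weight_fn` is assumed to return a Pearson weight or None (as in the earlier sketch), and K = 25 is the lower end of the range suggested above:

```python
def top_k_neighbors(user, all_users, weight_fn, k=25):
    """Keep the k users most strongly (positively or negatively) correlated with `user`."""
    candidates = []
    for other in all_users:
        if other == user:
            continue
        w = weight_fn(user, other)
        if w is not None:  # skip pairs with too few movies in common
            candidates.append((other, w))
    candidates.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return candidates[:k]
```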
Big Data
- Suppose we have 100k = 100,000 users
- Then we need $10^{10}$ weights
- 10 billion
- 32-bit float = 4 bytes
- ~ 40 GB
- Most modern computers have 4~32 GB RAM, 128 GB on the high end
- We are reaching the limit of capacity
- Not just size, but the time required to calculate
- $O(N^2)$ is slow
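The back-of-the-envelope arithmetic behind those numbers:

```python
n_users = 100_000
n_weights = n_users ** 2       # one weight per (i, i') pair -> 10^10, i.e. 10 billion
bytes_needed = n_weights * 4   # 32-bit float = 4 bytes
print(bytes_needed / 1e9)      # 40.0 GB
```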
Possible Issues
- With N users and M items, the ratings matrix R is N×M
- How long does it take to calculate the similarity between 2 users, w(i, i')? O(M)
- For a single rating prediction, we have to find the weights against all other users
- Even if we just keep the top K neighbors, we still need to calculate them all, since they must be sorted to find the top K
- O(N) users, O(M) calculations each time -> O(NM)
- If you're a company building a recommendation engine, you need to make recommendations for all N users
- We need w(i, i') for i = 1...N and i' = 1...N -> $O(N^{2})$
- Including the time to calculate each correlation -> $O(N^{2}M)$
Computational Complexity Scenarios
Work on a subset of data
- Take top n users and top m movies (n < N, m < M)
- Top user = user who has rated the most movies
- Top movie = movie that has been rated the most times
- Yields a denser matrix
- Experiment to determine workable values for n and m
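A sketch of the shrinking step, assuming the ratings sit in a pandas DataFrame with `userId` and `movieId` columns (the column names and the values of n and m are assumptions, e.g. MovieLens-style data):

```python
import pandas as pd

def shrink(ratings_df: pd.DataFrame, n: int = 1000, m: int = 200) -> pd.DataFrame:
    """Keep only the n users with the most ratings and the m most-rated movies."""
    top_users = ratings_df["userId"].value_counts().head(n).index
    top_movies = ratings_df["movieId"].value_counts().head(m).index
    mask = ratings_df["userId"].isin(top_users) & ratings_df["movieId"].isin(top_movies)
    return ratings_df[mask]
```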
Recommendation for a single user
- Calculate weights between this user and all others: O(MN)
- Calculate scores s(i, j) for all items: also O(MN)
- M items
- Sum over N terms for each item
- Sort the scores: O(MlogM)
- Total: O(MN) + O(MlogM)
- This is theoretical - practically, you can make things faster
- Precompute the user-user weights offline, not in real time
- For the score, only sum over users similar to me: if we keep the top K neighbors, then O(MK)
- Sorting: if we keep only the top L items, then O(MlogL)
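Putting those steps together for one user, a sketch that reuses the `predict` function sketched earlier and uses a heap to keep only the top L items (so the final step is O(MlogL) rather than a full O(MlogM) sort):

```python
import heapq

def recommend(user, all_movies, user_ratings, user_mean, weights, top_l=10):
    """Score every movie the user hasn't rated and return the top_l highest-scoring ones."""
    seen = set(user_ratings[user])
    scored = ((predict(user, m, user_ratings, user_mean, weights), m)
              for m in all_movies if m not in seen)
    return heapq.nlargest(top_l, scored)  # keeps only L items instead of sorting all M
```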
Precomputing
- Precompute recommendations using (possibly) big data technologies as offline jobs
- You don’t want to do an O(MlogM) sort in real-time - even O(M) would be bad
- Use a “cronjob” to run your task every day at some time, e.g. 3 am
- Your API can simply retrieve items already stored in a database
Desired Output
- Since we are doing regression, we want MSE
- Outline
- Split data into train and test
- Calculate weights using a train set
- Make a predict function, e.g. score ← predict(i, j)
- Output MSE for train and test sets
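A sketch of that evaluation loop, assuming `train` and `test` are lists of (user, movie, rating) triples and `predict` is the scoring function sketched earlier (all names are illustrative):

```python
import numpy as np

def mse(triples, user_ratings, user_mean, weights):
    """Mean squared error of s(i, j) against the known ratings."""
    errors = [(predict(u, m, user_ratings, user_mean, weights) - r) ** 2
              for u, m, r in triples]
    return float(np.mean(errors))

# train_mse = mse(train, user_ratings, user_mean, weights)
# test_mse  = mse(test,  user_ratings, user_mean, weights)
```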