User-User Collaborative Filtering
|       | Batman | X-Men | Star Wars | The Notebook | Bridget Jones' Diary |
|-------|--------|-------|-----------|--------------|----------------------|
| Alice | 5      | 4.5   | 5         | 2            | 1                    |
| Bob   | 4.5    | 4     | ?         | 2            | 2                    |
| Carol | 2      | 3     | 1         | 5            | 5                    |

Intuitively, we see that Bob's ratings are similar to Alice's, so he is likely to also like Star Wars. In math-speak, Alice's and Bob's ratings are highly correlated.
Average Rating
The simplest approach: predict user i's rating of movie j, s(i, j), as the average of all known ratings of movie j, where $\Omega_{j}$ is the set of users who have rated movie j.
Limitations
- It treats everyone's rating of the movie equally.
- Bob's s(i, j) depends equally on Alice's rating and Carol's rating, even though he doesn't agree with Carol.
$$s(i,j)=\frac{\sum_{i'\in\Omega_{j}}r_{i'j}}{|\Omega_{j}|}$$
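A minimal sketch of this naive average-rating predictor (the toy data and names below are illustrative, not from the post):

```python
# Illustrative toy data: ratings[(user, movie)] = rating
ratings = {
    ("Alice", "Star Wars"): 5, ("Carol", "Star Wars"): 1,
    ("Alice", "Batman"): 5, ("Bob", "Batman"): 4.5, ("Carol", "Batman"): 2,
}

def predict_average(user, movie):
    """s(i, j): average of all known ratings of `movie` by other users."""
    others = [r for (u, m), r in ratings.items() if m == movie and u != user]
    return sum(others) / len(others)

print(predict_average("Bob", "Star Wars"))  # (5 + 1) / 2 = 3.0
```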
One Solution: Weighted Ratings
Intuitively, I want the weight $w_{ii'}$ to be small for users I don't agree with and large for users I do agree with:
$$s(i,j)=\frac{\sum_{i'\in \Omega_{j}} w_{ii'}r_{i'j}}{\sum_{i'\in \Omega_{j}}w_{ii'}}$$
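A sketch of this weighted average; the weights $w_{ii'}$ are taken as given here (how to compute them comes later), and all names are illustrative:

```python
def predict_weighted(user, movie, ratings, weights):
    """Weighted average of other users' ratings of `movie`,
    weighted by how much `user` agrees with each rater."""
    num, den = 0.0, 0.0
    for (u, m), r in ratings.items():
        if m == movie and u != user:
            w = weights.get((user, u), 0.0)  # assumed dict: weights[(i, i')] = w_ii'
            num += w * r
            den += w
    return num / den if den != 0 else None
```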
Another issue with average rating: Bias
- Your interpretation of a rating is different from mine
- Users can be biased to be optimistic or pessimistic
- E.g. optimistic: most movies are a 5, a bad movie is a 3
- E.g. pessimistic: most movies are 1 or 2, a good movie is a 4
Deviation (without weighting)
We don't care about your absolute rating; we care how much it deviates from your own average.
- If your average is 2.5, but you rate something a 5, it must be really great
- If you rate everything a 5, it’s difficult to know how those items compare
$$\text{dev}(i,j)=r(i,j)-\bar{r}_{i}, \quad \text{for a known rating}$$
The deviation is your rating minus your average rating; in other words, rating minus bias.
My predicted deviation is the average deviation for a movie
Then my predicted rating is my own average + predicted deviation
$$\hat{\text{dev}}(i,j)=\frac{\sum_{i'\in\Omega_{j}}\left(r(i',j)-\bar{r}_{i'}\right)}{|\Omega_{j}|}, \quad \text{a prediction from known ratings}$$
$$s(i,j)=\bar{r}_{i}+\frac{\sum_{i'\in\Omega_{j}}\left(r(i',j)-\bar{r}_{i'}\right)}{|\Omega_{j}|}=\bar{r}_{i}+\hat{\text{dev}}(i,j)$$
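A sketch of the unweighted deviation approach on illustrative toy data (per-user means are computed once up front; names are not from the post):

```python
import numpy as np

# Illustrative toy data: user -> {movie: rating}
user_ratings = {
    "Alice": {"Batman": 5, "Star Wars": 5, "The Notebook": 2},
    "Carol": {"Batman": 2, "Star Wars": 1, "The Notebook": 5},
    "Bob":   {"Batman": 4.5, "The Notebook": 2},
}
user_mean = {u: np.mean(list(r.values())) for u, r in user_ratings.items()}

def predict_with_deviations(user, movie):
    """s(i, j) = user i's own mean + average deviation of movie j among users who rated it."""
    devs = [r[movie] - user_mean[u]
            for u, r in user_ratings.items() if u != user and movie in r]
    return user_mean[user] + np.mean(devs)

print(round(predict_with_deviations("Bob", "Star Wars"), 2))  # Bob's mean 3.25 + avg deviation
```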
Combine (Weighted Ratings + Deviations)
- Combine the idea of deviations with the idea of weighting to get our final formula
- Note: absolute values in the denominator, since the weights can be negative
- Looking at this equation, we can see that it's just linear regression.
Intuitively, I want $w_{ii'}$ to be small for users I don't agree with and large for users I do agree with:
$$s(i,j)=\bar{r}_{i}+\frac{\sum_{i'\in\Omega_{j}}{w_{ii'}(r(i',j)-\bar{r}_{i'})}}{\sum_{i'\in\Omega_{j}}|w_{ii'}|}$$
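A sketch of this final scoring function; `user_ratings`, `user_mean`, and `weights` are assumed to be precomputed dicts (illustrative names, with `weights[(i, i_prime)]` holding $w_{ii'}$):

```python
def predict(user, movie, user_ratings, user_mean, weights):
    """s(i, j) = r_bar_i + sum_i' w_ii' * dev(i', j) / sum_i' |w_ii'|,
    summing over users i' who rated movie j."""
    num, den = 0.0, 0.0
    for other, their_ratings in user_ratings.items():
        if other == user or movie not in their_ratings:
            continue
        w = weights.get((user, other), 0.0)
        num += w * (their_ratings[movie] - user_mean[other])
        den += abs(w)  # absolute value: weights can be negative
    if den == 0:
        return user_mean[user]  # no usable neighbors: fall back to the user's own mean
    return user_mean[user] + num / den
```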
How to calculate the weights? Pearson Correlation Coefficient
$$\rho_{xy}=\frac{\sum^N_{i=1}(x_{i}-\bar{x})(y_{i}-\bar{y})}{\sqrt{\sum^{N}_{i=1}(x_{i}-\bar{x})^2}\sqrt{\sum^{N}_{i=1}(y_{i}-\bar{y})^2}}$$
Problem: Our matrix is sparse
Solution: Just use the data we have
$$w_{ii'}=\frac{\sum_{j\in\psi_{ii'}}(r_{ij}-\bar{r}_{i})(r_{i'j}-\bar{r}_{i'})}{\sqrt{\sum_{j\in\psi_{ii'}}(r_{ij}-\bar{r}_{i})^2}\sqrt{\sum_{j\in\psi_{ii'}}(r_{i'j}-\bar{r}_{i'})^2}}$$
$\psi_{i} = \text{set of movies that user } i \text{ has rated}$
$\psi_{ii'} = \text{set of movies both user } i \text{ and } i' \text{ have rated}$
$\psi_{ii'}=\psi_{i}\cap\psi_{i'}$
- What if 2 users have zero movies in common, or just a few?
- If zero, don’t consider i’ in user i’s calculation - it can’t be calculated
- If few (e.g. < 5) then don’t use the weight, since not enough data to be accurate
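A sketch of computing $w_{ii'}$ over the commonly rated movies only, including the skip rules from the bullets above (the minimum overlap of 5 is just the example threshold mentioned; all names are illustrative):

```python
import numpy as np

def pearson_weight(i, i_prime, user_ratings, user_mean, min_common=5):
    """Pearson correlation between users i and i', restricted to psi_ii',
    the movies both have rated. Returns None if the pair should be skipped."""
    common = set(user_ratings[i]) & set(user_ratings[i_prime])  # psi_ii'
    if len(common) < min_common:
        return None  # zero or too few movies in common: not enough data
    dev_i = np.array([user_ratings[i][m] - user_mean[i] for m in common])
    dev_p = np.array([user_ratings[i_prime][m] - user_mean[i_prime] for m in common])
    denom = np.sqrt((dev_i ** 2).sum()) * np.sqrt((dev_p ** 2).sum())
    if denom == 0:
        return None  # one user rated every common movie the same
    return float((dev_i * dev_p).sum() / denom)
```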
Cosine Similarity
- Pearson correlation is “old-school statistics”
- These days, when you want to compare two vectors, it seems more fashionable to use the cosine similarity
$$\cos\theta=\frac{x^Ty}{\lVert x\rVert\,\lVert y\rVert}=\frac{\sum^N_{i=1}x_iy_i}{\sqrt{\sum^N_{i=1}x^2_i}\sqrt{\sum^N_{i=1}y^2_i}}$$
Cosine Similarity vs. Pearson Correlation
- They are the same except Pearson is centered
- We want to center them anyway, because we're working with deviations, not absolute ratings
- If we work with deviations (mean-centered ratings) rather than raw ratings, cosine similarity is equivalent to the Pearson correlation
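A quick numerical check of that equivalence on two illustrative rating vectors (`np.corrcoef` gives the Pearson correlation):

```python
import numpy as np

x = np.array([5.0, 4.0, 1.0, 2.0])
y = np.array([4.0, 5.0, 2.0, 1.0])

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

pearson = np.corrcoef(x, y)[0, 1]
centered_cosine = cosine(x - x.mean(), y - y.mean())
print(np.isclose(pearson, centered_cosine))  # True
```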
Neighborhood
- In practice, don’t sum over all users who rated movie j (takes too long)
- It can help to precompute weights beforehand
- A non-trivial optimization: instead of summing over all users, take only the ones with the highest weight (called neighbors)
- E.g. use K nearest neighbors, K = 25 up to 50
Is it useful to keep a negative weight? Yes
- A strongly negative weight still carries a lot of information; it is a weight of 0 that means no correlation at all
- Thus, you might want to keep neighbors sorted on absolute correlation
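A sketch of picking the K nearest neighbors, sorted on absolute correlation so strongly negative weights are kept; `weight_fn` is assumed to return a Pearson weight or None (as in the earlier sketch), and K = 25 is the lower end of the range suggested above:

```python
def top_k_neighbors(user, all_users, weight_fn, k=25):
    """Keep the k users most strongly (positively or negatively) correlated with `user`."""
    candidates = []
    for other in all_users:
        if other == user:
            continue
        w = weight_fn(user, other)
        if w is not None:  # skip pairs with too few movies in common
            candidates.append((other, w))
    candidates.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return candidates[:k]
```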
Big Data
- Suppose we have 100k = 100,000 users
- Then we need $10^{10}$ weights
- 10 billion
- 32-bit float = 4 bytes
- ~ 40 GB
- Most modern computers have 4~32 GB RAM, 128 GB on the high end
- We are reaching the limit of capacity
- Not just size, but the time required to calculate
- $O(N^2)$ is slow
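The back-of-the-envelope arithmetic behind those numbers:

```python
n_users = 100_000
n_weights = n_users ** 2       # one weight per (i, i') pair -> 10^10, i.e. 10 billion
bytes_needed = n_weights * 4   # 32-bit float = 4 bytes
print(bytes_needed / 1e9)      # 40.0 GB
```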
Possible Issues
- With N users and M items, the ratings matrix R is N×M
- How long does it take to calculate the similarity between 2 users, w(i, i')? O(M)
- For a single rating prediction, we have to find the weights against all other users
- Even if we just keep the top K neighbors, we still need to calculate them all, since they must be sorted to find the top K
- O(N) users, O(M) calculations each time -> O(NM)
- If you're a company building a recommendation engine, you need to make recommendations for all N users
- We need w(i, i') for i = 1...N and i' = 1...N -> $O(N^{2})$
- Including the time to calculate each correlation -> $O(N^{2}M)$
Computational Complexity Scenarios
Work on a subset of data
- Take top n users and top m movies (n < N, m < M)
- Top user = user who has rated the most movies
- Top movie = movie that has been rated the most times
- Yields a denser matrix
- Experiment to determine workable values for n and m
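A sketch of the shrinking step, assuming the ratings sit in a pandas DataFrame with `userId` and `movieId` columns (the column names and the values of n and m are assumptions, e.g. MovieLens-style data):

```python
import pandas as pd

def shrink(ratings_df: pd.DataFrame, n: int = 1000, m: int = 200) -> pd.DataFrame:
    """Keep only the n users with the most ratings and the m most-rated movies."""
    top_users = ratings_df["userId"].value_counts().head(n).index
    top_movies = ratings_df["movieId"].value_counts().head(m).index
    mask = ratings_df["userId"].isin(top_users) & ratings_df["movieId"].isin(top_movies)
    return ratings_df[mask]
```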
Recommendation for a single user
- Calculate weights between this user and all others: O(MN)
- Calculate scores s(i, j) for all items: also O(MN)
- M items
- Sum over N terms for each item
- Sort the scores: O(MlogM)
- Total: O(MN) + O(MlogM)
- This is theoretical - practically, you can make things faster
- Precompute the user-user weights offline, not in real time
- For the score, only sum over users similar to me: if we keep the top K neighbors, then O(MK)
- Sorting: if we keep only the top L items, then O(MlogL)
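Putting those steps together for one user, a sketch that reuses the `predict` function sketched earlier and uses a heap to keep only the top L items (so the final step is O(MlogL) rather than a full O(MlogM) sort):

```python
import heapq

def recommend(user, all_movies, user_ratings, user_mean, weights, top_l=10):
    """Score every movie the user hasn't rated and return the top_l highest-scoring ones."""
    seen = set(user_ratings[user])
    scored = ((predict(user, m, user_ratings, user_mean, weights), m)
              for m in all_movies if m not in seen)
    return heapq.nlargest(top_l, scored)  # keeps only L items instead of sorting all M
```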
Precomputing
- Precompute recommendations using (possibly) big data technologies as offline jobs
- You don’t want to do an O(MlogM) sort in real-time - even O(M) would be bad
- Use a “cronjob” to run your task every day at some time, e.g. 3 am
- Your API can simply retrieve items already stored in a database
Desired Output
- Since we are doing regression, we want MSE
- Outline
- Split data into train and test
- Calculate weights using a train set
- Make a predict function, e.g. score ← predict(i, j)
- Output MSE for train and test sets
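A sketch of that evaluation loop, assuming `train` and `test` are lists of (user, movie, rating) triples and `predict` is the scoring function sketched earlier (all names are illustrative):

```python
import numpy as np

def mse(triples, user_ratings, user_mean, weights):
    """Mean squared error of s(i, j) against the known ratings."""
    errors = [(predict(u, m, user_ratings, user_mean, weights) - r) ** 2
              for u, m, r in triples]
    return float(np.mean(errors))

# train_mse = mse(train, user_ratings, user_mean, weights)
# test_mse  = mse(test,  user_ratings, user_mean, weights)
```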