Association Analysis
2022. 7. 7. 17:31
Definition
Association analysis is a rule-based method for discovering how one item relates to another. The association takes two forms:
- How often are the items purchased together? (frequent itemsets)
- If a customer buys item A, will they also buy item B? (association rules)
It is also called market basket analysis, because it resembles examining which products end up together in a single shopping basket.
Ex) According to a famous anecdote, Walmart customers who bought beer also tended to buy diapers, so the store displayed the two products together.
Minsup
$$s(X)=\frac{\sigma(X)}{N}$$
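where $\sigma(X)$ is the support count of X (the number of transactions that contain X) and N is the total number of transactions. An itemset X is called frequent if $s(X)$ is greater than or equal to a user-defined threshold, minsup.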
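A minimal Python sketch of $s(X)$ and the minsup test. It assumes each transaction is a set of item names and reuses the five transactions from the Example section as toy data. The helper names `support_count`, `support` and `is_frequent` are illustrative, and the threshold test assumes the convention $s(X) \geq minsup$.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]

# Toy data: the five transactions from the Example section.
TRANSACTIONS: List[Transaction] = [
    frozenset({"Bread", "Milk"}),
    frozenset({"Bread", "Diapers", "Beer", "Eggs"}),
    frozenset({"Milk", "Diapers", "Beer", "Cola"}),
    frozenset({"Bread", "Milk", "Diapers", "Beer"}),
    frozenset({"Bread", "Milk", "Diapers", "Cola"}),
]


def support_count(itemset: Iterable[str], transactions: List[Transaction]) -> int:
    """sigma(X): number of transactions that contain every item of X."""
    x = frozenset(itemset)
    return sum(1 for t in transactions if x <= t)


def support(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """s(X) = sigma(X) / N, where N is the number of transactions."""
    return support_count(itemset, transactions) / len(transactions)


def is_frequent(itemset: Iterable[str], transactions: List[Transaction], minsup: float) -> bool:
    """Assumed convention: X is frequent when s(X) >= minsup."""
    return support(itemset, transactions) >= minsup


print(support({"Milk", "Diapers"}, TRANSACTIONS))           # 0.6
print(is_frequent({"Milk", "Diapers"}, TRANSACTIONS, 0.4))  # True
print(is_frequent({"Eggs"}, TRANSACTIONS, 0.4))             # False
```

{Milk, Diapers} appears in 3 of the 5 transactions, so its support is 0.6. {Eggs} appears only once (0.2) and fails a minsup of 0.4.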
Association Rule
Association rule mining is usually split into two steps. The first, Frequent Itemset Generation, aims to find all the itemsets that satisfy the minsup threshold.
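The goal of Frequent Itemset Generation can be made concrete with a brute-force sketch. It enumerates every non-empty itemset over the d distinct items ($2^d - 1$ candidates) and keeps those that meet minsup. It assumes the toy transactions from the Example section and the $s(X) \geq minsup$ convention. The function name `frequent_itemsets` is hypothetical. Practical algorithms such as Apriori prune this search rather than enumerating it exhaustively.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

# Toy data: the five transactions from the Example section.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"Bread", "Milk"}),
    frozenset({"Bread", "Diapers", "Beer", "Eggs"}),
    frozenset({"Milk", "Diapers", "Beer", "Cola"}),
    frozenset({"Bread", "Milk", "Diapers", "Beer"}),
    frozenset({"Bread", "Milk", "Diapers", "Cola"}),
]


def frequent_itemsets(transactions: List[FrozenSet[str]], minsup: float) -> Dict[FrozenSet[str], float]:
    """Brute force: test all 2^d - 1 non-empty itemsets against minsup."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result: Dict[FrozenSet[str], float] = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            s = sum(1 for t in transactions if x <= t) / n
            if s >= minsup:  # assumed convention: frequent means s(X) >= minsup
                result[x] = s
    return result


for itemset, s in frequent_itemsets(TRANSACTIONS, minsup=0.6).items():
    print(sorted(itemset), s)
```

With 6 distinct items the loop tests 63 candidates. This exhaustive growth is what pruning strategies are designed to avoid.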
An association rule is an implication expression of the form X→Y, where X and Y are disjoint itemsets (X∩Y=∅).
The strength of an association rule can be measured in terms of its support and confidence. A rule that has very low support may occur simply by chance. Confidence measures the reliability of the inference made by a rule.
Support
For the rule A → B,
$$support(A \rightarrow B)=P(A, B)$$
or, in terms of counts, where $\sigma(X)$ is the support count of X and N is the number of transactions in the transaction set T,
$$s(X \rightarrow Y)=\frac{\sigma(X\cup Y)}{N}$$
Confidence
$$confidence(A \rightarrow B)=\frac{P(A,B)}{P(A)}$$
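Rule support and confidence can both be computed from support counts, using $confidence(X \rightarrow Y) = \sigma(X \cup Y) / \sigma(X)$. The sketch below assumes the Example transactions as toy data, and its helper names are illustrative. Comparing {Diapers} → {Beer} with {Beer} → {Diapers} shows that confidence, unlike support, depends on the direction of the rule.

```python
from typing import FrozenSet, Iterable, List

# Toy data: the five transactions from the Example section.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"Bread", "Milk"}),
    frozenset({"Bread", "Diapers", "Beer", "Eggs"}),
    frozenset({"Milk", "Diapers", "Beer", "Cola"}),
    frozenset({"Bread", "Milk", "Diapers", "Beer"}),
    frozenset({"Bread", "Milk", "Diapers", "Cola"}),
]


def support_count(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> int:
    """sigma(X): number of transactions that contain every item of X."""
    x = frozenset(itemset)
    return sum(1 for t in transactions if x <= t)


def rule_support(lhs: Iterable[str], rhs: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """s(X -> Y) = sigma(X | Y) / N."""
    return support_count(set(lhs) | set(rhs), transactions) / len(transactions)


def confidence(lhs: Iterable[str], rhs: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """confidence(X -> Y) = sigma(X | Y) / sigma(X), an estimate of P(Y | X)."""
    lhs_count = support_count(lhs, transactions)
    if lhs_count == 0:
        raise ValueError("confidence is undefined when X never occurs")
    return support_count(set(lhs) | set(rhs), transactions) / lhs_count


print(rule_support({"Diapers"}, {"Beer"}, TRANSACTIONS))  # 0.6
print(confidence({"Diapers"}, {"Beer"}, TRANSACTIONS))    # 0.75
print(confidence({"Beer"}, {"Diapers"}, TRANSACTIONS))    # 1.0
```

Every basket with beer also contains diapers, but only 3 of the 4 baskets with diapers contain beer. The two directions therefore share the same support (0.6) but have different confidence.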
Lift
Lift measures whether A and B occur together more or less often than they would if they were independent.
$$lift(A\rightarrow B)=\frac{P(A,B)}{P(A)\times P(B) }$$
$$lift(A \rightarrow B)\begin{cases}
= 1, & \text{if } A \text{ and } B \text{ are independent} \\
> 1, & \text{if } A \text{ and } B \text{ are positively related} \\
< 1, & \text{if } A \text{ and } B \text{ are negatively related}
\end{cases}$$
Rule Generation
The second step, Rule Generation, aims to extract all the high-confidence rules from the frequent itemsets found during Frequent Itemset Generation. These rules are called strong rules.
A naive approach is to extract every possible rule from each frequent itemset.
Problem
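The number of candidate rules grows exponentially with the number of items. A k-itemset yields $2^k - 2$ rules (excluding rules with an empty side), and a data set with d items admits $3^d - 2^{d+1} + 1$ possible rules in total.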
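The sketch below enumerates every candidate rule $X \rightarrow Y$ from one frequent itemset and keeps those whose confidence reaches a threshold. The threshold is called `minconf` here, a hypothetical parameter name. The sketch also checks the total rule count against $3^d - 2^{d+1} + 1$ by brute force. It assumes the Example transactions, and the function names are illustrative. Real systems prune candidates (e.g. Apriori's confidence-based pruning) instead of testing them all.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Iterator, List, Tuple

# Toy data: the five transactions from the Example section.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"Bread", "Milk"}),
    frozenset({"Bread", "Diapers", "Beer", "Eggs"}),
    frozenset({"Milk", "Diapers", "Beer", "Cola"}),
    frozenset({"Bread", "Milk", "Diapers", "Beer"}),
    frozenset({"Bread", "Milk", "Diapers", "Cola"}),
]

Rule = Tuple[FrozenSet, FrozenSet]


def support_count(itemset: FrozenSet, transactions: List[FrozenSet[str]]) -> int:
    """sigma(X): number of transactions that contain every item of X."""
    return sum(1 for t in transactions if itemset <= t)


def candidate_rules(itemset: Iterable) -> Iterator[Rule]:
    """Every split of the itemset into non-empty, disjoint X and Y: 2^k - 2 rules."""
    full = frozenset(itemset)
    for r in range(1, len(full)):
        for lhs in combinations(sorted(full), r):
            x = frozenset(lhs)
            yield x, full - x


def strong_rules(itemset: Iterable, transactions: List[FrozenSet[str]], minconf: float) -> List[Tuple[Rule, float]]:
    """Keep rules whose confidence sigma(X | Y) / sigma(X) is at least minconf."""
    full = frozenset(itemset)
    full_count = support_count(full, transactions)
    kept = []
    for x, y in candidate_rules(full):
        x_count = support_count(x, transactions)
        if x_count == 0:
            continue  # confidence is undefined when X never occurs
        conf = full_count / x_count
        if conf >= minconf:
            kept.append(((x, y), conf))
    return kept


def count_all_rules(d: int) -> int:
    """Brute-force count of rules over d items (itemsets of size >= 2)."""
    return sum(
        sum(1 for _ in candidate_rules(itemset))
        for k in range(2, d + 1)
        for itemset in combinations(range(d), k)
    )


itemset = {"Milk", "Diapers", "Beer"}
print(sum(1 for _ in candidate_rules(itemset)))  # 6 == 2**3 - 2
for (x, y), conf in strong_rules(itemset, TRANSACTIONS, minconf=0.6):
    print(sorted(x), "->", sorted(y), round(conf, 3))
print(count_all_rules(6), 3**6 - 2**7 + 1)  # 602 602
```

With only 6 items there are already 602 possible rules. This is why practical miners restrict rule generation to frequent itemsets and prune by confidence.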
Example
TID  Items
1    {Bread, Milk}
2    {Bread, Diapers, Beer, Eggs}
3    {Milk, Diapers, Beer, Cola}
4    {Bread, Milk, Diapers, Beer}
5    {Bread, Milk, Diapers, Cola}
For the rule {Milk, Diapers} -> {Beer}, the itemset {Milk, Diapers, Beer} appears in transactions 3 and 4, and {Milk, Diapers} appears in transactions 3, 4 and 5:
Support = $\frac{\sigma(X \cup Y)}{N}$ = $\frac{2}{5}$
Confidence = $\frac{\sigma(X \cup Y)}{\sigma (X)}$ = $\frac{2}{3}$
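The arithmetic above can be reproduced with exact fractions. This sketch recomputes the support counts, support and confidence of {Milk, Diapers} -> {Beer} from the table. It also evaluates the lift formula from the Lift section, written with counts as $\frac{N\,\sigma(X \cup Y)}{\sigma(X)\,\sigma(Y)}$. The variable and function names are illustrative.

```python
from fractions import Fraction

# Transactions from the table above, keyed by TID.
transactions = {
    1: {"Bread", "Milk"},
    2: {"Bread", "Diapers", "Beer", "Eggs"},
    3: {"Milk", "Diapers", "Beer", "Cola"},
    4: {"Bread", "Milk", "Diapers", "Beer"},
    5: {"Bread", "Milk", "Diapers", "Cola"},
}
N = len(transactions)


def sigma(itemset):
    """Support count: number of transactions containing every item of itemset."""
    return sum(1 for items in transactions.values() if itemset <= items)


def lift(x, y):
    """P(X, Y) / (P(X) * P(Y)), written with support counts."""
    return Fraction(N * sigma(x | y), sigma(x) * sigma(y))


X, Y = {"Milk", "Diapers"}, {"Beer"}
print([tid for tid, items in transactions.items() if X | Y <= items])  # [3, 4]
print(Fraction(sigma(X | Y), N))         # 2/5  (support)
print(Fraction(sigma(X | Y), sigma(X)))  # 2/3  (confidence)
print(lift(X, Y))                        # 10/9 (> 1: positively related)
print(lift({"Bread"}, {"Milk"}))         # 15/16 (< 1: negatively related)
```

For this toy table, the example rule has lift 10/9, slightly above 1. Bread and Milk co-occur a little less often than independence would predict.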