Measures to Aid Understandability and Classification of Association Rules
Just attended a workshop on data mining, where the headline topic was presented by my PhD colleague Rajesh Natarajan. Specifically, it was about:
"Association rules are implication rules that bring out the affinity between items in a database of transactions based on their co-occurrence. First used in the market-basket context, association rules inform which article(s) are likely to be purchased by a customer given that he/she has purchased some other article(s), and thus give an insight into the customer purchasing behaviour. Unfortunately due to the complete nature of most of the association rule mining algorithms, association rule discovery has been plagued with the problem of generation of too many insignificant, irrelevant and obvious rules that do not add to the knowledge of the user (typically a retail-store manager). The problem is then to mine the most interesting and significant rules that would be useful to a manager.
We first discuss various approaches adopted by researchers to mitigate this problem. These include the use of interestingness measures, pruning of rules using templates, rule covers, grouping and summarization of rules. Interestingness measures try to quantify the amount of ‘interest’ that a ‘rule’ is expected to evoke in a user examining it. We discuss some of the approaches used in data mining concerning interestingness measures.
We introduce an aspect of interestingness called ‘item-relatedness’ to determine the interestingness of item-pairs occurring in association rules. Association rules that contain weakly related pairs but still have a frequent occurrence in the database are the ones that are interesting. Relationships between items are captured by paths in a fuzzy taxonomy (an extension of the traditional concept hierarchy tree). Using three notions of relatedness from the fuzzy taxonomy structure, we arrive at a total relatedness measure. We demonstrate the efficacy of this total-relatedness measure on a sample taxonomy and explain intuitive correspondences between numerical results and reality. We then combine the relatedness of the item-pairs in an association rule in three different ways to arrive at ‘interestingness’. The interestingness measures are then used to rank association rules."
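For my own notes, here is a minimal sketch of the textbook support and confidence measures that association rule mining usually starts from, and of why enumerating every rule quickly produces more rules than a manager would want to read. This is not the relatedness measure from the talk (the abstract does not give its formula); the transactions, thresholds and function names below are made up for illustration.

```python
from itertools import combinations

# Hypothetical toy market-basket data, invented for this example.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain the whole itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from co-occurrence counts."""
    n_antecedent = count(antecedent, transactions)
    if n_antecedent == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / n_antecedent


def pair_rules(transactions, min_support, min_confidence):
    """Enumerate single-item rules x -> y that pass both thresholds.

    Returns (x, y, support, confidence) tuples, highest confidence first.
    """
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        for x, y in ((a, b), (b, a)):
            s = support({x, y}, transactions)
            c = confidence({x}, {y}, transactions)
            if s >= min_support and c >= min_confidence:
                rules.append((x, y, s, c))
    return sorted(rules, key=lambda r: (-r[3], -r[2]))


if __name__ == "__main__":
    for x, y, s, c in pair_rules(transactions, 0.4, 0.7):
        print(f"{x} -> {y}  support={s:.2f}  confidence={c:.2f}")
```

Even on five baskets this yields several rules, most of them obvious (bread and butter). Ranking by an interestingness measure, as the abstract proposes, would replace the final sort key with a different score.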