
Market Basket Analysis


1.      Objectives of the analysis

Market basket analysis (MBA) is an applied statistical methodology aimed at identifying associations between the products that customers buy together in a transaction, or market basket (Berry and Linoff, 1997; Giudici and Passerone, 2002).  It is also called affinity analysis because it examines the mix of products that appeal to a particular customer segment (Nation's Restaurant News, 2000).  The primary goal of MBA is therefore to understand which combinations of products sell profitably together.

MBA is particularly important to the retail and consumer packaged goods industry for several reasons.  First, the marketing environment has become more complex and information-intensive, and firms that do not find ways to increase their competitive edge will not survive.  Secondly, the market contains distinct segments, each with its own demographic profile, buying power, product preferences and media access (Cunningham et al., 1997).  To be profitable, a firm must identify the combinations of products that meet the needs and wants of each segment.  Finally, consumer products are more diverse than ever before: a single category of food may easily contain hundreds of competing products (Cunningham et al., 1997).  Identifying meaningful combinations of products is one way to meet the challenge of this increasing competition.

The objectives of the analysis might include the following data mining tasks (Berry and Linoff, 1997):

2.      Description of the data required

MBA works only with categorical variables, which are fields that take values from a limited and predetermined set, such as marital status or age group (Berry and Linoff, 1997).  It aims at identifying two main types of recurrent rules within transactions: association rules and sequence rules (Giudici and Passerone, 2002).  Identifying such rules correctly depends on having a considerable amount of detailed information that makes it possible to identify customers and follow their behavior over time.

Berry and Linoff (1997) argue that MBA can be applied to both undirected and directed data mining.  Undirected data mining problems consist of well-defined items that group together.  These problems occur commonly in the retail industry, where point-of-sale transactions are the basis for the analysis.  In directed data mining, MBA can be run on a well-defined subset of transactions, such as those from new stores, to find outliers in a subset of interest.  In either case, the data analyzed in an MBA usually consists of all buying transactions carried out by the customers in a certain unit of time, which can be obtained from the firm's data warehouse.  Because this is detailed transaction data captured at the point of sale, it is generally dirty and not of very high quality, so it needs extensive cleansing before it becomes a good source for the rest of the process.

Furthermore, retail businesses handle different kinds of transactions: cash, credit card, and credit payment.  Cash transactions are often anonymous, which means the store has no knowledge of the specific customer.  Berry and Linoff (1997) emphasize that even limited data, such as the date and time of purchase, the location of the store, the cashier, and the items purchased, can yield interesting and actionable results.  MBA can also handle variable-length data without the need for summarization.
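As a rough illustration of how anonymous, variable-length point-of-sale data might be prepared, the sketch below groups transaction lines into baskets and adds the day of purchase as an extra item.  The field layout, the sample records, and the names build_baskets and DAY_NAMES are assumptions made for this example, not something taken from the cited sources.

```python
from collections import defaultdict
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Hypothetical point-of-sale lines: (transaction id, purchase date, item).
# No customer identity is needed, so cash transactions can be used too.
pos_lines = [
    (1, date(2002, 12, 21), "candies"),
    (1, date(2002, 12, 21), "soft drinks"),
    (2, date(2002, 12, 23), "bread"),
    (3, date(2002, 12, 28), "candies"),
    (3, date(2002, 12, 28), "soft drinks"),
    (3, date(2002, 12, 28), "milk"),
]

def build_baskets(lines):
    """Group lines into one set of items per transaction.

    Baskets may have any length; the day of the week is added as an
    extra item so that it can later appear in a rule.
    """
    baskets = defaultdict(set)
    for txn_id, purchase_date, item in lines:
        baskets[txn_id].add(item)
        baskets[txn_id].add("day:" + DAY_NAMES[purchase_date.weekday()])
    return dict(baskets)

baskets = build_baskets(pos_lines)
for txn_id, items in sorted(baskets.items()):
    print(txn_id, sorted(items))
```

Keeping each basket as a set means that a transaction with two items and one with twenty are handled in the same way, with no need to summarize them into fixed-width records first.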

3.      Description of technique

Berry and Linoff (1997) identify the following steps for MBA: choose the right set of items, generate rules from the data, and finally analyze the probabilities to determine the right rule.

Step 1: Choose the right set of items

It is crucial to choose the right set of items and the right level of detail at the start of the analysis.  Since the number of combinations can grow exponentially, Berry and Linoff (1997) suggest using more general items initially, then repeating the rule generation to home in on more specific items.  For example, it is better to start with items from higher levels of the taxonomy (or with virtual items when the analysis goes beyond the taxonomy), and then look for relationships with more specific items such as brand names.  A supermarket could use Candies as the higher-level item and Cadbury as the more specific item.  However, this is not a fixed rule, as the appropriate level still depends on the item, on its importance for producing actionable results, and on its frequency in the data (Berry and Linoff, 1997).
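The roll-up to a more general level can be sketched as a simple lookup.  The taxonomy below is hypothetical: only the Candies and Cadbury pairing comes from the text, and the other entries and the function name generalize are invented for illustration.

```python
# Hypothetical taxonomy: specific item -> higher-level item.
TAXONOMY = {
    "Cadbury": "Candies",
    "Brand X cola": "Soft Drinks",      # invented brand for illustration
    "Brand Y lemonade": "Soft Drinks",  # invented brand for illustration
}

def generalize(basket, taxonomy):
    """Replace each item with its higher-level item when one is defined.

    Items without a taxonomy entry are kept unchanged.
    """
    return {taxonomy.get(item, item) for item in basket}

basket = {"Cadbury", "Brand X cola", "Brand Y lemonade", "bread"}
print(sorted(generalize(basket, TAXONOMY)))
```

Because two brands can map to the same higher-level item, the generalized basket is usually smaller, which is what keeps the number of combinations manageable in the first pass.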

Step 2: Generate rules from all this data

According to Berry and Linoff (1997), a rule has two parts, a condition and a result, and is usually written as a statement of the form "if condition, then result".  Rules are generated from the basic probabilities available in the co-occurrence table, and the improvement formula is then applied to tell how much better a rule is at predicting the result than simply assuming the result.
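A minimal sketch of these probabilities is given below.  It assumes the usual definitions: support is the share of baskets containing both condition and result, confidence is the share of baskets with the condition that also contain the result, and improvement is confidence divided by the overall frequency of the result (often called lift).  The sample baskets and the function name rule_metrics are invented for the example.

```python
# Hypothetical baskets, each a set of items.
baskets = [
    {"candies", "soft drinks"},
    {"candies", "soft drinks", "bread"},
    {"bread", "milk"},
    {"soft drinks"},
    {"candies"},
]

def rule_metrics(baskets, condition, result):
    """Return (support, confidence, improvement) for 'if condition then result'.

    condition and result are sets of items.
    """
    n = len(baskets)
    n_condition = sum(1 for b in baskets if condition <= b)
    n_result = sum(1 for b in baskets if result <= b)
    n_both = sum(1 for b in baskets if (condition | result) <= b)

    support = n_both / n
    confidence = n_both / n_condition if n_condition else 0.0
    p_result = n_result / n
    # Improvement above 1 means the rule predicts the result better
    # than simply guessing the result for every basket.
    improvement = confidence / p_result if p_result else 0.0
    return support, confidence, improvement

support, confidence, improvement = rule_metrics(
    baskets, {"candies"}, {"soft drinks"}
)
print(support, confidence, improvement)
```

In this toy data, candies appear in three baskets and two of those also contain soft drinks, so the confidence is 2/3, while soft drinks appear in 3 of 5 baskets overall.  The improvement is therefore slightly above 1.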

Berry and Linoff (1997) state that generating association rules is a multi-step process: the general algorithm generates the co-occurrence matrix for single items, then for pairs of items, then for three items, and so on.  Since increasing the number of items in the combinations requires exponentially more computation, the algorithm uses minimum support pruning to throw out combinations that do not meet a previously set threshold.  The algorithm stops when, as the process descends the hierarchy and necessarily produces records with smaller basket counts, no remaining combination meets the threshold (Kimball, 1999).  Running the pruning algorithm progressively therefore focuses the analysis on results that are already relevant.
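The level-by-level process with minimum support pruning can be sketched as follows.  This is a simplified, Apriori-style illustration written for this article, not the exact procedure of the cited authors; the sample baskets, the threshold and the name frequent_itemsets are assumptions.

```python
from itertools import combinations

# Hypothetical baskets, each a set of items.
baskets = [
    {"candies", "soft drinks"},
    {"candies", "soft drinks", "bread"},
    {"bread", "milk"},
    {"soft drinks"},
    {"candies"},
]

def frequent_itemsets(baskets, min_support, max_size=3):
    """Return {itemset: support} for itemsets that meet min_support.

    Works level by level: single items, then pairs, then triples.
    A larger combination is only counted if every smaller combination
    inside it survived the previous level.
    """
    n = len(baskets)
    items = sorted({item for b in baskets for item in b})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while candidates and size <= max_size:
        survivors = {}
        for candidate in candidates:
            count = sum(1 for b in baskets if candidate <= b)
            if count / n >= min_support:
                survivors[candidate] = count / n
        frequent.update(survivors)

        # Build the next level from items that still appear in a survivor.
        surviving_items = sorted({i for itemset in survivors for i in itemset})
        size += 1
        candidates = [
            frozenset(combo)
            for combo in combinations(surviving_items, size)
            if all(frozenset(sub) in survivors
                   for sub in combinations(combo, size - 1))
        ]
    return frequent

frequent = frequent_itemsets(baskets, min_support=0.4)
for itemset, support in frequent.items():
    print(sorted(itemset), support)
```

Here milk is dropped at the first level, so no pair containing milk is ever counted, and the process ends by itself once a level produces no candidates.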

Step 3: Analyze the probabilities to determine the right rule

Finally, the probabilities of the rules generated in step 2 must be analyzed to determine the right rule.  The rules are easily understood because they can be expressed in English or as an SQL statement (Berry and Linoff, 1997).  The right rule should give insight into who the customers are and why they make certain purchases; an association rule, for instance, expresses how tangible products and services relate to each other and how they tend to group together.  An example is: if Saturday, then Candies and Soft Drinks.  This means that customers usually buy Candies and Soft Drinks on Saturday, possibly for a gathering on Saturday night or Sunday.

Berry and Linoff (1997) identify three common types of rules produced by MBA: the useful, the trivial and the inexplicable.  A useful rule contains high-quality, actionable information, which means that it is easy to justify an action once the pattern has been found.  The Candies and Soft Drinks rule above is a good example of a useful rule.  Trivial rules are already known to anyone familiar with the business, for example that customers who purchase a dress shirt are very likely to purchase a tie.  Such rules are less useful, as they may simply be measuring the success of previous marketing campaigns.  Inexplicable rules seem to have no explanation and do not suggest a course of action.  Since the aim of the analysis is to understand the association structure between the sales of the different products available, the useful rules make it possible to plan marketing policies more effectively.

4.      Expected outcomes of analysis

The analysis is expected to produce clear and understandable results.  It should give insight into the merchandise by showing which products tend to be purchased together and which are most amenable to promotion (Berry and Linoff, 1997).  The information is therefore actionable in two ways:

Improve design of store layouts

Placing products near each other in the store is convenient for shoppers and increases the likelihood that a customer will purchase multiple products.  Kimball (1999) suggests that placing products far apart from each other can also be useful, as it forces customers to traverse certain aisles or go past certain displays.

Understand combination

The retailer can anticipate a causal analysis that reveals how the purchase of one product leads to add-on sales of other products (Cunningham et al., 1997).  If two products turn out to be heavily associated, they should be placed next to each other to increase sales of both.  Understanding which items do not sell well together, which Kimball (1999) calls the missing market basket, is also useful when deciding on product combinations.
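One possible way to look for items that do not sell well together is sketched below.  It treats a pair as weakly associated when it appears in fewer baskets than would be expected if the two items sold independently.  This criterion, the sample baskets and the name weakly_associated_pairs are assumptions made for illustration and are not taken from Kimball (1999).

```python
from itertools import combinations

# Hypothetical baskets, each a set of items.
baskets = [
    {"candies", "soft drinks"},
    {"candies", "soft drinks", "bread"},
    {"bread", "milk"},
    {"soft drinks"},
    {"candies"},
]

def weakly_associated_pairs(baskets, threshold=1.0):
    """Return (item_a, item_b, ratio) for pairs whose ratio is below threshold.

    ratio = observed joint count / count expected under independence.
    A ratio below 1 means the pair is bought together less often than
    the popularity of each item alone would suggest.
    """
    n = len(baskets)
    items = sorted({item for b in baskets for item in b})
    item_count = {i: sum(1 for b in baskets if i in b) for i in items}

    weak = []
    for a, b in combinations(items, 2):
        observed = sum(1 for basket in baskets if a in basket and b in basket)
        expected = item_count[a] * item_count[b] / n
        ratio = observed / expected
        if ratio < threshold:
            weak.append((a, b, ratio))
    return weak

weak = weakly_associated_pairs(baskets)
for a, b, ratio in weak:
    print(a, b, round(ratio, 2))
```

With so few baskets the ratios are only indicative; on real data a retailer would probably also require a minimum number of sales for each item before reading anything into a low ratio.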


References

Berry, Michael J and Linoff, Gordon (1997), Data Mining Techniques: For Marketing, Sales, and Customer Support, John Wiley & Sons, New York

Cunningham, Scott; Sreedhar, Srikant and Smart, Bill (1997), 'Completing a Solution for Market Basket Analysis', Golden Books, Inc, 6 July, [Online, accessed 27 December 2002]
URL:http://anandfamily.com/marketbasket.html

Giudici, Paolo and Passerone, Gianluca (2002), 'Data Mining of Association Structures to Model Consumer Behavior', Elsevier Science, [Online, accessed 27 December 2002]
URL:http://www.sciencedirect.com

Kimball, Ralph (1999), 'The Market Basket Data Mart', Intelligent Enterprise, San Mateo, April 20, [Online, accessed 27 December 2002]
URL:http://proquest.umi.com

Nation's Restaurant News (2000), 'The Brave New World of Customer Relationship Management', Nation's Restaurant News, New York, 6 November, [Online, accessed 27 December 2002]
URL:http://proquest.umi.com

Bibliography

Berry, Michael J and Linoff, Gordon (1997), Data Mining Techniques: For Marketing, Sales, and Customer Support, John Wiley & Sons, New York

Cunningham, Scott; Sreedhar, Srikant and Smart, Bill (1997), 'Completing a Solution for Market Basket Analysis', Golden Books, Inc, 6 July, [Online, accessed 27 December 2002]
URL:http://anandfamily.com/marketbasket.html

Deck, Stewart (1999), 'Mining Your Business', Computerworld, Framingham, May 17, [Online, accessed 27 December 2002]
URL:http://proquest.umi.com

Giudici, Paolo and Passerone, Gianluca (2002), 'Data Mining of Association Structures to Model Consumer Behavior', Elsevier Science, [Online, accessed 27 December 2002]
URL:http://www.sciencedirect.com

Kimball, Ralph (1999), 'The Market Basket Data Mart', Intelligent Enterprise, San Mateo, April 20, [Online, accessed 27 December 2002]
URL:http://proquest.umi.com

Nation's Restaurant News (2000), 'The Brave New World of Customer Relationship Management', Nation's Restaurant News, New York, 6 November, [Online, accessed 27 December 2002]
URL:http://proquest.umi.com

