
Decision Tree Analysis


1.      Review of the problem

Apparel retailing is one of the toughest businesses in the market due to the economic downturn and fast-changing consumer needs and wants.  ABC Men's Dress Shirt (ABC) faces the same challenges, and its market share has been diminishing over the past few years.  Apart from Private Label, which holds 28 percent of market share, the dress shirt market is dominated by three major designer brands: GB (18 percent), VH (14 percent) and PRL (14 percent).  ABC held a 5 percent market share in 2002, down 3 percentage points from 2001.

The designer brands are fighting hard to maintain their market shares through promotions.  For example, Nau continues to gain market share by promoting up to 9 times a year at US$29.99.  PE has increased its number of promotions to a level similar to Nau's.  DK and CK have increased to 5 promotions per year, with PRL going to 4.  KC and ABC remain at 2 per year.  Moreover, there is pressure on prices to retain customers.  Except for PE and LC, whose prices remain at 2001 levels, the average prices of the brands overall are down by 4-19 percent.  Under this pressure, the average price of ABC has also fallen by 3 percent in the year.

As competition increases and the costs of attracting new customers rise, outstanding apparel retailers go all out to retain their existing customers (Kotler et al, 1999).  The challenge is to find out who the loyal customers are and to develop an appropriate marketing mix that delivers the high customer satisfaction that results in strong customer loyalty.  As ABC continues to lose market share, it may have targeted the wrong segments because of a change in consumer needs and wants.  ABC therefore urgently needs to review whether its target segments have changed, so that it can implement its marketing mix effectively and efficiently to retain its loyal customers.

ABC currently targets the Age Group above 35 years old with High Income of more than US$75,000 per year.  To identify whether its loyal customers match this target, ABC needs to effectively handle, analyze, and interpret customer information to define the correct target segments and execute segment-specific marketing mixes.  Boone and Roehm (2002) argue that firms that do not master information about their customers will be at a competitive disadvantage.

2.      Objectives of the analysis

This is a directed knowledge discovery methodology because the result is goal-oriented.  Directed knowledge discovery is the process of finding meaningful patterns in data that explain past events in such a way that we can use the patterns to help predict future events (Berry and Linoff, 1997).  This is what we want to do: to use a set of customer data to predict or explore a specific relationship for loyal customers, so that ABC can develop an appropriate marketing mix to retain its targeted customers.

Data Mining is used to solve the problem by performing four of the tasks suggested by Berry and Linoff (1997): classification, clustering, prediction and description.  Classification examines the features of a newly presented object and assigns it to one of a predefined set of classes.  Clustering segments a heterogeneous population into a number of more homogeneous subgroups or clusters.  Prediction classifies records according to some predicted future behavior or estimated future value.  Description increases understanding of, and gives an explanation for, the data collected.  Hence, to solve the problem, the objectives of the analysis are to:

(1)       classify ABC's customers as Good or Poor loyalty;

(2)       cluster the customer base into more homogeneous subgroups;

(3)       predict the loyalty of customers from their characteristics; and

(4)       describe the characteristics of ABC's actual loyal customers.

The results of the analysis will then be tested for robustness and used to develop a more appropriate marketing mix to meet the needs and wants of ABC's actual loyal customers.

3.      Description of the data

We will select the target field and direct the computer to tell us how to classify, cluster, predict and describe it.  Palace (1996) suggests data can be collected internally (operational or transactional data such as sales and costs), externally (non-operational data such as industry sales and macroeconomic data), and as metadata (data about the data itself).  Since the data concerns the current customers of ABC, the major source of data is the existing corporate data warehouse.  The data in the warehouse has already been cleaned and verified, and data from multiple sources, such as surveys and transaction records, has been integrated.

In directed knowledge discovery, we use the past to build a model of the future (Berry and Linoff, 1997).  The dataset required contains information about customers who purchased at ABC stores in the past three years (2000-2002).  A customer is determined to have Good or Poor loyalty based on her/his Age, Income and Frequency of Purchase.  To keep the report manageable, the data is classified and clustered into the following subgroups: Age (Young or Old), Income (Low or High), and Frequency of Purchase (Frequent or Non-frequent).

Customers with Frequent Purchase are defined as Good loyalty, while Non-frequent customers are Poor loyalty.
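
As an illustration, the following is a minimal sketch of preparing such a dataset with pandas; the column names and the four sample records (taken from Figure 1 in Section 5) are illustrative, not ABC's actual warehouse schema.

import pandas as pd

# Illustrative extract from the corporate data warehouse (2000-2002);
# the schema is an assumption for demonstration purposes.
customers = pd.DataFrame({
    "Name":     ["A", "B", "C", "D"],
    "Age":      ["Young", "Old", "Young", "Old"],
    "Income":   ["Low", "Low", "High", "High"],
    "Purchase": ["Frequent", "Non-frequent", "Frequent", "Frequent"],
})

# Good loyalty is defined as Frequent purchase, Poor as Non-frequent.
customers["Loyalty"] = customers["Purchase"].map(
    {"Frequent": "Good", "Non-frequent": "Poor"}
)
print(customers)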

4.      Appropriateness of technique

Decision tree analysis is used to solve the problem.  A decision tree is a model that is both predictive and descriptive (Berson and Smith, 1997).  It is a powerful model that encompasses a number of specific algorithms, including Classification and Regression Trees (CART), Chi-squared Automatic Interaction Detection (CHAID), and C4.5 (Brand and Gerritsen, 1998).  It is appropriate for the problem because decision trees can be used for directed data mining, particularly classification, and for regression tasks such as predicting a specific value (Berry and Linoff, 1997; Brand and Gerritsen, 1998).  They are also good for clustering and description purposes (Berry and Linoff, 1997).  There are other strengths of decision tree analysis that help solve the problem, discussed below.

It is called a decision tree because the resulting model is presented in the form of a tree structure.  According to Berry and Linoff (1997), a decision tree divides the records in the training set into disjoint subsets, each of which is described by a simple explicit rule, expressed as a logic statement in a language such as SQL, on one or more fields.  At each node in the tree, we can measure the number of records entering the node, the way those records would be classified if this were a leaf node, and the percentage of records classified correctly at this node.  This visual presentation makes the decision tree model very easy to understand and assimilate (Brand and Gerritsen, 1998).  A decision tree is therefore able to generate understandable rules, and provides a clear indication of which fields are most important for prediction or classification, without requiring much computation.
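
As a hedged illustration of how such explicit rules can be read off a tree, scikit-learn's export_text prints one rule path per leaf; the tiny encoded table below is an assumption for demonstration only.

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Illustrative 0/1 encoding of the categorical fields (assumed data).
X = pd.DataFrame({
    "Age_Young":   [1, 0, 1, 0],
    "Income_High": [0, 0, 1, 1],
})
y = ["Good", "Poor", "Good", "Good"]  # loyalty labels

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))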

Decision trees are able to handle both continuous and categorical variables, and can be built from historical data (Berson and Smith, 1997).  Berry and Linoff (1997) state that decision trees can manage incoming data of uncertain quality, as spurious results are obvious in the explicit rules.  Decision-tree methods are also less hindered by large numbers of fields, because the tree algorithm identifies the single field that contributes the most information at each node and bases the next segment of the rule on that field alone.

Berson and Smith (1997) argue that decision tree technology can be used for exploration of the data set and the business problem.  This is often done by looking at the predictors and values chosen for each split of the tree; these predictors often provide usable insights or propose questions that need to be answered.  Consequently, from a business perspective, a decision tree can be viewed as creating a segmentation of the original data set, with each segment being one of the leaves of the tree.

To sum up, decision tree analysis is appropriate for this problem for the following reasons:

(1)       it supports the directed data mining tasks required: classification, clustering, prediction and description;

(2)       it generates understandable, explicit rules without requiring much computation;

(3)       it clearly indicates which fields are most important for prediction or classification;

(4)       it handles both continuous and categorical variables, historical data, and data of uncertain quality; and

(5)       from a business perspective, it creates a segmentation of the data set in which each leaf is a segment.

5.      Description of analysis

A CART-algorithm decision-tree method is used for the analysis.  Berry and Linoff (1997) identify the following procedure: find the initial split, grow the full tree, measure the error rate of each node, calculate the error rate of the entire tree, and finally prune the tree.

Step 1: Find the initial split

According to Berry and Linoff (1997), we need a training set of pre-classified records, in which the target field or dependent variable has a known class.  The purpose is to build a tree that will allow us to assign a class to the target field of a new record based on the values of the other fields, the independent variables.  CART builds a binary tree by splitting the records at each node according to a function of a single input field.  Hence, it has to decide which of the independent fields makes the best splitter, that is, which does the best job of separating the records into groups where a single class predominates.  This is an important step, as Brand and Gerritsen (1998) remind us that we must test and validate the model against an independent dataset before integrating any decision tree as a predictor.  Once accuracy has been measured on an independent dataset and judged acceptable, the tree or its rules is ready to be used as a predictor.

The dataset of ABC contains information about customers who purchased at its stores in 2000-2002.  The analysis needs to determine whether a customer has Good or Poor Loyalty.  Because Customer Loyalty is what we wish to predict, it is the target field or dependent variable.  The other columns used to build the model are independent columns; in this case, they include Age (Young or Old), Income (Low or High), and Purchase (Frequent or Non-frequent).  The Name column is ignored because it is highly unlikely that a person's name affects her/his purchases.  Figure 1 shows a table of independent fields for ABC.

The CART algorithm begins by analyzing the data to find the independent variable that, when used as a splitting rule, results in nodes that are most different from each other with respect to the dependent variable.  The Age of customers is assumed to be the best initial splitter in this analysis.

Name    Age      Income    Purchase
A       Young    Low       Frequent
B       Old      Low       Non-frequent
C       Young    High      Frequent
D       Old      High      Frequent

Figure 1: Independent fields for ABC
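
To make this search concrete, the sketch below computes, in plain Python, the Gini impurity reduction that each candidate field would achieve on the four Figure 1 records (Gini is a common CART splitting criterion; on this tiny illustrative table the two fields happen to tie, and the full customer dataset is assumed to favor Age).

from collections import Counter

# Figure 1 records as (Age, Income, Loyalty), with Loyalty derived
# from Purchase (Frequent = Good, Non-frequent = Poor).
records = [
    ("Young", "Low",  "Good"),   # A
    ("Old",   "Low",  "Poor"),   # B
    ("Young", "High", "Good"),   # C
    ("Old",   "High", "Good"),   # D
]

def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions.
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gain(field):
    # Impurity reduction from a binary split on the field (0=Age, 1=Income).
    parent = gini([r[2] for r in records])
    child = 0.0
    for value in {r[field] for r in records}:
        subset = [r[2] for r in records if r[field] == value]
        child += len(subset) / len(records) * gini(subset)
    return parent - child

print("Age gain:   ", gain(0))   # 0.125
print("Income gain:", gain(1))   # 0.125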

Step 2: Grow the full tree

The tree-growing phase is an iterative process that involves splitting the data into progressively smaller subsets (Brand and Gerritsen, 1998).  Each iteration considers the data in only one node.  Once a node is split, the same process is performed on the new nodes, each of which contains a subset of the data in the parent node.  The variables are analyzed and the best split is chosen for each node, until no further splits can be made.  The first iteration considers the root node, which contains all the data, while subsequent iterations work on derivative nodes containing subsets of the data (Brand and Gerritsen, 1998).  The goal is to have the leaves of the tree be as homogeneous as possible with respect to the prediction value (Berson and Smith, 1997).

The tree continues to grow until it is no longer possible to find better ways to split up incoming records.  Berson and Smith (1997) and Brand and Gerritsen (1998) state that tree-building algorithms usually have several stopping rules, for example a maximum tree depth, a minimum number of elements in a node considered for splitting, a minimum number of elements that must be in a new node, or a node being completely organized into just one prediction value.  It is assumed that the CART algorithm has grown the full tree for ABC, as shown in Figure 2.
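
As an illustration of how such stopping rules are expressed in practice, the following scikit-learn sketch names them as parameters; the values are arbitrary assumptions, not ABC's actual settings.

from sklearn.tree import DecisionTreeClassifier

# The stopping rules above, expressed as scikit-learn parameters
# (illustrative values only).
tree = DecisionTreeClassifier(
    max_depth=4,           # maximum tree depth
    min_samples_split=20,  # minimum records in a node considered for splitting
    min_samples_leaf=10,   # minimum records that must land in each new node
)
# tree.fit(X_train, y_train)  # X_train, y_train: the encoded training set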

Steps 3 and 4: Measure the error rate of each node and calculate the error rate of the entire tree

Berry and Linoff (1997) state that at the end of the tree-growing process, every record of the training set has been assigned to some leaf of the full decision tree.  Each leaf can now be assigned a class and an error rate.  It is assumed that CART will use the ABC training set to calculate the probabilities (as in Figure 3) and the error rate of each leaf.  The error rate of the entire decision tree is then calculated by the CART algorithm as a weighted sum of the error rates of all the leaves.

Age      Income    Purchase        Probability
Young    High      Frequent        0.144
Young    High      Non-frequent    0.036
Young    Low       Frequent        0.294
Young    Low       Non-frequent    0.126
Old      High      Frequent        0.195
Old      High      Non-frequent    0.065
Old      Low       Frequent        0.028
Old      Low       Non-frequent    0.112

Figure 3: Probabilities of the decision tree
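
To make steps 3 and 4 concrete, the plain-Python sketch below derives each leaf's class and error rate from the Figure 3 probabilities, assuming (as the final rules suggest) that the leaves correspond to the four Age-Income combinations, and then forms the weighted sum.

# Figure 3 probabilities grouped by leaf: (P(Frequent), P(Non-frequent)).
leaves = {
    ("Young", "High"): (0.144, 0.036),
    ("Young", "Low"):  (0.294, 0.126),
    ("Old",   "High"): (0.195, 0.065),
    ("Old",   "Low"):  (0.028, 0.112),
}

tree_error = 0.0
for leaf, (p_freq, p_non) in leaves.items():
    weight = p_freq + p_non              # share of records reaching this leaf
    error = min(p_freq, p_non) / weight  # the minority class is misclassified
    tree_error += weight * error
    print(leaf, "error rate:", round(error, 3))

print("weighted error of the entire tree:", round(tree_error, 3))  # 0.255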

Step 5: Prune the tree

At this stage, we prune the tree to obtain more accurate predictions in the general case.  According to Apte and Weiss (1997), CART's pruning mechanism is an effort to arrive at the right-sized tree, the one that minimizes the true misclassification error estimate.  The goal of CART's pruning process is to find the sub-tree that produces the least true error, taking into account the size and complexity of the tree.

Berry and Linoff (1997) identify three steps in pruning the tree: identifying candidate sub-trees, evaluating the sub-trees, and selecting the best sub-tree.  Identifying candidate sub-trees means determining how far back to prune and which of the many possible sub-trees will perform best; the goal is to prune first those branches that provide the least additional predictive power per leaf.  Evaluating sub-trees means selecting, from the pool of candidates, the tree that will best classify new data.  The winning sub-tree is selected on the basis of its overall error rate when applied to the task of classifying the records in the test set, as sketched below.
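
The sketch below illustrates this candidate-and-evaluate loop with scikit-learn's cost-complexity pruning, one standard CART-style pruning mechanism; the synthetic data is a stand-in assumption, not ABC's warehouse extract.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (illustrative only).
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each ccp_alpha value identifies a candidate sub-tree, pruned
# progressively further back from the full tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
    X_train, y_train
)

best_tree, best_score = None, -1.0
for alpha in path.ccp_alphas:
    candidate = DecisionTreeClassifier(
        ccp_alpha=alpha, random_state=0
    ).fit(X_train, y_train)
    score = candidate.score(X_test, y_test)  # accuracy on the test set
    if score > best_score:
        best_tree, best_score = candidate, score

print("best test-set accuracy:", round(best_score, 3))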

It is assumed that the decision tree of ABC has been grown to a certain size and that the CART algorithm has stopped growing it according to the stopping criteria used.  The algorithm then checks, by repeated testing, whether the model has been overfit to the data.  Once the pruning process is finished, the following rules are generated:

IF Age = Young AND Income = High THEN Customer = Good Loyalty
IF Age = Young AND Income = Low THEN Customer = Good Loyalty
IF Age = Old AND Income = High THEN Customer = Good Loyalty
IF Age = Old AND Income = Low THEN Customer = Poor Loyalty

Berry and Linoff (1997) argue that the fewer bits required, the better the rule.  Hence, the minimum description length should be used for the model: the number of bits it takes to encode both the rule and the list of all exceptions to the rule.  To simplify, we can use OR to combine certain rules and reduce the total number of rules, so that there is exactly one rule for each possible value of the dependent variable (Brand and Gerritsen, 1998).  This gives two combined rules for the analysis:

Rule 1:     IF Age = Young; OR IF Age = Old AND Income = High THEN Customer = Good Loyalty

Rule 2:     IF Age = Old AND Income = Low THEN Customer = Poor Loyalty
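
Because the pruned model is this small, the two combined rules can be expressed directly in code; the following is a minimal sketch (the function name is illustrative).

def classify_loyalty(age, income):
    # Rule 1: Young, or Old with High income, implies Good Loyalty.
    if age == "Young" or (age == "Old" and income == "High"):
        return "Good Loyalty"
    # Rule 2: Old with Low income implies Poor Loyalty.
    return "Poor Loyalty"

print(classify_loyalty("Young", "Low"))  # Good Loyalty
print(classify_loyalty("Old", "Low"))    # Poor Loyalty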

It is a good idea to retest the tree periodically to ensure that it maintains the desired accuracy.

6.      Expected outcomes

The primary output is the tree itself, which reveals two expected outcomes based on the two combined rules:

Outcome 1:  Good loyalty customers (IF Age = Young; OR IF Age = Old AND Income = High THEN Customer = Good Loyalty)

This rule shows that there are two groups of customers who are loyal to ABC:

(1)       those aged below 34, whose loyalty is independent of their Income level;

(2)       those aged at or above 34, whose Income is at or over US$75,000 per year.

Outcome 2:   Poor loyalty customers (IF Age = Old AND Income = Low THEN Customer = Poor Loyalty)

This rule shows that customers aged at or above 34 with Income less than US$75,000 per year have poor loyalty to ABC.

The expected outcomes show that there has been a change in ABC's target segments.  ABC has been targeting customers aged at or above 35 with High Income.  The analysis reveals an emergent segment of a younger Age group who look for ABC products regardless of their Income.  The current marketing mix, which focuses on the older Age group, might reduce the younger group's intention to purchase ABC products.  Hence, by analyzing its customer data, ABC can focus marketing efforts on profitable customers and build customer retention.

Finally, Berry and Linoff (1997) emphasize that the only measure that really counts is return on investment (ROI), not models.  Therefore, ABC should measure the effectiveness of its actions by comparing the actual results across several areas.


References

Apte, Chidanand and Weiss, Sholom (1997), 'Data Mining with Decision Trees and Decision Rules', Elsevier Science, [Online, accessed 27 December 2002]
URL: http://www.sciencedirect.com

Berry, Michael J and Linoff, Gordon (1997), Data Mining Techniques: For Marketing, Sales, and Customer Support, John Wiley & Sons, New York

Berson, Alex and Smith, Stephen J (1997), Data Warehousing, Data Mining, & OLAP, McGraw Hill, US

Boone, Derrick S and Roehm, Michelle (2002), 'Retail Segmentation using Artificial Neural Networks', Elsevier Science, [Online, accessed 27 December 2002]
URL: http://www.sciencedirect.com

Brand, Estelle and Gerritsen, Rob (1998), 'Decision Tree', DBMS, Miller Freeman, Inc, February 26, [Online, accessed 27 December 2002]
URL: http://xore.com/DBMS05.html

Kotler, Philip; Armstrong, Gary; Saunders, John and Wong, Veronica (1999), Principles of Marketing, Second European Edition, Prentice-Hall, Essex

Palace, Bill (1996), 'Data Mining: What is Data Mining?', UCLA: The Anderson School, [Online, accessed 18 November 2002]
URL: http://www.anderson.ucla.edu/


