White Paper

Multicollinearity in Customer Satisfaction Research

Jay L. Weiner, Ph.D., Senior Vice President, Director of Marketing Sciences
Jane Tang, Vice President, Marketing Sciences

Editorial Board: Leigh Admirand, Julie Busch, Tim Keiningham
Design and Production: Roland Clifford, Barbara Day

About Ipsos Loyalty

Ipsos Loyalty is a global, specialized practice dedicated to helping companies improve business performance through customer satisfaction management, customer relationship management, and employee climate management. Ipsos Loyalty provides a state-of-the-art approach to customer-driven business performance through a modular suite of innovative research tools that provides an integrated framework to identify complex global business solutions. Ipsos Loyalty is an Ipsos company, a leading global survey-based market research group. To learn more, visit www.ipsosloyalty.com.

About the Contributors

Jay Weiner, Ph.D., Senior Vice President, Marketing Sciences

Jay consults with many leading corporations on marketing and market research issues. He specializes in applying advanced methods to help companies make better marketing and business decisions both domestically and globally. Jay has expertise in pricing, segmentation, customer and employee loyalty, conjoint analysis, and discrete choice analysis, in addition to solid multivariate statistical skills. Jay has his doctorate in marketing research from the University of Texas at Arlington, as well as an MBA in finance and commercial banking and a BBA in marketing. Jay has published and presented numerous papers on conjoint, choice, and pricing research in conference proceedings.

Jane Tang, Vice President, Marketing Sciences

Jane provides expertise in analytical and methodological market research using various statistical techniques, from basic univariate procedures to advanced multivariate methods. She is known for her research on analytical techniques and adaptation of techniques to the market research environment. Recently, her efforts have been concentrated in apex adjustment, Hierarchical Bayes models, discrete choice models, segmentation, and database marketing. She also serves as a sampling consultant for many project teams. Jane has a B.Sc. and an M.Sc. in statistics from the University of Manitoba.
She is also a graduate of the Sampling Program for Survey Statisticians from the Survey Research Center at the University of Michigan.

Multicollinearity in Customer Satisfaction Research • © 2005, Ipsos • June 2005

Abstract

This paper examines the strengths and weaknesses of six commonly used tools for modeling customer satisfaction data. Most customer satisfaction (CSAT) studies are plagued with multicollinearity, meaning that several of the independent causal variables are highly correlated, resulting in output that may mask the true drivers of satisfaction or dissatisfaction.
When compounded by the fact that most CSAT studies are tracking studies, there is a significant challenge in modeling the data and delivering stable, actionable results to clients. As researchers and consultants, we must be sure that differences in results from one wave to the next are true differences in the market and not just, say, the result of a small number of respondents checking 8 instead of 7 on the last wave of a questionnaire. The six traditional CSAT modeling techniques compared in this paper are:

1. Ordinary Least Squares
2. Shapley Value Regression
3. Penalty & Reward Analysis
4. Kruskal's Relative Importance
5. Partial Least Squares
6. Logistic Regression

The comparison begins with results that show the relative impact of multicollinearity on each technique, using a simulated data set. Then, results based on bootstrap samples pulled from this data set show the relative stability of the various techniques. Finally, a case study demonstrates how the various methods perform with a real data set.

Introduction

In customer satisfaction (CSAT) studies, we often conduct driver analysis to understand the impact of explanatory variables on the overall dependent variable. That is, we need to provide the client with a list of priority items that can be improved and that will have a positive impact on overall satisfaction or customer loyalty and retention. Typically, the goal is to establish the list of priorities and relative importance of the explanatory variables, rather than try to predict the mean value of customer satisfaction if these improvements were implemented.
Since most CSAT studies are tracking studies, the results can be monitored over time to determine if the desired changes are occurring. We must be sure that changes in the results are in fact customer response to the client's marketing efforts and not just phantoms of the analytic tool used to build the model. The latter often happens as a result of multicollinearity, which is a serious problem in many CSAT studies and presents two challenges in modeling CSAT data. The first is accurately reflecting the impact of several independent variables that are highly correlated.
The second is ensuring that the results are consistent from wave to wave when tracking a market over time. This paper illustrates the problems that multicollinearity presents in modeling data and then compares the results from the aforementioned modeling techniques.

The Issue of Multicollinearity

In market research, multicollinearity can be controlled or altogether avoided by a well-designed questionnaire. For most researchers, this is a common goal, but one that is difficult to achieve. In most CSAT studies, we measure a variety of attributes that are often highly correlated with each other. For example, in evaluating the service provided by a customer call center, we frequently ask respondents to rate satisfaction with the friendliness of the operator, and also with the operator's ability to handle the problem the first time. We often see that these two attributes are highly correlated with each other. This may be due to halo effects, in that most customers who are happy with the resolution of the problem will reflect back and state that the operator was friendly. Regardless of the reason for the correlation between these two attributes, we need to find a modeling tool that is capable of determining the relative contribution of each of these attributes to overall satisfaction.

To set up the comparison, we created a data set with 5,000 observations that is typical of CSAT studies, where the properties of the dependent and independent measures are known. The simulated data set has two pairs of independent variables and a dependent measure (overall satisfaction).
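
A data set of this general shape can be sketched in a few lines of Python. This is a minimal illustration only and not the procedure used to build the paper's data: the latent-factor construction, the loadings, the noise levels, the seed, and the helper names `to_ten_point` and `simulate_wave` are all assumptions chosen for the example. The column labels follow those used for the paper's variables (OS, q1, rq1, q2, rq2).

```python
import numpy as np


def to_ten_point(z):
    """Map a roughly standard-normal score onto an integer 1-10 rating scale."""
    return np.clip(np.rint(5.5 + 2.0 * z), 1, 10)


def simulate_wave(n=5000, seed=0):
    """Simulate one wave of CSAT-style ratings (all parameters are assumed)."""
    rng = np.random.default_rng(seed)

    # Two latent attitudes, mildly correlated with each other.
    f1 = rng.normal(size=n)
    f2 = 0.25 * f1 + np.sqrt(1 - 0.25 ** 2) * rng.normal(size=n)

    # Overall satisfaction leans more heavily on the first latent attitude.
    overall = 0.6 * f1 + 0.25 * f2 + 0.7 * rng.normal(size=n)

    # Each pair shares one latent attitude plus a little independent noise,
    # so the two attributes within a pair are almost perfectly correlated.
    return {
        "OS": to_ten_point(overall),
        "q1": to_ten_point(f1 + 0.1 * rng.normal(size=n)),
        "rq1": to_ten_point(f1 + 0.1 * rng.normal(size=n)),
        "q2": to_ten_point(f2 + 0.1 * rng.normal(size=n)),
        "rq2": to_ten_point(f2 + 0.1 * rng.normal(size=n)),
    }


wave = simulate_wave()
names = list(wave)
corr = np.corrcoef(np.vstack([wave[k] for k in names]))
print(names)
print(np.round(corr, 2))
```

Printing the correlation matrix for such a sample makes near-duplicate attribute pairs easy to spot before any model is fitted.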
The attributes in the first pair (q1 and rq1) of independent measures are constructed to be almost perfectly correlated with each other and highly correlated with the dependent variable. The attributes in the second pair (q2 and rq2) are also highly correlated with each other, but less correlated with the dependent measure. All variables are on a 10-point rating scale. The correlation matrix is shown in Figure 1.

Figure 1: Correlation Matrix for Wave One
OS = Overall Satisfaction, the dependent measure

        OS      q1      rq1     q2      rq2
OS      1.00    0.63    0.62    0.39    0.38
q1      0.63    1.00    0.98    0.26    0.26
rq1     0.62    0.98    1.00    0.25    0.25
q2      0.39    0.26    0.25    1.00    0.98
rq2     0.38    0.26    0.25    0.98    1.00

Impact of Multicollinearity on Ordinary Least Squares Regression

A common modeling tool is ordinary least squares regression (OLS). If we regress Overall Satisfaction on q1 and q2, we find the following results.

Figure 2: Ordinary Least Squares Regression Output

Variable            q1      q2
Beta Coefficient    0.57    0.23
P-value