
ASSIGNMENTS FOR ADVANCED QUANTITATIVE METHODS ZCZF8073 ANSWER ALL SIX (6) QUESTIONS.

QUESTION 1

1. HOW DO YOU USE MULTIVARIATE DATA ANALYSIS TECHNIQUES? USE THE FOLLOWING STEPS TO GUIDE YOU.
Step 1: Define the Research Problem, Objectives, and Multivariate Technique to Be Used
Step 2: Develop the Analysis Plan
Step 3: Evaluate the Assumptions Underlying the Multivariate Technique
Step 4: Estimate the Multivariate Model and Assess Overall Model Fit
Step 5: Interpret the Variate
Step 6: Validate the Multivariate Model
Given the numerous multivariate techniques available to the researcher and the myriad issues involved in their application, it becomes apparent that the successful completion of a multivariate analysis involves more than just the selection of the correct method. Issues ranging from problem definition to a critical diagnosis of the results must be addressed. To aid the researcher or user in applying multivariate methods, a six-step approach to multivariate analysis is presented. The intent is not to provide a rigid set of procedures to follow but, instead, to provide a series of guidelines that emphasize a model-building approach. This model-building approach focuses the analysis on a well-defined research plan, starting with a conceptual model detailing the relationships to be examined. Once defined in conceptual terms, the empirical issues can be addressed, including the selection of the specific multivariate technique and the implementation issues. After obtaining significant results, we focus on their interpretation, with special attention directed toward the variate. Finally, the diagnostic measures ensure that the model is valid not only for the sample data but that it is as generalizable as possible. The following discussion briefly describes each step in this approach. This six-step model-building process provides a framework for developing, interpreting, and validating any multivariate analysis. Each researcher must develop criteria for "success" or "failure" at each stage, but the discussions of each technique provide guidelines whenever available. The emphasis on a model-building approach here, rather than just the specifics of each technique, should provide a broader base of model development, estimation, and interpretation that will improve the multivariate analyses of practitioner and academician alike.

Step 1: Define the Research Problem, Objectives, and Multivariate Technique to Be Used
The starting point for any multivariate analysis is to define the research problem and analysis objectives in conceptual terms before specifying any variables or measures. The role of conceptual model development, or theory, cannot be overstated. Whether in academic or applied research, the researcher must first view the problem in conceptual terms by defining the concepts and identifying the fundamental relationships to be investigated. A conceptual model need not be complex and detailed; instead, it can be just a simple representation of the relationships to be studied. If a dependence relationship is proposed as the research objective, the researcher needs to specify the dependent and independent concepts. For an application of an interdependence technique, the dimensions of structure or similarity should be specified. Note that a concept (an idea or topic), rather than a specific variable, is defined in both dependence and interdependence situations. This sequence minimizes the chance that relevant concepts will be omitted in the effort to develop measures and to define the specifics of the research design. With the objective and conceptual model specified, the researcher has only to choose the appropriate multivariate technique based on the measurement characteristics of the dependent and independent variables. Variables for each concept are specified prior to the study in its design, but may be respecified or even stated in a different form (e.g., transformations or creating dummy variables) after the data have been collected.

Step 2: Develop the Analysis Plan
With the conceptual model established and the multivariate technique selected, attention turns to the implementation issues. These include general considerations such as minimum or desired sample sizes, allowable or required types of variables (metric versus nonmetric), and estimation methods.

Step 3: Evaluate the Assumptions Underlying the Multivariate Technique
With the data collected, the first task is not to estimate the multivariate model but to evaluate its underlying assumptions, both statistical and conceptual, which substantially affect its ability to represent multivariate relationships. For the techniques based on statistical inference, the assumptions of multivariate normality, linearity, independence of the error terms, and equality of variances must all be met. Each technique also involves a series of conceptual assumptions dealing with such issues as model formulation and the types of relationships represented. Before any model estimation is attempted, the researcher must ensure that both statistical and conceptual assumptions are met.

Step 4: Estimate the Multivariate Model and Assess Overall Model Fit
With the assumptions satisfied, the analysis proceeds to the actual estimation of the multivariate model and an assessment of overall model fit. In the estimation process, the researcher may choose among options to meet specific characteristics of the data (e.g., use of covariates in MANOVA) or to maximize the fit to the data (e.g., rotation of factors or discriminant functions). After the model is estimated, the overall model fit is evaluated to ascertain whether it achieves acceptable levels on statistical criteria (e.g., level of significance), identifies the proposed relationships, and achieves practical significance.
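As a practical illustration of Step 3, the statistical assumptions named above can be screened with standard tests. The following is a minimal sketch, assuming a simple regression-type model on synthetic data (all variable names and data are illustrative, not from the text); it uses SciPy and statsmodels routines commonly applied for this purpose.

import numpy as np
import statsmodels.api as sm
from scipy import stats
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                  # two metric predictors
y = 1.0 + X @ np.array([0.8, -0.5]) + rng.normal(scale=0.7, size=200)

model = sm.OLS(y, sm.add_constant(X)).fit()
resid = model.resid

# Normality of residuals (univariate proxy for multivariate normality)
print("Shapiro-Wilk:", stats.shapiro(resid))

# Independence of the error terms (values near 2 suggest no autocorrelation)
print("Durbin-Watson:", durbin_watson(resid))

# Equality of variances (Breusch-Pagan test for heteroscedasticity)
bp_stat, bp_pval, _, _ = het_breuschpagan(resid, model.model.exog)
print("Breusch-Pagan p-value:", bp_pval)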

Many times, the model will be respecified in an attempt to achieve better levels of overall fit and/or explanation. In all cases, however, an acceptable model must be obtained before proceeding. No matter what level of overall model fit is found, the researcher must also determine whether the results are unduly affected by any single or small set of observations that indicate the results may be unstable or not generalizable. Ill-fitting observations may be identified as outliers, influential observations, or other disparate results (e.g., single-member clusters or seriously misclassified cases in discriminant analysis); a brief influence-diagnostics sketch appears after Step 6.

Step 5: Interpret the Variate(s)
With an acceptable level of model fit, interpreting the variate(s) reveals the nature of the multivariate relationship. The interpretation of effects for individual variables is made by examining the estimated coefficients (weights) for each variable in the variate. Moreover, some techniques also estimate multiple variates that represent underlying dimensions of comparison or association. The interpretation may lead to additional respecifications of the variables and/or model formulation, wherein the model is reestimated and then interpreted again. The objective is to identify empirical evidence of multivariate relationships in the sample data that can be generalized to the total population.

Step 6: Validate the Multivariate Model
Before accepting the results, the researcher must subject them to one final set of diagnostic analyses that assess the degree of generalizability of the results by the available validation methods. The attempts to validate the model are directed toward demonstrating the generalizability of the results to the total population. These diagnostic analyses add little to the interpretation of the results but can be viewed as "insurance" that the results are the most descriptive of the data, yet generalizable to the population.
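To make the Step 4 check for influential observations concrete, standard leverage and influence statistics can flag cases that may destabilize a fitted model. A minimal sketch, continuing from the OLS fit in the previous example (the cutoffs shown are common rules of thumb, not requirements from the text):

# Flag observations that may unduly affect the estimated model.
influence = model.get_influence()
cooks_d = influence.cooks_distance[0]          # Cook's distance per observation
leverage = influence.hat_matrix_diag           # leverage (hat) values

n, k = model.model.exog.shape
flagged = np.where((cooks_d > 4 / n) | (leverage > 2 * k / n))[0]
print("Potentially influential observations:", flagged)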

2. WHAT ARE SOME GENERAL GUIDELINES FOR MULTIVARIATE ANALYSIS? USE THE FOLLOWING STEPS TO GUIDE YOU.
Establish Practical Significance as Well as Statistical Significance
Sample Size Affects All Results
Know Your Data
Strive for Model Parsimony
Look at Your Errors
Validate Your Results

Multivariate analyses have powerful analytical and predictive capabilities. The strengths of accommodating multiple variables and relationships create substantial complexity in the results and their interpretation. Faced with this complexity, the researcher is cautioned to use multivariate methods only when the requisite conceptual foundation to support the selected technique has been developed. The following guidelines represent a "philosophy of multivariate analysis" that should be followed in their application:

Establish Practical Significance as Well as Statistical Significance
The strength of multivariate analysis is its seemingly magical ability to sort through a myriad of possible alternatives and find those with statistical significance. However, with this power must come caution. Many researchers become myopic in focusing solely on the achieved significance of the results without understanding their interpretations, good or bad. A researcher must instead look not only at the statistical significance of the results but also at their practical significance. Practical significance asks the question, "So what?" For any managerial application, the results must offer a demonstrable effect that justifies action. In academic settings, research is becoming more focused not only on statistically significant results but also on their substantive and theoretical implications, which are many times drawn from their practical significance.

For example, a regression analysis is undertaken to predict repurchase intentions, measured as the probability between 0 and 100 that the customer will shop again with the firm. The study is conducted and the results come back significant at the .05 significance level. Executives rush to embrace the results and modify firm strategy accordingly. What goes unnoticed, however, is that even though the relationship was significant, the predictive ability was poor, so poor that the estimate of repurchase probability could vary by as much as 20 percent at the .05 significance level. The "statistically significant" relationship could thus have a range of error of 40 percentage points! A customer predicted to have a 50 percent chance of return could really have probabilities from 30 percent to 70 percent, representing unacceptable levels upon which to take action. Had researchers and managers probed the practical or managerial significance of the results, they would have concluded that the relationship still needed refinement before it could be relied upon to guide strategy in any substantive sense.

Recognize That Sample Size Affects All Results
The discussion of statistical power demonstrated the substantial impact sample size plays in achieving statistical significance, both in small and large sample sizes. For smaller samples, the sophistication and complexity of the multivariate technique may easily result in either (1) too little statistical power for the test to realistically identify significant results or (2) too easily "overfitting" the data such that the results are artificially good because they fit the sample yet provide no generalizability. A similar impact also occurs for large sample sizes, which, as discussed earlier, can make the statistical tests overly sensitive. Any time sample sizes exceed 400 respondents, the researcher should examine all significant results to ensure they have practical significance due to the increased statistical power from the sample size.
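The repurchase example above can be reproduced numerically: a fit can be statistically significant while its prediction interval is far too wide for action. A minimal sketch with synthetic data (all numbers are illustrative, not from the original study):

# A statistically significant but practically weak regression:
# significant slope, yet prediction intervals roughly 40 points wide.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 300)
repurchase = 40 + 2.0 * x + rng.normal(scale=10, size=300)  # noisy 0-100 scale

fit = sm.OLS(repurchase, sm.add_constant(x)).fit()
print("slope p-value:", fit.pvalues[1])            # highly significant

pred = fit.get_prediction(np.array([[1.0, 5.0]]))  # a customer at x = 5
lo, hi = pred.conf_int(obs=True, alpha=0.05)[0]    # 95% prediction interval
print(f"predicted {pred.predicted_mean[0]:.0f}, interval [{lo:.0f}, {hi:.0f}]")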

Sample sizes also affect the results when the analyses involve groups of respondents, such as discriminant analysis or MANOVA. Unequal sample sizes among groups influence the results and require additional interpretation or analysis. Thus, a researcher or user of multivariate techniques should always assess the results in light of the sample used in the analysis.

Know Your Data
Multivariate techniques, by their very nature, identify complex relationships that are difficult to represent simply. As a result, the tendency is to accept the results without the typical examination one undertakes in univariate and bivariate analyses (e.g., scatterplots of correlations and boxplots of mean comparisons). Such shortcuts can be a prelude to disaster, however. Multivariate analyses require an even more rigorous examination of the data because the influence of outliers, violations of assumptions, and missing data can be compounded across several variables to create substantial effects. A wide-ranging set of diagnostic techniques enables discovery of these multivariate relationships in ways quite similar to the univariate and bivariate methods. The multivariate researcher must take the time to utilize these diagnostic measures for a greater understanding of the data and the basic relationships that exist. With this understanding, the researcher grasps not only "the big picture" but also knows where to look for alternative formulations of the original model that can aid in model fit, such as nonlinear and interactive relationships.

Strive for Model Parsimony
Multivariate techniques are designed to accommodate multiple variables in the analysis. This feature, however, should not substitute for conceptual model development before the multivariate techniques are applied. Although it is always more important to avoid omitting a critical predictor variable, termed specification error, the researcher must also avoid inserting variables indiscriminately and letting the multivariate technique "sort out" the relevant variables, for two fundamental reasons (a multicollinearity check is sketched after this list):

1. Irrelevant variables usually increase a technique's ability to fit the sample data, but at the expense of overfitting the sample data and making the results less generalizable to the population.

2. Even though irrelevant variables typically do not bias the estimates of the relevant variables, they can mask the true effects due to an increase in multicollinearity. Multicollinearity represents the degree to which any variable's effect can be predicted or accounted for by the other variables in the analysis. As multicollinearity rises, the ability to define any variable's effect is diminished. The addition of irrelevant or marginally significant variables can only increase the degree of multicollinearity, which makes interpretation of all variables more difficult.

Thus, including variables that are conceptually not relevant can lead to several potentially harmful effects, even if the additional variables do not directly bias the model results.
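Multicollinearity of the kind described in point 2 is commonly quantified with variance inflation factors (VIFs). A minimal sketch using statsmodels (the data and the VIF > 10 cutoff are illustrative conventions, not prescriptions from the text):

# Variance inflation factors: how well each predictor is explained by the others.
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + 0.1 * rng.normal(size=500)           # nearly redundant with x1

X = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})
X.insert(0, "const", 1.0)                      # VIFs are computed with an intercept

for i, name in enumerate(X.columns[1:], start=1):
    print(name, variance_inflation_factor(X.values, i))
# VIFs well above 10 (here x1 and x3) flag problematic multicollinearity.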

Look at Your Errors
Even with the statistical prowess of multivariate techniques, rarely do we achieve the best prediction in the first analysis. The researcher is then faced with the question, "Where does one go from here?" The best answer is to look at the errors in prediction, whether they are the residuals from regression analysis, the misclassification of observations in discriminant analysis, or outliers in cluster analysis.

In each case, the researcher should use the errors in prediction not as a measure of failure or merely something to eliminate, but as a starting point for diagnosing the validity of the obtained results and an indication of the remaining unexplained relationships.

Validate Your Results
The ability of multivariate analyses to identify complex interrelationships also means that results can be found that are specific only to the sample and not generalizable to the population. The researcher must always ensure there are sufficient observations per estimated parameter to avoid "overfitting" the sample, as discussed earlier. Just as important, however, are the efforts to validate the results by one of several methods (a bootstrap sketch follows this section):

1. Splitting the sample and using one subsample to estimate the model and the second subsample to estimate the predictive accuracy.

2. Gathering a separate sample to ensure that the results are appropriate for other samples.

3. Employing a bootstrapping technique [6], which validates a multivariate model by drawing a large number of subsamples, estimating models for each subsample, and then determining the values for the parameter estimates from the set of models by calculating the mean of each estimated coefficient across all the subsample models. This approach does not rely on statistical assumptions to assess whether a parameter differs from zero (i.e., are the estimated coefficients statistically different from zero or not?). Instead, it examines the actual values from the repeated samples to make this assessment.

Whenever a multivariate technique is employed, the researcher must strive not only to estimate a significant model but to ensure that it is representative of the population as a whole. Remember, the objective is not to find the best "fit" just to the sample data but instead to develop a model that best describes the population as a whole.
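The bootstrapping approach in point 3 above can be sketched in a few lines: resample the observations with replacement, refit the model each time, and inspect the distribution of each coefficient. A minimal illustration on synthetic data (the 1,000-replication count and the data are assumptions for the example):

# Bootstrap validation of regression coefficients (point 3 above).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
X = sm.add_constant(rng.normal(size=(200, 2)))
y = X @ np.array([1.0, 0.8, 0.0]) + rng.normal(size=200)  # last predictor is irrelevant

coefs = []
for _ in range(1000):
    idx = rng.integers(0, len(y), size=len(y))   # resample rows with replacement
    coefs.append(sm.OLS(y[idx], X[idx]).fit().params)
coefs = np.array(coefs)

print("bootstrap means:", coefs.mean(axis=0))
# A coefficient whose 2.5th-97.5th percentile interval spans zero is not
# distinguishable from zero without appealing to distributional assumptions.
print("95% intervals:", np.percentile(coefs, [2.5, 97.5], axis=0).T)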

3. WHAT ARE SOME RULES IN SELECTING A MULTIVARIATE TECHNIQUE?

Before launching into an analysis technique, it is important to have a clear understanding of the form and quality of the data. The form of the data refers to whether the data are nonmetric or metric. The quality of the data refers to how normally distributed the data are. The first few techniques discussed are sensitive to the linearity, normality, and equal variance assumptions of the data. Examinations of skewness and kurtosis are helpful in assessing the distribution. It is also important to understand the magnitude of missing values in observations and to determine whether to ignore them or impute values to the missing observations. Another data quality measure is outliers, and it is important to determine whether the outliers should be removed. If they are kept, they may cause a distortion to the data; if they are eliminated, they may help with the assumptions of normality (a quick data-screening sketch appears after the next paragraph).

The key is to attempt to understand what the outliers represent. The multivariate techniques can be classified based on three judgments the researcher must make about the research objective and nature of the data: (1) Can the variables be divided into independent and dependent classifications based on some theory? (2) If they can, how many variables are treated as dependent in a single analysis? and (3) How are the variables, both dependent and independent, measured? Selection of the appropriate multivariate technique depends on the answers to these three questions.
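Returning to the data-quality checks described above, a quick screening pass for form, missing values, skewness, and outliers might look like the following minimal sketch (all column names, data, and the |z| > 3 outlier cutoff are illustrative assumptions):

# Quick data-quality screen: type, missingness, skewness/kurtosis, outliers.
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(9)
df = pd.DataFrame({"sales": rng.lognormal(3, 0.5, 200),      # skewed metric variable
                   "region": rng.choice(["N", "S"], 200)})   # nonmetric variable
df.loc[rng.choice(200, 10, replace=False), "sales"] = np.nan # inject missing values

print(df.dtypes)                          # metric vs. nonmetric form
print(df.isna().mean())                   # share of missing values per column
x = df["sales"].dropna()
print("skew:", stats.skew(x), "kurtosis:", stats.kurtosis(x))
z = np.abs(stats.zscore(x))
print("potential outliers (|z| > 3):", int((z > 3).sum()))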

RULES OF THUMB NO. 1

Statistical Power Analysis
Researchers should design studies to achieve a power level of .80 at the desired significance level. More stringent significance levels (e.g., .01 instead of .05) require larger samples to achieve the desired power level. Conversely, power can be increased by choosing a less stringent alpha level (e.g., .10 instead of .05). Smaller effect sizes require larger sample sizes to achieve the desired power. An increase in power is most likely achieved by increasing the sample size.
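These rules of thumb can be checked directly with a power calculator. A minimal sketch using statsmodels' solve_power for a two-sample t-test (the 0.5 and 0.2 effect sizes are illustrative assumptions standing in for "medium" and "small" effects):

# Sample size needed to reach power .80, showing how alpha and
# effect size move the requirement.
from statsmodels.stats.power import TTestIndPower

solver = TTestIndPower()
for alpha in (0.10, 0.05, 0.01):
    for effect in (0.5, 0.2):                  # medium and small effect sizes
        n = solver.solve_power(effect_size=effect, alpha=alpha, power=0.80)
        print(f"alpha={alpha:.2f}, effect={effect}: n per group = {n:.0f}")
# Stricter alpha or smaller effects require larger samples, as stated above.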

QUESTION 2

1. WHAT IS CANONICAL CORRELATION ANALYSIS?

Canonical correlation analysis is one of the established techniques of multivariate analysis, an ever-expanding set of techniques for data analysis that encompasses a wide range of possible research situations. The most flexible of the multivariate techniques, canonical correlation simultaneously correlates several independent variables and several dependent variables. Unlike MANOVA, this powerful technique can utilize metric independent variables, such as sales, satisfaction levels, and usage levels. It can also utilize nonmetric categorical variables. Because this technique has the fewest restrictions of any of the multivariate techniques, the results should be interpreted with caution due to the relaxed assumptions. Often the dependent variables are related, and the independent variables are related, so finding a relationship is difficult without a technique like canonical correlation.

2. WHY DO WE USE CANONICAL CORRELATION ANALYSIS?

A close relationship exists between the various dependence procedures, which can be viewed as a family of techniques. The table below defines the various multivariate dependence techniques in terms of the nature and number of dependent and independent variables. As we can see, canonical correlation can be considered to be the general model upon which many other multivariate techniques are based, because it places the least restrictions on the type and number of variables in both the dependent and independent variates. As restrictions are placed on the variates, more precise conclusions can be reached based on the specific scale of data measurement employed. Thus, multivariate techniques range from the general method of canonical analysis to the specialized technique of structural equation modelling.

3. WHEN DO YOU USE CANONICAL CORRELATION ANALYSIS?

Canonical correlation analysis can be viewed as a logical extension of multiple regression analysis. Recall from the table above (Question 2) that multiple regression analysis involves a single metric dependent variable and several metric independent variables. With canonical analysis the objective is to correlate simultaneously several metric dependent variables and several metric independent variables. Whereas multiple regression involves a single dependent variable, canonical correlation involves multiple dependent variables. The underlying principle is to develop a linear combination of each set of variables (both independent and dependent) in a manner that maximizes the correlation between the two sets. Stated differently, the procedure involves obtaining a set of weights for the dependent and independent variables that provides the maximum simple correlation between the set of dependent variables and the set of independent variables.

Assume a company conducts a study that collects information on its service quality based on answers to 50 metrically measured questions. The study uses questions from published service quality research and includes benchmarking information on perceptions of the service quality of "world-class companies" as well as the company for which the research is being conducted. Canonical correlation could be used to compare the perceptions of the world-class companies on the 50 questions with the perceptions of the company. The research could then conclude whether the perceptions of the company are correlated with those of world-class companies. The technique would provide information on the overall correlation of perceptions as well as the correlation between each of the 50 questions.
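A relationship like the one described could be estimated with scikit-learn's CCA estimator. The minimal sketch below uses two synthetic five-variable sets in place of the 50-question batteries (all names and data are illustrative assumptions):

# Canonical correlation between two sets of metric variables.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(4)
latent = rng.normal(size=(300, 1))                       # shared driver
X = latent + rng.normal(scale=1.0, size=(300, 5))        # "independent" set
Y = latent + rng.normal(scale=1.0, size=(300, 5))        # "dependent" set

cca = CCA(n_components=2).fit(X, Y)
U, V = cca.transform(X, Y)                               # canonical variates

for k in range(2):
    r = np.corrcoef(U[:, k], V[:, k])[0, 1]
    print(f"canonical correlation {k + 1}: {r:.3f}")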

4. HOW DO YOU USE CANONICAL CORRELATION ANALYSIS? USE THE FOLLOWING STEPS TO GUIDE YOU.
Stage 1: Objectives of canonical correlation analysis (CCA)
Stage 2: Designing a canonical correlation analysis
Stage 3: Assumptions in canonical correlation analysis
Stage 4: Deriving the canonical functions and assessing overall fit
Stage 5: Interpreting the canonical functions
Stage 6: Validation and diagnosis

Canonical correlation analysis is the most generalized member of the family of multivariate statistical techniques. It is directly related to several dependence methods. Similar to regression, canonical correlation's goal is to quantify the strength of the relationship, in this case between the two sets of variables (independent and dependent). It corresponds to factor analysis in the creation of composites of variables. It also resembles discriminant analysis in its ability to determine independent dimensions (similar to discriminant functions) for each variable set, in this situation with the objective of producing the maximum correlation between the dimensions. Thus, canonical correlation identifies the optimum structure or dimensionality of each variable set that maximizes the relationship between the independent and dependent variable sets.

Canonical correlation analysis deals with the association between composites of sets of multiple dependent and independent variables. In doing so, it develops a number of independent canonical functions that maximize the correlation between the linear composites, also known as canonical variates, of the sets of dependent and independent variables. Each canonical function is actually based on the correlation between two canonical variates, one variate for the dependent variables and one for the independent variables. Another unique feature of canonical correlation is that the variates are derived to maximize their correlation. Moreover, canonical correlation does not stop with the derivation of a single relationship between the sets of variables. Instead, a number of canonical functions (pairs of canonical variates) may be derived. The following discussion of canonical correlation analysis is organized around a six-stage model-building process. The steps in this process include (1) specifying the objectives of canonical correlation, (2) developing the analysis plan, (3) assessing the assumptions underlying canonical correlation, (4) estimating the canonical model and assessing overall model fit, (5) interpreting the canonical variates, and (6) validating the model.

Stage 1: Objectives of Canonical Correlation Analysis
The appropriate data for canonical correlation analysis are two sets of variables. We assume that each set can be given some theoretical meaning, at least to the extent that one set could be defined as the independent variables and the other as the dependent variables. Once this distinction has been made, canonical correlation can address a wide range of objectives. These objectives may be any or all of the following:

1. Determining whether two sets of variables (measurements made on the same objects) are independent of one another or, conversely, determining the magnitude of the relationships that may exist between the two sets.

2. Deriving a set of weights for each set of dependent and independent variables so that the linear combinations of each set are maximally correlated. Additional linear functions that maximize the remaining correlation are independent of the preceding set(s) of linear combinations.

3. Explaining the nature of whatever relationships exist between the sets of dependent and independent variables, generally by measuring the relative contribution of each variable to the canonical functions (relationships) that are extracted.

The inherent flexibility of canonical correlation in terms of the number and types of variables handled, both dependent and independent, makes it a logical candidate for many of the more complex problems addressed with multivariate techniques.

Stage 2: Designing a Canonical Correlation Analysis
As the most general form of multivariate analysis, canonical correlation analysis shares basic implementation issues common to all multivariate techniques. Discussions on the impact of measurement error, the types of variables, and their transformations that can be included are relevant to canonical correlation analysis as well.


The issues of the impact of sample size (both small and large) and the necessity for a sufficient number of observations per variable are frequently encountered with canonical correlation. Researchers are tempted to include many variables in both the independent and dependent variable sets, not realizing the implications for sample size. Sample sizes that are very small will not represent the correlations well, thus obscuring any meaningful relationships. Very large samples will have a tendency to indicate statistical significance in all instances, even where practical significance is not indicated. The researcher is also encouraged to maintain at least 10 observations per variable to avoid overfitting the data. The classification of variables as dependent or independent is of little importance for the statistical estimation of the canonical functions, because canonical correlation analysis weights both variates to maximize the correlation and places no particular emphasis on either variate. Yet because the technique produces variates to maximize the correlation between them, a variable in either set relates to all other variables in both sets. This allows the addition or deletion of a single variable to affect the entire solution, particularly the other variate. The composition of each variate, either independent or dependent, therefore becomes critical. A researcher must have conceptually linked sets of variables before applying canonical correlation analysis. This makes the specification of dependent versus independent variates essential to establishing a strong conceptual foundation for the variables.

Stage 3: Assumptions in Canonical Correlation
The generality of canonical correlation analysis also extends to its underlying statistical assumptions. The assumption of linearity affects two aspects of canonical correlation results. First, the correlation coefficient between any two variables is based on a linear relationship. If the relationship is nonlinear, then one or both variables should be transformed, if possible. Second, the canonical correlation is the linear relationship between the variates. If the variates relate in a nonlinear manner, the relationship will not be captured by canonical correlation. Thus, while canonical correlation analysis is the most generalized multivariate method, it is still constrained to identifying linear relationships.

Canonical correlation analysis can accommodate any metric variable without the strict assumption of normality. Normality is desirable because it standardizes a distribution to allow for a higher correlation among the variables. But in the strictest sense, canonical correlation analysis can accommodate even nonnormal variables if the distributional form (e.g., highly skewed) does not decrease the correlation with other variables. This allows for transformed nonmetric data (in the form of dummy variables) to be used as well. However, multivariate normality is required for the statistical inference test of the significance of each canonical function. Because tests for multivariate normality are not readily available, the prevailing guideline is to ensure that each variable has univariate normality. Thus, although normality is not strictly required, it is highly recommended that all variables be evaluated for normality and transformed if necessary. Homoscedasticity, to the extent that it decreases the correlation between variables, should also be remedied. Finally, multicollinearity among either variable set will confound the ability of the technique to isolate the impact of any single variable, making interpretation less reliable.


Stage 4: Deriving the Canonical Functions and Assessing Overall Fit
The first step of canonical correlation analysis is to derive one or more canonical functions. Each function consists of a pair of variates, one representing the independent variables and the other representing the dependent variables. The maximum number of canonical variates (functions) that can be extracted from the sets of variables equals the number of variables in the smallest data set, independent or dependent. For example, when the research problem involves five independent variables and three dependent variables, the maximum number of canonical functions that can be extracted is three.

Deriving Canonical Functions
The derivation of successive canonical variates is similar to the procedure used with unrotated factor analysis. The first factor extracted accounts for the maximum amount of variance in the set of variables, then the second factor is computed so that it accounts for as much as possible of the variance not accounted for by the first factor, and so forth, until all factors have been extracted. Therefore, successive factors are derived from residual or leftover variance from earlier factors. Canonical correlation analysis follows a similar procedure but focuses on accounting for the maximum amount of the relationship between the two sets of variables, rather than within a single set. The result is that the first pair of canonical variates is derived so as to have the highest intercorrelation possible between the two sets of variables. The second pair of canonical variates is then derived so that it exhibits the maximum relationship between the two sets of variables (variates) not accounted for by the first pair of variates. In short, successive pairs of canonical variates are based on residual variance, and their respective canonical correlations (which reflect the interrelationships between the variates) become smaller as each additional function is extracted. That is, the first pair of canonical variates exhibits the highest intercorrelation, the next pair the second-highest correlation, and so forth.

One additional point about the derivation of canonical variates: as noted, successive pairs of canonical variates are based on residual variance. Therefore, each of the pairs of variates is orthogonal and independent of all other variates derived from the same set of data. The strength of the relationship between the pairs of variates is reflected by the canonical correlation. When squared, the canonical correlation represents the amount of variance in one canonical variate accounted for by the other canonical variate. This also may be called the amount of shared variance between the two canonical variates. Squared canonical correlations are called canonical roots or eigenvalues. (A numerical sketch of this derivation follows this section.)

Which Canonical Functions Should Be Interpreted?
As with research using other statistical techniques, the most common practice is to analyze functions whose canonical correlation coefficients are statistically significant beyond some level, typically .05 or above. If other independent functions are deemed insignificant, these relationships among the variables are not interpreted. Interpretation of the canonical variates in a significant function is based on the premise that variables in each set that contribute heavily to shared variances for these functions are considered to be related to each other.
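As flagged above, the canonical roots can be computed directly from the sample covariance blocks: the squared canonical correlations are the eigenvalues of Syy^-1 Syx Sxx^-1 Sxy. A minimal numpy sketch for the five-independent, three-dependent example (synthetic data; at most min(5, 3) = 3 functions exist):

# Canonical correlations from first principles: squared correlations
# are eigenvalues of a product of within- and between-set covariance blocks.
import numpy as np

rng = np.random.default_rng(5)
X_ind = rng.normal(size=(400, 5))                        # independent set
Y_dep = 0.6 * X_ind[:, :3] + rng.normal(size=(400, 3))   # dependent set, related to X

n = len(X_ind)
Xc, Yc = X_ind - X_ind.mean(0), Y_dep - Y_dep.mean(0)
Sxx = Xc.T @ Xc / n
Syy = Yc.T @ Yc / n
Sxy = Xc.T @ Yc / n

# Eigenvalues of Syy^-1 Syx Sxx^-1 Sxy are the squared canonical correlations.
M = np.linalg.solve(Syy, Sxy.T) @ np.linalg.solve(Sxx, Sxy)
roots = np.sort(np.linalg.eigvals(M).real)[::-1]
print("canonical roots (eigenvalues):", np.round(roots, 3))
print("canonical correlations:", np.round(np.sqrt(roots), 3))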


The authors believe that the use of a single criterion such as the level of significance is too superficial. Instead, they recommend that three criteria be used in conjunction with one another to decide which canonical functions should be interpreted. The three criteria are (1) the level of statistical significance of the function, (2) the magnitude of the canonical correlation, and (3) a redundancy measure for the percentage of variance accounted for from the two data sets.

Level of Significance
The level of significance of a canonical correlation generally considered to be the minimum acceptable for interpretation is the .05 level, which (along with the .01 level) has become the generally accepted level for considering a correlation coefficient statistically significant. This consensus has developed largely because of the availability of tables for these levels. These levels are not necessarily required in all situations, however, and researchers from various disciplines frequently must rely on results based on lower levels of significance. The most widely used test, and the one normally provided by computer packages, is the F statistic, based on Rao's approximation [3]. In addition to separate tests of each canonical function, a multivariate test of all canonical roots can also be used for evaluating the significance of canonical roots. Many of the measures for assessing the significance of discriminant functions, including Wilks' lambda, Hotelling's trace, Pillai's trace, and Roy's gcr, are also provided.

Magnitude of the Canonical Relationships
The practical significance of the canonical functions, represented by the size of the canonical correlations, also should be considered when deciding which functions to interpret. No generally accepted guidelines have been established regarding suitable sizes for canonical correlations. Rather, the decision is usually based on the contribution of the findings to better understanding of the research problem being studied. It seems logical that the guidelines suggested for significant factor loadings in factor analysis might be useful with canonical correlations, particularly when one considers that canonical correlations refer to the variance explained in the canonical variates (linear composites), not the original variables.

Redundancy Measure of Shared Variance
Recall that squared canonical correlations (roots) provide an estimate of the shared variance between the canonical variates. Although this is a simple and appealing measure of the shared variance, it may lead to some misinterpretation because the squared canonical correlations represent the variance shared by the linear composites of the sets of dependent and independent variables, and not the variance extracted from the sets of variables [1]. Thus, a relatively strong canonical correlation may be obtained between two linear composites (canonical variates), even though these linear composites may not extract significant portions of variance from their respective sets of variables. Because canonical correlations may be obtained that are considerably larger than previously reported bivariate and multiple correlation coefficients, there may be a temptation to assume that canonical analysis has uncovered substantial relationships of conceptual and practical significance. Before such conclusions are warranted, however, further analysis involving measures other than canonical correlations must be undertaken to determine the amount of the dependent variable variance accounted for or shared with the independent variables [7].

To overcome the inherent bias and uncertainty in using canonical roots (squared canonical correlations) as a measure of shared variance, a redundancy index has been proposed [8]. It is the equivalent of computing the squared multiple correlation coefficient between the total independent variable set and each variable in the dependent variable set, and then averaging these squared coefficients to arrive at an average R². This index provides a summary measure of the ability of a set of independent variables (taken as a set) to explain variation in the dependent variables (taken one at a time). As such, the redundancy measure is perfectly analogous to multiple regression's R² statistic, and its value as an index is similar.

The Stewart-Love index of redundancy calculates the amount of variance in one set of variables that can be explained by the variance in the other set. This index serves as a measure of accounted-for variance, similar to the R² calculation used in multiple regression. The R² represents the amount of variance in the dependent variable explained by the regression function of the independent variables. In regression, the total variance in the dependent variable is equal to 1, or 100 percent. Remember that canonical correlation is different from multiple regression in that it does not deal with a single dependent variable but with a composite of the dependent variables, and this composite has only a portion of each dependent variable's total variance. For this reason, we cannot assume that 100 percent of the variance in the dependent variable set is available to be explained by the independent variable set. The set of independent variables can be expected to account only for the shared variance in the dependent canonical variate. For this reason, the calculation of the redundancy index is a three-step process (sketched in code after this discussion).

Step 1: The Amount of Shared Variance. To calculate the amount of shared variance in the dependent variable set included in the dependent canonical variate, let us first consider how the regression R² statistic is calculated. R² is simply the square of the correlation coefficient R, which represents the correlation between the actual dependent variable and the predicted value. In the canonical case, we are concerned with the correlation between the dependent canonical variate and each of the dependent variables. Such information can be obtained from the canonical loadings (Li), which represent the correlation between each input variable and its own canonical variate (discussed in more detail in the following section). By squaring each of the dependent variable loadings (Li²), one may obtain a measure of the amount of variation in each of the dependent variables explained by the dependent canonical variate. To calculate the amount of shared variance explained by the canonical variate, a simple average of the squared loadings is used.

Step 2: The Amount of Explained Variance. The second step of the redundancy process involves the percentage of variance in the dependent canonical variate that can be explained by the independent canonical variate. This is simply the squared correlation between the independent canonical variate and the dependent canonical variate, which is otherwise known as the canonical correlation. The squared canonical correlation is commonly called the canonical R².

Step 3: The Redundancy Index. The redundancy index of a variate is then derived by multiplying the two components (shared variance of the variate multiplied by the squared canonical correlation) to find the amount of shared variance that can be explained by each canonical function. To have a high redundancy index, one must have a high canonical correlation and a high degree of shared variance explained by the dependent variate. A high canonical correlation alone does not ensure a valuable canonical function. Redundancy indices are calculated for both the dependent and the independent variates, although in most instances the researcher is concerned only with the variance extracted from the dependent variable set, which provides a much more realistic measure of the predictive ability of canonical relationships. The researcher should note that while the canonical correlation is the same for both variates in the canonical function, the redundancy index will most likely vary between the two variates, as each will have a differing amount of shared variance.

What is the minimum acceptable redundancy index needed to justify the interpretation of canonical functions? Just as with canonical correlations, no generally accepted guidelines have been established. The researcher must judge each canonical function in light of its theoretical and practical significance to the research problem being investigated to determine whether the redundancy index is sufficient to justify interpretation. A test for the significance of the redundancy index has been developed [2], although it has not been widely utilized.
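As flagged above, the three-step redundancy calculation can be written out directly. A minimal sketch continuing from the scikit-learn CCA example earlier (data X, Y and variates U, V; the first canonical function is used):

# Redundancy index for the dependent set, first canonical function.
import numpy as np

# Step 1: shared variance = mean squared loading of each Y variable
# on the dependent canonical variate.
loadings_y = np.array([np.corrcoef(Y[:, j], V[:, 0])[0, 1] for j in range(Y.shape[1])])
shared_variance = np.mean(loadings_y ** 2)

# Step 2: canonical R^2 = squared correlation between the two variates.
canonical_r2 = np.corrcoef(U[:, 0], V[:, 0])[0, 1] ** 2

# Step 3: redundancy index = shared variance x canonical R^2.
redundancy = shared_variance * canonical_r2
print(f"shared variance {shared_variance:.3f}, canonical R2 {canonical_r2:.3f}, "
      f"redundancy {redundancy:.3f}")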

Stage 5: Interpreting the Canonical Variate
If the canonical relationship is statistically significant and the magnitudes of the canonical root and the redundancy index are acceptable, the researcher still needs to make substantive interpretations of the results. Making these interpretations involves examining the canonical functions to determine the relative importance of each of the original variables in the canonical relationships. Three methods have been proposed: (1) canonical weights (standardized coefficients), (2) canonical loadings (structure correlations), and (3) canonical cross-loadings.

Canonical Weights
The traditional approach to interpreting canonical functions involves examining the sign and the magnitude of the canonical weight assigned to each variable in its canonical variate. Variables with relatively larger weights contribute more to the variates, and vice versa. Similarly, variables whose weights have opposite signs exhibit an inverse relationship with each other, and variables with weights of the same sign exhibit a direct relationship. However, interpreting the relative importance or contribution of a variable by its canonical weight is subject to the same criticisms associated with the interpretation of beta weights in regression techniques. For example, a small weight may mean either that its corresponding variable is irrelevant in determining a relationship or that it has been partialed out of the relationship because of a high degree of multicollinearity. Another problem with the use of canonical weights is that these weights are subject to considerable instability (variability) from one sample to another. This instability occurs because the computational procedure for canonical analysis yields weights that maximize the canonical correlations for a particular sample of observed dependent and independent variable sets [7]. These problems suggest considerable caution in using canonical weights to interpret the results of a canonical analysis.

Canonical Loadings
Canonical loadings have been increasingly used as a basis for interpretation because of the deficiencies inherent in canonical weights. Canonical loadings, also called canonical structure correlations, measure the simple linear correlation between an original observed variable in the dependent or independent set and the set's canonical variate. The canonical loading reflects the variance that the observed variable shares with the canonical variate and can be interpreted like a factor loading in assessing the relative contribution of each variable to each canonical function. The methodology considers each independent canonical function separately and computes the within-set variable-to-variate correlation. The larger the coefficient, the more important it is in deriving the canonical variate. Also, the criteria for determining the significance of canonical structure correlations are the same as with factor loadings in factor analysis. Canonical loadings, like weights, may be subject to considerable variability from one sample to another. This variability suggests that loadings, and hence the relationships ascribed to them, may be sample-specific, resulting from chance or extraneous factors [7]. Although canonical loadings are considered relatively more valid than weights as a means of interpreting the nature of canonical relationships, the researcher still must be cautious when using loadings for interpreting canonical relationships, particularly with regard to the external validity of the findings.

Canonical Cross-Loadings
The computation of canonical cross-loadings has been suggested as an alternative to canonical loadings [4]. This procedure involves correlating each of the original observed dependent variables directly with the independent canonical variate, and vice versa. Recall that conventional loadings correlate the original observed variables with their respective variates after the two canonical variates (dependent and independent) are maximally correlated with each other. This may also seem similar to multiple regression, but it differs in that each independent variable, for example, is correlated with the dependent variate instead of a single dependent variable. Thus cross-loadings provide a more direct measure of the dependent-independent variable relationships by eliminating an intermediate step involved in conventional loadings. Some canonical analyses do not compute correlations between the variables and the variates. In such cases the canonical weights are considered comparable but not equivalent for purposes of our discussion.
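All three interpretation measures can be computed in a line or two from the fitted CCA above. A minimal sketch (continuing the earlier example's X, Y, U, V and the fitted cca object):

# Weights, loadings, and cross-loadings for the first canonical function.
import numpy as np

print("canonical weights (X):", cca.x_weights_[:, 0])   # standardized coefficients

# Loadings: correlation of each variable with its OWN variate.
x_loadings = [np.corrcoef(X[:, j], U[:, 0])[0, 1] for j in range(X.shape[1])]

# Cross-loadings: correlation of each dependent variable with the
# OPPOSITE (independent) variate.
y_cross = [np.corrcoef(Y[:, j], U[:, 0])[0, 1] for j in range(Y.shape[1])]

print("X loadings:", np.round(x_loadings, 3))
print("Y cross-loadings:", np.round(y_cross, 3))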


Which Interpretation Approach to Use
Several different methods for interpreting the nature of canonical relationships have been discussed. The question remains, however: Which method should the researcher use? Because most canonical problems require a computer, the researcher frequently must use whichever method is available in the standard statistical packages. The cross-loadings approach is preferred, and it is provided by many computer programs, but if the cross-loadings are not available, the researcher is forced either to compute the cross-loadings by hand or to rely on the other methods of interpretation. The canonical loadings approach is somewhat more representative than the use of weights, just as was seen with factor analysis and discriminant analysis. Therefore, whenever possible the loadings approach is recommended as the best alternative to the canonical cross-loadings method.

Stage 6: Validation and Diagnosis
As with any other multivariate technique, canonical correlation analysis should be subjected to validation methods to ensure that the results are not specific only to the sample data and can be generalized to the population. The most direct procedure is to create two subsamples of the data (if sample size allows) and perform the analysis on each subsample separately. Then the results can be compared for similarity of canonical functions, variate loadings, and the like. If marked differences are found, the researcher should consider additional investigation to ensure that the final results are representative of the population values, not solely those of a single sample.

Another approach is to assess the sensitivity of the results to the removal of a dependent and/or independent variable. Because the canonical correlation procedure maximizes the correlation and does not optimize interpretability, the canonical weights and loadings may vary substantially if one variable is removed from either variate. To ensure the stability of the canonical weights and loadings, the researcher should estimate multiple canonical correlations, each time removing a different independent or dependent variable.

Although few diagnostic procedures have been developed specifically for canonical correlation analysis, the researcher should view the results within the limitations of the technique. Among the limitations that can have the greatest impact on the results and their interpretation are the following:

1. The canonical correlation reflects the variance shared by the linear composites of the sets of variables, not the variance extracted from the variables.
2. Canonical weights derived in computing canonical functions are subject to a great deal of instability.
3. Canonical weights are derived to maximize the correlation between linear composites, not the variance extracted.
4. The interpretation of the canonical variates may be difficult because they are calculated to maximize the relationship, and there are no aids for interpretation, such as rotation of variates, as seen in factor analysis.
5. It is difficult to identify meaningful relationships between the subsets of independent and dependent variables because precise statistics have not yet been developed to interpret canonical analysis, and we must rely on inadequate measures such as loadings or cross-loadings [7].

These limitations are not meant to discourage the use of canonical correlation. Rather, they are pointed out to enhance the effectiveness of canonical correlation as a research tool.
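The split-sample procedure described at the start of Stage 6 can be sketched briefly. A minimal illustration, continuing from the earlier example's X and Y (the halving scheme is one simple choice among several):

# Split-sample stability check for the CCA solution (Stage 6).
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(6)
idx = rng.permutation(len(X))
half1, half2 = idx[: len(X) // 2], idx[len(X) // 2 :]

r = []
for half in (half1, half2):
    cca_h = CCA(n_components=1).fit(X[half], Y[half])
    u, v = cca_h.transform(X[half], Y[half])
    r.append(np.corrcoef(u[:, 0], v[:, 0])[0, 1])

# Markedly different correlations or loadings across halves suggest
# the solution is sample-specific rather than generalizable.
print("first canonical correlation by subsample:", np.round(r, 3))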

QUESTION 3

ANSWER THE FOLLOWING QUESTIONS ON LOGISTIC REGRESSION.

1. HOW WOULD YOU DIFFERENTIATE BETWEEN MULTIPLE DISCRIMINANT ANALYSIS, REGRESSION ANALYSIS, LOGISTIC REGRESSION ANALYSIS AND ANALYSIS OF VARIANCE?

Multiple discriminant analysis has widespread application in situations in which the primary objective is to identify the group to which an object (e.g., person, firm, or product) belongs. Potential applications include predicting the success or failure of a new product, deciding whether a student should be admitted to graduate school, classifying students as to vocational interests, determining the category of credit risk for a person, or predicting whether a firm will be successful. In each instance, the objects fall into groups, and the objective is to predict and explain the bases for each object's group membership through a set of independent variables selected by the researcher.

The application and interpretation of discriminant analysis is much the same as in regression analysis. That is, the discriminant function is a linear combination (variate) of metric measurements for two or more independent variables and is used to describe or predict a single dependent variable. The key difference is that discriminant analysis is appropriate for research problems in which the dependent variable is categorical (nominal or nonmetric), whereas regression is utilized when the dependent variable is metric. As discussed earlier, logistic regression is a variant of regression with many similarities except for the type of dependent variable.

Discriminant analysis is also comparable to "reversing" multivariate analysis of variance (MANOVA). In discriminant analysis, the single dependent variable is categorical, and the independent variables are metric. The opposite is true of MANOVA, which involves metric dependent variables and categorical independent variable(s). The two techniques use the same statistical measures of overall model fit.

Logistic regression is similar to multiple regression. It has straightforward statistical tests, similar approaches to incorporating metric and nonmetric variables and nonlinear effects, and a wide range of diagnostics. Thus, for these and more technical reasons, logistic regression is equivalent to two-group discriminant analysis and may be more suitable in many situations.


Analysis of variance (ANOVA) is a statistical technique used to determine whether samples from two or more groups come from populations with equal means (i.e., do the group means differ significantly?). Analysis of variance examines one dependent measure, whereas multivariate analysis of variance compares group differences on two or more dependent variables. The concept of multivariate analysis of variance was introduced more than 70 years ago by Wilks [26]. However, it was not until the development of appropriate test statistics with tabled distributions and the more recent widespread availability of computer programs to compute these statistics that MANOVA became a practical tool for researchers.

Both ANOVA and MANOVA are particularly useful when used in conjunction with experimental designs; that is, research designs in which the researcher directly controls or manipulates one or more independent variables to determine the effect on the dependent variable(s). ANOVA and MANOVA provide the tools necessary to judge the observed effects (i.e., whether an observed difference is due to a treatment effect or to random sampling variability). However, MANOVA also has a role in nonexperimental designs (e.g., survey research) where groups of interest (e.g., gender, purchaser/nonpurchaser) are defined and then the differences on any number of metric variables (e.g., attitudes, satisfaction, purchase rates) are assessed for statistical significance.

2. WHEN WOULD YOU EMPLOY LOGISTIC REGRESSION RATHER THAN DISCRIMINANT ANALYSIS? WHAT ARE THE ADVANTAGES AND DISADVANTAGES OF THE DECISION?

Logistic regression is also appropriate for handling research questions where the dependent variable is nonmetric. However, logistic regression is limited to situations with binary dependent variables (e.g., yes/no, purchase/nonpurchase). The reader is encouraged to review logistic regression, because it presents many useful features in terms of interpretation of the impacts of the independent variables. Logistic regression has several advantages over discriminant analysis:

* It is more robust: the independent variables do not have to be normally distributed or have equal variance in each group.
* It does not assume a linear relationship between the IVs and the DV.
* It may handle nonlinear effects.
* You can add explicit interaction and power terms.
* The DV need not be normally distributed.
* There is no homogeneity of variance assumption.
* Normally distributed error terms are not assumed.
* It does not require that the independents be interval.
* It does not require that the independents be unbounded.


With all this flexibility, you might wonder why anyone would ever use discriminant analysis or any other method of analysis. Unfortunately, the advantages of logistic regression come at a cost: it requires much more data to achieve stable, meaningful results. With standard regression and discriminant analysis, 20 data points per predictor is typically considered the lower bound. For logistic regression, at least 50 data points per predictor are necessary to achieve stable results.
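For a concrete side-by-side, both models can be fit to the same binary-outcome data in scikit-learn. A minimal sketch with synthetic data (the group shift and the cross-validated accuracy comparison are illustrative only):

# Logistic regression vs. linear discriminant analysis on the same binary task.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
n = 400
group = rng.integers(0, 2, size=n)                       # binary DV
X = rng.normal(size=(n, 3)) + group[:, None] * 0.8       # metric IVs shifted by group

for name, model in [("logistic", LogisticRegression()),
                    ("LDA", LinearDiscriminantAnalysis())]:
    acc = cross_val_score(model, X, group, cv=5).mean()
    print(f"{name}: mean CV accuracy = {acc:.3f}")
# With normal, equal-variance predictors the two agree closely; logistic
# regression's advantage appears when those assumptions are violated.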

3. HOW DOES DISCRIMINANT ANALYSIS HANDLE THE RELATIONSHIP OF THE DEPENDENT AND INDEPENDENT VARIABLES?

Discriminant analysis is the appropriate statistical technique when the dependent variable is categorical and the independent variables are quantitative. In many cases, the dependent variable consists of two groups or classifications, for example, male versus female, high versus low, or good credit risk versus bad credit risk. In other instances, more than two groups are involved, such as a three-group classification involving low, medium, and high classifications. The basic purpose of discriminant analysis is to estimate the relationship between a single categorical dependent variable and a set of quantitative independent variables. Discriminant analysis has widespread application in situations where the primary objective is identifying the group to which an object (e.g., person, firm, or product) belongs. Potential applications include predicting the success or failure of a new product, deciding whether a student should be admitted to graduate school, classifying students as to vocational interests, determining what category of credit risk a person falls into, or predicting whether a firm will be a success or not. Discriminant analysis is capable of handling either two groups or multiple groups. When three or more classifications are identified, the technique is referred to as multiple discriminant analysis (MDA).

Discriminant analysis involves deriving a variate, the linear combination of the two (or more) independent variables that will discriminate best between defined groups. The linear combination for a discriminant analysis, also known as the discriminant function, is derived from an equation that takes the following form:

Z = W1X1 + W2X2 + W3X3 + ... + WnXn

where:
Z = discriminant score
Wi = discriminant weight for variable i
Xi = independent variable i

Discriminant analysis is the appropriate statistical technique for testing the hypotheses that the group means of a set of independent variables for two or more groups are equal. This group mean is referred to as a centroid. The centroids indicate the most typical location of any individual from a particular group, and a comparison of the group centroids shows how far apart the groups are along the dimension being tested.


(Figure omitted: a plot of three groups, 1, 2, and 3, on two independent variables, X1 and X2.)

The test for the statistical significance of the discriminant function is a generalized measure of the distance between the group centroids. If the overlap in the distributions is small, the discriminant function separates the groups well. If the overlap is large, the function is a poor discriminator between the groups. Multiple discriminant analysis is unique in one characteristic among the dependence relationships we will study. If there are more than two groups in the dependent variable, discriminant analysis will calculate more than one discriminant function: it will calculate NG - 1 functions, where NG is the number of groups (or the number of independent variables, whichever is smaller).

4.

WHAT ARE THE DIFFERENCES IN ESTIMATION AND INTERPRETATION BETWEEN LOGISTIC REGRESSION AND DISCRIMINANT ANALYSIS?

What is Logistic Regression?
Logistic regression allows one to predict a discrete outcome such as group membership from a set of variables that may be continuous, discrete, dichotomous, or a mix (Tabachnick and Fidell, 1996, p. 575).

What is Discriminant Analysis?
The goal of discriminant function analysis is to predict group membership from a set of predictors (Tabachnick and Fidell, 1996, p. 507).


When/How to use Logistic Regression and Discriminant Analysis?
From the above definitions, it appears that the same research questions can be answered by both methods. Logistic regression may be better suited to cases where the dependent variable is dichotomous, such as yes/no, pass/fail, healthy/ill, or life/death, while the independent variables can be nominal, ordinal, ratio, or interval. Discriminant analysis might be better suited when the dependent variable has more than two groups/categories. However, the real difference in determining which one to use depends on the assumptions regarding the distribution of, and relationships among, the independent variables, and on the distribution of the dependent variable.

So, what is the difference? For both methods, the categories of the outcome (i.e., the dependent variable) must be mutually exclusive. One way to determine whether to use logistic regression or discriminant analysis when there are more than two groups in the dependent variable is to analyze the assumptions pertinent to both methods. Logistic regression is much more relaxed and flexible in its assumptions than discriminant analysis. Unlike discriminant analysis, logistic regression does not require the independent variables to be normally distributed, linearly related, or of equal variance within each group (Tabachnick and Fidell, 1996, p. 575). Being free of the assumptions of discriminant analysis positions logistic regression as a tool that can be used in many situations.

However, when the assumptions regarding the distribution of the predictors are met, discriminant function analysis may be a more powerful and efficient analytic strategy (Tabachnick and Fidell, 1996, p. 579). Even though logistic regression has few assumptions, and is thus usable in more instances, it does require a larger sample size: at least 50 cases per independent variable might be required for accurate hypothesis testing, especially when the dependent variable has many groups (Grimm and Yarnold, p. 221). Given the same sample size, if the assumptions of multivariate normality of the independent variables within each group of the dependent variable are met, and each category has the same variances and covariances for the predictors, discriminant analysis might provide more accurate classification and hypothesis testing (Grimm and Yarnold, p. 241). The rule of thumb, though, is to use logistic regression when the dependent variable is dichotomous and there are enough cases; a brief comparison sketch follows below.
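The sketch below illustrates the trade-off on simulated data where discriminant analysis's assumptions hold (multivariate normal predictors with equal covariance in each group). Everything here is hypothetical; on real data with assumption violations, the comparison can easily reverse.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Two groups, three normal predictors with equal covariance: LDA's ideal case.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, (100, 3)),
               rng.normal(0.8, 1.0, (100, 3))])
y = np.repeat([0, 1], 100)

for name, clf in [("logistic regression", LogisticRegression()),
                  ("discriminant analysis", LinearDiscriminantAnalysis())]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: mean cross-validated accuracy = {acc:.3f}")
```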

5.

EXPLAIN THE CONCEPT OF ODDS AND WHY IT IS USED IN PREDICTING PROBABILITY IN A LOGISTIC REGRESSION PROCEDURE.

The odds ratio (OR) is a popular measure of the strength of association between exposure and disease. In a cohort study, the odds ratio is expressed as the ratio of the number of cases to the number of noncases in the exposed and unexposed groups. The odds ratio and its familiar computation are attributed to Cornfield (1951); it is calculated as the ratio of the products of the pairs of diagonal elements in the 2 x 2 table:


OR = (A x D) / (B x C)

For illustration, data from Canterino et al. (1999) are used. This prospective cohort study investigated risk factors for periventricular leucomalacia and intraventricular hemorrhage in preterm neonates. Using the development of severe lesions as the disease outcome and the administration of antenatal steroids as the exposure, an odds ratio is calculated from the 2 x 2 table.

Table 1
             Severe Lesions +   Severe Lesions -
Steroids +   26 (A)             318 (B)
Steroids -   134 (C)            584 (D)

OR = (26 x 584) / (318 x 134) = 0.356

The interpretation of the odds ratio is that the odds of developing severe lesions among infants exposed to antenatal steroids are 64% lower than those of infants not exposed to antenatal steroids. Point estimates for the odds ratio and confidence interval are available from Stata's cc or cs commands. In Stata 8, the default confidence intervals are exact. However, for purposes of comparison with logistic regression, we use the woolf option, which estimates the confidence interval using a Wald statistic. (The Wald statistic is a quadratic approximation of the log-likelihood curve and is most accurate in the region of the most common sample value. It is an approximation of, but less accurate than, the score statistic, which is also a quadratic approximation of the log likelihood and is most accurate in the region of the null value. Despite being an approximation, the Wald statistic provides a simple method for estimating binomial distributions and is therefore widely used. Further details are found in Clayton and Hills (1993) and Rothman and Greenland (1998).)
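The reason odds, rather than probabilities, appear in logistic regression is that the model is linear on the log-odds scale: ln(p / (1 - p)) = b0 + b1X1 + ... + bnXn. The bounded probability p is thereby mapped onto an unbounded scale, and each exponentiated coefficient is an odds ratio. The sketch below reproduces the Canterino et al. figures and the Woolf (Wald-based) confidence interval described in the text:

```python
import math

# The 2 x 2 table from Canterino et al. (1999), as laid out in Table 1.
a, b, c, d = 26, 318, 134, 584  # a = steroids+/lesions+, b = steroids+/lesions-, etc.

odds_ratio = (a * d) / (b * c)

# Woolf method: a Wald interval on the log-odds scale.
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

print(f"OR = {odds_ratio:.3f}, 95% CI ({lower:.3f}, {upper:.3f})")
# OR = 0.356: the odds of severe lesions are about 64% lower with steroids.
```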

QUESTION 4

ANSWER THE FOLLOWING QUESTIONS ON STRUCTURAL MODEL. 1. WHAT IS A STRUCTURAL MODEL?

Structural equation modeling (SEM) is a technique that allows separate relationships for each of a set of dependent variables. In its simplest sense, structural equation modeling provides the appropriate and most efficient estimation technique for a series of separate multiple regression equations estimated simultaneously. It is characterized by two basic components: (1) the structural model and (2) the measurement model.


SEM is a flexible approach to examining how things are related to each other, so SEM applications can appear quite different. However, three key characteristics of SEM are (1) the estimation of multiple and interrelated dependence relationships, (2) an ability to represent unobserved concepts in these relationships and correct for measurement error in the estimation process, and (3) a focus on explaining the covariance among the measured items.

2.

WHY DOES ONE USE SEM TO TEST A STRUCTURAL MODEL?

Unlike the other multivariate techniques discussed, structural equation modeling (SEM) examines multiple relationships between sets of variables simultaneously. SEM represents a family of techniques, including LISREL, latent variable analysis, and confirmatory factor analysis. SEM can incorporate latent variables, which are not or cannot be measured directly, into the analysis. For example, intelligence levels can only be inferred, using direct measurement of variables such as test scores, level of education, grade point average, and other related measures. These tools are often used to evaluate many scaled attributes or to build summated scales.

The measurement model enables the researcher to use several variables (indicators) for a single independent or dependent variable. For example, the dependent variable might be a concept represented by a summated scale, such as self-esteem. In a confirmatory factor analysis, the researcher can assess the contribution of each scale item as well as incorporate how well the scale measures the concept (reliability). The scales are then integrated into the estimation of the relationships between dependent and independent variables in the structural model. This procedure is similar to performing a factor analysis of the scale items and using the factor scores in the regression.
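A minimal sketch of the two components in code, using the third-party semopy package and lavaan-style model syntax; the package choice, variable names, and data file are all assumptions, not part of the original discussion. The "=~" lines define the measurement model relating latent constructs to their indicators, and the "~" line is the structural path.

```python
import pandas as pd
import semopy  # third-party SEM library; assumes lavaan-style model syntax

# Hypothetical model: each latent construct measured by three scale items,
# with one structural path relating the two constructs.
model_desc = """
Esteem  =~ est1 + est2 + est3
Support =~ sup1 + sup2 + sup3
Esteem  ~ Support
"""

df = pd.read_csv("survey_items.csv")  # hypothetical data set of scale items
model = semopy.Model(model_desc)
model.fit(df)
print(model.inspect())  # loadings, the structural coefficient, and std. errors
```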

3.

WHEN DO YOU USE SEM TO TEST A STRUCTURAL THEORY?

The structural model is the path model, which relates independent to dependent variables. In such situations, theory, prior experience, or other guidelines enable the researcher to distinguish which independent variables predict each dependent variable. Models discussed previously that accommodate multiple dependent variables (multivariate analysis of variance and canonical correlation) are not applicable in this situation because they allow only a single relationship between the dependent and independent variables.

QUESTION 5 1. WHAT IS CONJOINT ANALYSIS?

Conjoint analysis is a multivariate technique developed specifically to understand how respondents develop preferences for any type of object (products, services, or ideas). It is based on the simple premise that consumers evaluate the value of an object (real or hypothetical) by combining the separate amounts of value provided by each attribute. Moreover, consumers can best provide their estimates of preference by judging objects formed by combinations of attributes.


Conjoint analysis is an emerging dependence technique that brings new sophistication to the evaluation of objects, such as new products, services, or ideas. The most direct application is in new product or service development, allowing for the evaluation of complex products while maintaining a realistic decision context for the respondent. The market researcher is able to assess the importance of attributes as well as the levels of each attribute while consumers evaluate only a few product profiles, which are combinations of product levels.

Conjoint analysis is often referred to as trade-off analysis, since it allows the objects, and the various levels of their attributes, to be evaluated together. It is a decompositional technique and a dependence technique, in that a level of preference for a combination of attributes and levels is developed. A part-worth, or utility, is calculated for each level of each attribute, and the part-worths of the attributes at specific levels are summed to develop the overall preference for each combination (as in the sketch below). Models can be built that identify the ideal levels and combinations of attributes for products and services.
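A minimal sketch of the additive part-worth model just described, with invented part-worth values for a hypothetical respondent: the overall preference for a profile is simply the sum of the part-worths of its levels.

```python
# Hypothetical part-worth (utility) estimates for one respondent.
part_worths = {
    "price":   {"$1.19": 0.8, "$1.39": 0.4, "$1.59": 0.0},
    "quality": {"premium": 1.2, "standard": 0.3},
    "color":   {"red": 0.5, "yellow": 0.2, "blue": 0.0},
}

def total_utility(profile):
    """Additive conjoint model: sum the part-worth of each chosen level."""
    return sum(part_worths[attr][level] for attr, level in profile.items())

profile = {"price": "$1.39", "quality": "premium", "color": "red"}
print(total_utility(profile))  # 0.4 + 1.2 + 0.5 = 2.1
```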

2.

WHY AND WHEN DO WE USE CONJOINT ANALYSIS?

Assume a product concept has three attributes (price, quality, and color), each with three possible levels (e.g., for color: red, yellow, and blue). Instead of having to evaluate all 27 (3 x 3 x 3) possible combinations, a subset (9 or more) can be evaluated for their attractiveness to consumers, and the researcher learns not only how important each attribute is but also the importance of each level (e.g., the attractiveness of red versus yellow versus blue); one way of constructing such a subset is sketched below. Moreover, when the consumer evaluations are completed, the results of conjoint analysis can also be used in product design simulators, which show customer acceptance for any number of product formulations and aid in the design of the optimal product.
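One standard way to pick the 9-profile subset is a fractional factorial design. The sketch below constructs a 3^(3-1) fraction of the 27 profiles by fixing the third attribute's level at (level of attribute 1 + level of attribute 2) mod 3, a textbook construction that keeps all main effects estimable; the attribute ordering is arbitrary here.

```python
from itertools import product

# Three attributes, three levels each (coded 0, 1, 2): 27 full-factorial profiles.
full_factorial = list(product(range(3), repeat=3))

# 3^(3-1) fractional factorial: keep profiles where the third attribute's
# level equals the sum of the first two, mod 3. Each factor level still
# appears equally often, so main effects remain estimable from 9 profiles.
fraction = [p for p in full_factorial if p[2] == (p[0] + p[1]) % 3]

print(len(full_factorial), "->", len(fraction))  # 27 -> 9
for profile in fraction:
    print(profile)
```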

3.

HOW DO YOU USE CONJOINT ANALYSIS? USE THE FOLLOWING STAGES TO GUIDE YOU.
Stage 1: The Objectives of Conjoint Analysis
Stage 2: The Design of Conjoint Analysis

The researcher applying conjoint analysis must make a number of key decisions in designing the experiment and analyzing its results. The discussion follows a six-step model-building paradigm. The decision process is initiated with a specification of the objectives of conjoint analysis. Because conjoint analysis is similar to an experiment, the conceptualization of the research is critical to its success. After defining the objectives, addressing the issues related to the actual research design, and evaluating the assumptions, the decision process then considers the actual estimation of the conjoint results, the interpretation of the results, and the methods used to validate the results. The discussion ends with an examination of the use of conjoint analysis results in further analyses, such as market segmentation and choice simulators. Each of these decisions stems from the research question and the use of conjoint analysis as a tool in understanding the respondent's preferences and judgment process. This discussion of the model-building approach is followed by an examination of two alternative conjoint methodologies (choice-based and adaptive conjoint), which are then compared on the issues addressed here for traditional conjoint analysis.


STAGE 1: THE OBJECTIVES OF CONJOINT ANALYSIS
As with any statistical analysis, the starting point is the research question. In understanding consumer decisions, conjoint analysis meets two basic objectives:

1. To determine the contributions of predictor variables and their levels to consumer preferences. For example, how much does price contribute to the willingness to buy a product? Which price level is the best? How much of the change in the willingness to buy soap can be accounted for by differences between the levels of price?

2. To establish a valid model of consumer judgments. Valid models enable us to predict the consumer acceptance of any combination of attributes, even those not originally evaluated by consumers. In doing so, the issues addressed include the following: Do the respondents' choices indicate a simple linear relationship between the predictor variables and choices? Is a simple model of adding up the value of each attribute sufficient, or do we need to include more complex evaluations of preference to mirror the judgment process adequately?

The respondent reacts only to what the researcher provides in terms of profiles (attribute combinations). Are these the actual attributes used in making a decision? Are other attributes, particularly attributes of a more qualitative nature, such as emotional reactions, important as well? These and other considerations require the research question to be framed around two major issues:
* Is it possible to describe all the attributes that give utility or value to the product or service being studied?
* What are the key attributes involved in the choice process for this type of product or service?

These questions need to be resolved before proceeding into the design phase of a conjoint analysis because they provide critical guidance for key decisions in each stage.

Defining the Total Utility of the Object
The researcher must first be sure to define the total utility of the object. To represent the respondent's judgment process accurately, all attributes that potentially create or detract from the overall utility of the product or service should be included. It is essential that both positive and negative factors be considered. First, by focusing only on positive factors the research may seriously distort the respondents' judgments. A realistic choice task requires that both the "good" and "bad" attributes be considered. Second, even if the researcher omits the negative factors, respondents can subconsciously employ them, either explicitly or through association with attributes that are included. In either instance, the researcher has an incomplete view of the factors that influenced the choice process. Fortunately, the omission of a single factor typically has only a minimal impact on the estimates for the other factors when using an additive model [84], but the omission of a key attribute may still seriously distort the representation of the preference structure and diminish predictive accuracy.


Specifying the Determinant Factors
In addition, the researcher must be sure to include all determinant factors (drawn from the concept of determinant attributes [5]). The goal is to include the factors that best differentiate between the objects. Many attributes considered to be important may not differentiate in making choices because they do not vary substantially between objects. For example, safety in automobiles is an important attribute, but it may not be determinant in most cases because all cars meet strict government standards and thus are considered safe, at least at an acceptable level. However, other features, such as gas mileage, performance, or price, are important and much more likely to be used to decide among different car models. The researcher should always strive to identify the key determinant variables because they are pivotal in the actual judgment decision.

Objectives of Conjoint Analysis
Conjoint analysis is unique from other multivariate techniques in that:
* It is a form of decompositional model that has many elements of an experiment.
* Consumers provide only overall preference ratings for objects (stimuli) created by the researcher.
* Stimuli are created by combining one level (value) from each factor (attribute).
* Each respondent evaluates enough stimuli so that conjoint results can be estimated for each individual.

A "successful" conjoint analysis requires that the researcher:
* Accurately define all of the attributes (factors) that have a positive or negative impact on preference.
* Apply the appropriate model of how consumers combine the values of individual attributes into overall evaluations of an object.

Conjoint analysis results can be used to:
* Provide estimates of the "utility" of each level within each attribute.
* Define the total utility of any stimulus so that it can be compared with other stimuli to predict consumer choices (e.g., market share).

STAGE 2: THE DESIGN OF A CONJOINT ANALYSIS
Having resolved the issues stemming from the research objectives, the researcher shifts attention to the particular issues involved in designing and executing the conjoint analysis experiment. As described in the introductory section, four issues face a researcher in terms of research design:

1. Which of several alternative conjoint methods should be chosen? Conjoint analysis has three differing approaches to collecting and analyzing data, each with specific benefits and limitations.

2. With the conjoint method selected, the next issue centers on the composition and design of the profiles. What are the factors and levels to be used in defining utility? How are they to be combined into profiles?

3. A key benefit of conjoint analysis is its ability to represent many types of relationships in the conjoint variate. A crucial consideration is the type of effects that are to be included, because they require modifications in the research design. Main effects, representing the direct impact of each attribute, can be augmented by interaction effects, which represent the unique impact of various combinations of attributes.

4. The last issue relates to data collection, specifically the type of preference measure to be used and the actual conjoint task faced by the respondent.

Note that the design issues are perhaps the most important phase in conjoint analysis. A poorly designed study cannot be "fixed" after administration if design flaws are discovered. Thus, the researcher must pay particular attention to the issues surrounding the construction and administration of the conjoint experiment.

Designing the Conjoint Task
Researchers must select one of three methodologies based on the number of attributes, the choice task requirements, and the assumed consumer model of choice:
* Traditional methods are best suited when the number of attributes is less than 10, results are desired for each individual, and the simplest model of consumer choice is applicable.
* Adaptive methods are best suited when larger numbers of attributes are involved (up to 30), but they require computer-based interviews.
* Choice-based methods are considered the most realistic, can model more complex models of consumer choice, and have become the most popular, but they are generally limited to six or fewer attributes.

The researcher faces a fundamental trade-off in the number of factors included:
* Increase them to better reflect the "utility" of the object.
* Minimize them to reduce the complexity of the respondent's conjoint task and allow use of any of the three methods.

Specifying factors (attributes) and levels (values) of each factor must ensure that:
* Factors and levels are distinct influences on preference, defined in objective terms with minimal ambiguity, thereby generally eliminating emotional or aesthetic elements.
* Factors generally have the same number of levels.
* Interattribute correlations (e.g., acceleration and gas mileage) may be present at minimal levels (.20 or less) for realism, but higher levels must be accommodated by:
  - Creating a superattribute (e.g., performance)
  - Specifying prohibited pairs in the analysis to eliminate unrealistic stimuli (e.g., fast acceleration and outstanding gas mileage)
  - Constraining the model estimation to conform to prespecified relationships

Price requires special attention because:
* It generally has interattribute correlations with most other attributes (e.g., the price-quality relationship).
* In many situations it uniquely represents what is traded off in cost for the object.
* Substantial interactions with other variables may require choice-based conjoint or multistage conjoint methods.


4.

WHAT ARE THE PRACTICAL LIMITS OF CONJOINT ANALYSIS IN TERMS OF VARIABLES OR TYPES OF VALUES FOR EACH VARIABLE? WHAT TYPE OF CHOICE PROBLEMS ARE BEST SUITED TO ANALYSIS WITH CONJOINT ANALYSIS? WHICH ARE LEAST WELL SERVED BY THE USE OF CONJOINT ANALYSIS?

The full-profile and trade-off methods become unmanageable with more than 10 attributes, yet many conjoint studies need to incorporate 20, 30, or even more attributes. In these cases, some adapted or reduced form of conjoint analysis is used to simplify the data collection effort while still representing a realistic choice decision. The two options include (1) an adaptive/self-explicated conjoint for dealing with a large number of attributes and (2) a choice-based conjoint for providing more realistic choice tasks.

In the self-explicated model, the respondent provides a rating of the desirability of each level of an attribute and then rates the relative importance of each attribute overall. With the adaptive/hybrid model, the self-explicated and part-worth conjoint models are combined. The self-explicated values are used to create a small subset of profiles selected from a fractional factorial design. The profiles are then evaluated in a manner similar to traditional conjoint analysis. The sets of profiles differ among respondents, and although each respondent evaluates only a small number, collectively all profiles are evaluated by a portion of the respondents.

To make the conjoint task more realistic, an alternative conjoint methodology, known as choice-based conjoint, can be used. It asks the respondent to choose a full profile from a set of alternative profiles known as a choice set. This process is much more representative of the actual process of selecting a product from a set of competing products. Moreover, choice-based conjoint provides the option of not choosing any of the presented profiles by including a "No Choice" option in the choice set. Although traditional conjoint assumes respondents' preferences will always be allocated among the set of profiles, the choice-based approach allows for market contraction if all the alternatives in a choice set are unattractive (a simple share simulator is sketched below).

To use conjoint analysis, the researcher must assess many facets of the decision-making process. Our focus has been on providing a better understanding of the principles of conjoint analysis and how they represent the consumer's choice process. This understanding should enable researchers to avoid misapplying this relatively new and powerful technique whenever faced with the need to understand choice judgments and preference structures. However, conjoint analysis is limited in the number of variables it can include, so the researcher cannot simply add questions to compensate for a lack of clear conceptualization of the problem.
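To illustrate the choice-based logic, including the "No Choice" option, here is a minimal multinomial-logit share simulator over one hypothetical choice set; the utilities are invented, and fixing the no-choice utility at zero is a common modeling convention assumed here.

```python
import math

# Hypothetical total utilities for the profiles in one choice set,
# plus a "No Choice" alternative anchored at utility 0.
utilities = {"profile A": 1.4, "profile B": 0.9, "profile C": 0.2,
             "no choice": 0.0}

# Multinomial logit choice rule: P(i) = exp(U_i) / sum_j exp(U_j).
denom = sum(math.exp(u) for u in utilities.values())
for name, u in utilities.items():
    print(f"{name}: predicted share = {math.exp(u) / denom:.2%}")
```

If every profile utility were strongly negative, nearly all predicted share would flow to "no choice", which is the market-contraction behavior described above.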

5.

HOW WOULD YOU ADVISE A MARKET RESEARCHER TO CHOOSE AMONG THE THREE TYPES OF CONJOINT METHODOLOGIES? WHAT ARE THE MOST IMPORTANT ISSUES TO CONSIDER, ALONG WITH EACH METHODOLOGY'S STRENGTHS AND WEAKNESSES?

Three methods of profile presentation are most widely associated with conjoint analysis. Although they differ markedly in the form and amount of information presented to the respondent, all are acceptable within the traditional conjoint model. The choice between presentation methods focuses on the assumptions as to the extent of consumer processing performed during the conjoint task and the type of estimation process being employed.

Full-Profile Method. The most popular presentation method is the full-profile method, principally because of its perceived realism as well as its ability to reduce the number of comparisons through the use of fractional factorial designs. In this approach, each profile is described separately, most often using a profile card (see Figure 3b). This approach elicits fewer judgments, but each is more complex, and the judgments can be either ranked or rated. Its advantages include a more realistic description achieved by defining a profile in terms of a level for each factor and a more explicit portrayal of the trade-offs among all factors and the existing environmental correlations among the attributes. It is also possible to use more types of preference judgments, such as intentions to buy, likelihood of trial, and chances of switching, all difficult to answer with the other methods. The full-profile method is not flawless and faces two major limitations based on the respondents' ability and capacity to make reasoned decisions. First, as the number of factors increases, so does the possibility of information overload. The respondent is tempted to simplify the process by focusing on only a few factors, when in an actual situation all factors would be considered. Second, the order in which factors are listed on the profile card may have an impact on the evaluation. Thus, the researcher needs to rotate the factors across respondents when possible to minimize order effects. The full-profile method is recommended when the number of factors is 6 or fewer. When the number of factors ranges from 7 to 10, the trade-off approach becomes a possible option to the full-profile method. If the number of factors exceeds 10, then alternative methods (adaptive conjoint) are suggested [29].

Pairwise Combination Presentation. The second presentation method, the pairwise comparison method, involves a comparison of two profiles (see Figure 3c), with the respondent most often using a rating scale to indicate the strength of preference for one profile over the other [46]. The distinguishing characteristic of the pairwise comparison is that the profile typically does not contain all the attributes, as the full-profile method does. Instead, only a few attributes at a time are selected when constructing profiles, in order to simplify the task if the number of attributes is large. The researcher must be careful not to take this characteristic to the extreme and portray profiles with too few attributes to realistically portray the objects. The pairwise comparison method is also instrumental in many specialized conjoint designs, such as Adaptive Conjoint Analysis (ACA) [87], which is used in conjunction with a large number of attributes.

Trade-Off Presentation. The final method is the trade-off approach, which compares attributes two at a time by ranking all combinations of levels. It has the advantages of being simple for the respondent and easy to administer, and it avoids information overload by presenting only two attributes at a time. It was the most widely used form of presentation in the early years of conjoint analysis. Usage of this method has decreased dramatically in recent years, however, owing to several limitations. Most limiting is the sacrifice in realism from using only two factors at a time, which also makes a large number of judgments necessary for even a small number of levels. Respondents have a tendency to get confused or to follow a routinized response pattern because of fatigue. It is also limited to nonmetric responses and cannot use fractional factorial designs to reduce the number of comparisons necessary. It is rarely used in conjoint studies except in specialized designs.


QUESTION 6

1. HOW WOULD YOU DIFFERENTIATE BETWEEN MULTIPLE DISCRIMINANT ANALYSIS, REGRESSION ANALYSIS, LOGISTIC REGRESSION ANALYSIS AND ANALYSIS OF VARIANCE?
2. WHEN WOULD YOU EMPLOY LOGISTIC REGRESSION RATHER THAN DISCRIMINANT ANALYSIS? WHAT ARE THE ADVANTAGES AND DISADVANTAGES OF THE DECISION?
3. HOW DOES DISCRIMINANT ANALYSIS HANDLE THE RELATIONSHIP OF THE DEPENDENT AND INDEPENDENT VARIABLES?
4. WHAT ARE THE DIFFERENCES IN ESTIMATION AND INTERPRETATION BETWEEN LOGISTIC REGRESSION AND DISCRIMINANT ANALYSIS?
5. EXPLAIN THE CONCEPT OF ODDS AND WHY IT IS USED IN PREDICTING PROBABILITY IN A LOGISTIC REGRESSION PROCEDURE.

Note: These are the same questions as in Question 3; see the answers above.

