
For Class Discussion on 19th August 2011 (RM Class)

The Case of Emotional Intelligence
Cheri A. Young

Measuring a psychological construct like emotional intelligence is as much an art as it is a science. Because such psychological constructs are latent and not directly observable, issues of construct validity are paramount but are, unfortunately, often glossed over in the methodology sections of research papers. This web page was constructed in an effort to increase the validity of conclusions reached using paper-and-pencil measures of psychological constructs like emotional intelligence. It covers the major validity issues involved in measuring psychological constructs, using examples from measuring emotional intelligence. The information gathered here will provide insight into the construct of emotional intelligence and how one would attempt to clarify its meaning and measure it (or any other psychological construct, for that matter).

Why emotional intelligence is important

Researchers investigated dimensions of emotional intelligence (EI) by measuring related concepts, such as social skills, interpersonal competence, psychological maturity and emotional awareness, long before the term "emotional intelligence" came into use. Grade school teachers have been teaching the rudiments of emotional intelligence since 1978, with the development of the Self Science Curriculum and the teaching of classes such as "social development," "social and emotional learning," and "personal intelligence," all aimed at "rais[ing] the level of social and emotional competence" (Goleman, 1995: 262). Social scientists are just beginning to uncover the relationship of EI to other phenomena, e.g., leadership (Ashforth and Humphrey, 1995), group performance (Williams & Sternberg, 1988), individual performance, interpersonal/social exchange, managing change, and conducting performance evaluations (Goleman, 1995).

According to Goleman (1995: 160), "Emotional intelligence, the skills that help people harmonize, should become increasingly valued as a workplace asset in the years to come." And Shoshana Zuboff, a psychologist at Harvard Business School, points out, "corporations have gone through a radical revolution within this century, and with this has come a corresponding transformation of the emotional landscape. There was a long period of managerial domination of the corporate hierarchy when the manipulative, jungle-fighter boss was rewarded. But that rigid hierarchy started breaking down in the 1980s under the twin pressures of globalization and information technology. The jungle fighter symbolizes where the corporation has been; the virtuoso in interpersonal skills is the corporate future" (Goleman, 1995: 149).

If these predictions are true, then interest in emotional intelligence, if there is such a thing, is sure to increase, and with that interest will come a corresponding increase in attempts to measure it. Two such measures already purport to measure emotional intelligence: one test is from USA Weekend and the other is from Utne Reader. However, neither of these tests offers any evidence that its results are reliable or valid.

Definition and Dimensions of EI

Recent discussions of EI proliferate across the American landscape -- from the cover of Time, to a best-selling book by Daniel Goleman, to an episode of the Oprah Winfrey show. But EI is not some easily dismissed "neopsycho-babble." EI has its roots in the concept of "social intelligence," first identified by E.L. Thorndike in 1920.
Psychologists have been uncovering other intelligences for some time now, grouping them mainly into three clusters: abstract intelligence (the ability to understand and manipulate verbal and mathematical symbols), concrete intelligence (the ability to understand and manipulate objects), and social intelligence (the ability to understand and relate to people) (Ruisel, 1992). Thorndike (1920: 228) defined social intelligence as "the ability to understand and manage men and women, boys and girls -- to act wisely in human relations." And Gardner (1983) includes inter- and intrapersonal intelligences in his theory of multiple intelligences; these two intelligences comprise social intelligence. He defines them as follows:

Interpersonal intelligence is the ability to understand other people: what motivates them, how they work, how to work cooperatively with them. Successful salespeople, politicians, teachers, clinicians, and religious leaders are all likely to be individuals with high degrees of interpersonal intelligence. Intrapersonal intelligence ... is a correlative ability, turned inward. It is a capacity to form an accurate, veridical model of oneself and to be able to use that model to operate effectively in life.

Emotional intelligence, on the other hand, "is a type of social intelligence that involves the ability to monitor one's own and others' emotions, to discriminate among them, and to use the information to guide one's thinking and actions" (Mayer & Salovey, 1993: 433). According to Salovey and Mayer (1990), the originators of the concept of emotional intelligence, EI subsumes Gardner's inter- and intrapersonal intelligences and involves abilities that may be categorized into five domains:

Self-awareness: Observing yourself and recognizing a feeling as it happens.

Managing emotions: Handling feelings so that they are appropriate; realizing what is behind a feeling; finding ways to handle fears and anxieties, anger, and sadness.

Motivating oneself: Channeling emotions in the service of a goal; emotional self-control; delaying gratification and stifling impulses.

Empathy: Sensitivity to others' feelings and concerns and taking their perspective; appreciating the differences in how people feel about things.

Handling relationships: Managing emotions in others; social competence and social skills.

Self-awareness (intrapersonal intelligence), empathy, and handling relationships (interpersonal intelligence) are essentially dimensions of social intelligence.

MEASUREMENT ISSUES

Psychological constructs

Emotional intelligence is a psychological construct: an abstract theoretical variable invented to explain some phenomenon of interest to scientists. Salovey and Mayer invented (made up) the idea of emotional intelligence to explain why some people seem to be more "emotionally competent" than others. It may just be that such people are better listeners, and this explains the variability in people's "emotional competence." Or it may be that these people differ in emotional intelligence, and this is what explains the difference. Salovey and Mayer believed it was necessary to develop the construct of emotional intelligence in order to explain this difference in people. Examples of other psychological constructs, to name just a few, include organizational commitment, self-esteem, job satisfaction, tolerance for ambiguity, optimism, and intention to turnover.

Problems with Measurement

So imagine for the moment that you are a social scientist and you want to measure emotional intelligence using a paper-and-pencil instrument -- in other words, a questionnaire (also referred to as a scale or measure).

A questionnaire can include more than one measure or scale (e.g., a measure of self-esteem and a measure of depression). Questionnaires are the most commonly used procedure of data acquisition in field research (Stone, 1978), and many researchers have questioned how good these questionnaires really are. (Field research involves investigating something out in the "real world" rather than in a laboratory.) Problems with the reliability and validity of some of these questionnaires have often led to difficulties in interpreting the results of field research (Cook, Hepworth, Wall & Warr, 1981; Schriesheim, Powers, Scandura, Gardiner & Lankau, 1993; Hinkin, 1995). Unfortunately, researchers begin using these measures before knowing whether they are any good, and often draw significant conclusions only to be contradicted later by other researchers who are able to measure the constructs more accurately and precisely (Hinkin, 1995). Thus, before you go ahead and add another lousy measure of a psychological construct to the already growing pile of them, take a few minutes now to learn about the process of creating valid and reliable instruments that measure psychological constructs.

Validity and Reliability

Developing a measure of a psychological construct is a difficult and extremely time-consuming process if it is to be done correctly (Schmitt & Klimoski, 1991). However, if you don't take the time to do it right, any conclusions you reach using your questionnaire may be dubious. Many organizational researchers believe that the legitimacy of organizational research as a scientific endeavor depends upon how well the measuring instruments measure the intended constructs (Schoenfeldt, 1984). The management field needs measures that provide valid and reliable results if the field is to advance (cf. American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1985). The American Psychological Association (1985) states that measures of psychological constructs should demonstrate content validity, criterion-related validity and internal consistency, or reliability, which in turn provide evidence of construct validity.

Reliability refers to the extent to which the question responses correlate with the overall score on the questionnaire. In other words, do all the questions "hang together," all attempting to measure the same thing, whatever that thing is?
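To make "hanging together" concrete, here is a minimal sketch in Python (the responses are simulated and the item count is invented for illustration) of the corrected item-total correlation, which checks each question against the total of the remaining questions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated responses: 100 respondents x 6 Likert items (1-5).
# A common latent score drives all items, so they should "hang together."
latent = rng.normal(size=(100, 1))
responses = np.clip(np.round(3 + latent + rng.normal(scale=0.8, size=(100, 6))), 1, 5)

# Corrected item-total correlation: each item vs. the sum of the other items.
for i in range(responses.shape[1]):
    rest = np.delete(responses, i, axis=1).sum(axis=1)
    r = np.corrcoef(responses[:, i], rest)[0, 1]
    print(f"item {i + 1}: corrected item-total r = {r:.2f}")
```

An item with a low item-total correlation is not measuring the same "thing" as the rest of the questions, whatever that thing is.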
What that "thing" is involves the issue of validity. Validity is basically "the best available approximation to the truth or falsity of a given inference, proposition, or conclusion" (Trochim, 1991: 33). In this particular case, where a measure is being constructed, validity refers to how well the questionnaire measures what it is supposed to be measuring. There are different types of validity, and each will be discussed below. What needs to be stressed at this point is that the key word is demonstrating, not proving, the validity of our questionnaires. We can never prove that our instruments measure what they are supposed to measure; there is no one person or statistical test that can prove or give approval of your measure. That is why it is suggested that one use the modifier "approximately" when referring to validity, because "one can never know what is true. At best, one can know what has not yet been ruled out as false" (Cook & Campbell, 1979: 37). Only through time and lots of testing will the approximate "validity and reliability" of your measure be established. I use quotes around the words validity and reliability because the measure itself is not reliable and valid; only the conclusions reached using the measure are.

Construct Validity

Construct validity is concerned with the relationship of the measure to the underlying attributes it is attempting to assess. A law analogy sums it up nicely: construct validity refers to measuring the construct of interest, the whole construct, and nothing but the construct. The goal is to measure emotional intelligence, fully and exclusively. To what degree is your questionnaire measuring the theoretical construct of emotional intelligence (only and completely)? Answering this question will demonstrate the construct validity of your instrument.

Instead of measuring emotional intelligence fully and exclusively, the questionnaire might be measuring something else entirely, might be measuring only part of emotional intelligence and part of something else, or might be measuring only part of emotional intelligence and not the full construct.

Construct validity is an overarching type of validity; it includes face, content, criterion-related, predictive and concurrent validity (described below), as well as convergent and discriminant validity. Convergent validity is demonstrated by the extent to which the measure correlates with other measures designed to assess similar constructs. Discriminant validity refers to the degree to which the scale does not correlate with other measures designed to assess dissimilar constructs. Basically, by providing evidence of all these variations of construct validity (content, criterion-related, convergent and discriminant), you establish that your scale measures what it was intended to measure. Construct validity is often examined using the multitrait-multimethod matrix developed by Campbell and Fiske (1959). See two other terrific web pages for a thorough description of this method: one by Trochim and one by Jabs.

Criterion-related validity

This refers to the relationship between your measure and other independent measures (Hinkin, 1995). It is the degree to which your measure uncovers relationships that are in keeping with the theory underlying the construct. Criterion-related validity reflects the extent to which scores on our measure of emotional intelligence can be related to a criterion. A criterion is some behavior or cognitive skill of interest that we want to predict using our test scores of emotional intelligence. For instance, we would predict that people scoring higher in emotional intelligence on our test would demonstrate more sensitivity to others' problems, would be better able to control their impulses, and would be able to label their emotions more easily than someone who scores lower. Evidence of criterion-related validity would usually be demonstrated by the correlation between the test scores and the scores on a criterion performance.

Criterion-related validity has two sub-components: predictive validity and concurrent validity (Cronbach & Meehl, 1955). Predictive validity refers to the correlation between the test scores and the scores of a criterion performance measured at a later date. Concurrent validity refers to the correlation between the test scores and the scores of a criterion performance when both are measured at the same time. An example will help clarify the two. Perhaps we want to predict the performance of front desk clerks at a hotel; this is the criterion we want to predict using some test -- in this case, a measure of emotional intelligence. The predictive validity of the emotional intelligence test can be estimated by correlating an employee's score on the test with his/her performance evaluation a year after taking the test. If there is a high positive correlation, then we can predict performance using the emotional intelligence measure and have demonstrated its predictive validity. To demonstrate concurrent validity, we would correlate emotional intelligence test scores with current performance evaluations. If the correlation is large and positive, this provides evidence of concurrent validity.
Because the concurrent validity correlation coefficient tends to underestimate the corresponding predictive validity correlation coefficient, predictive validity tends to be preferred to concurrent validity.
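The following sketch illustrates the distinction (Python; all scores are simulated and the variable names are hypothetical): the same EI test scores are correlated with a criterion measured at the same time (concurrent) and with one measured a year later (predictive).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Simulated EI test scores for 200 front desk clerks.
ei_score = rng.normal(50, 10, n)

# Simulated criteria: performance evaluations now and one year after
# the test. Both are built to depend partly on EI, plus noise.
perf_now = 0.5 * ei_score + rng.normal(0, 8, n)
perf_later = 0.5 * ei_score + rng.normal(0, 10, n)

concurrent_r = np.corrcoef(ei_score, perf_now)[0, 1]
predictive_r = np.corrcoef(ei_score, perf_later)[0, 1]
print(f"concurrent validity r = {concurrent_r:.2f}")
print(f"predictive validity r = {predictive_r:.2f}")
```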

CREATING A MEASURE OF EMOTIONAL INTELLIGENCE

Now that we know what we are up against, let's begin developing a measure of emotional intelligence (or any other construct you wish to measure). The basic steps for developing measures, as suggested by Schwab (1980), are as follows:

Step 1: Item Development -- the generation of individual items or questions.

Step 2: Scale Development -- the manner in which items are combined to form scales.

Step 3: Scale Evaluation -- the examination of the scale in light of reliability and validity issues.

The following discussion is presented in the order of steps suggested by Schwab (1980), with modifications and additions made as necessary. At each step, the issues relating to validity and reliability will be addressed.

Step 1: Item Generation

The first step in creating a measure of a psychological construct is creating test questions or items. For example, in the case of emotional intelligence, you might create a group of 20 questions, the answers to which would provide evidence of a person's emotional intelligence. But how do you know what to ask? And how many questions are needed? The answer is that you have to ask questions that sample the construct domain, and enough of them to ensure the entire domain has been covered -- but without extraneous questions. According to Hinkin (1995: 969), the "measure must adequately capture the specific domain of interest yet contain no extraneous content." This is the issue of content validity, and there is no statistical or quantitative index of content validity; it is a matter of judgment and of collecting evidence to demonstrate the content validity of the measure.

However, first things first: you have to define the construct you are interested in measuring. It may already be defined by the existing literature, or it may need to be defined based on a review of the literature. In the case of emotional intelligence, Salovey and Mayer have provided a theoretical universe of emotional intelligence. They suggest that emotional intelligence consists of the five dimensions noted above. One way of generating items for your measure would be to create questions that tap these five dimensions, utilizing the classification schema they defined. This is called the deductive approach to item development (Hinkin, 1995).

So, you say, now we're getting somewhere. All I have to do is write questions that get at all five dimensions of emotional intelligence. And if I can't do it alone, I can ask experts to help generate questions within the conceptual definition of emotional intelligence. But how does one know if Salovey and Mayer are right? How does one know that emotional intelligence comprises five dimensions and not six or three? And how do you know whether the dimensions they mentioned are the right ones? Maybe emotional intelligence consists of five dimensions, but not the dimensions as they defined them.

If little literature or theory exists concerning a construct, then an inductive approach to item development must be undertaken (Hinkin, 1995). Basically, the researcher is left to determine the domain or dimensions of the construct. The researcher can gather qualitative data, such as interviews, and categorize the content of the interviews in order to generate the dimensions of the construct. One method of data gathering that is quite useful in developing the conceptual domain of a construct is concept mapping.
Developed by William Trochim (1989), concept mapping is a "type of structured conceptualization" that allows a group of people to conceptualize the domain of a construct in the form of a "concept map" (a visual display).

The group can consist of just about anyone and typically works best when a "wide variety of relevant people" are included (Trochim, 1989: 2). In the case of emotional intelligence, in order to develop the domain of the construct, one might wish to gather a group of experts, such as psychologists or human resources managers, or a group of employees. The group is then asked to brainstorm about the construct. For emotional intelligence, the brainstorming focus statement might be something like "Generate statements which describe the ways in which a person high in emotional intelligence is distinct from someone low in emotional intelligence," or simply "What is emotional intelligence?" The entire process of concept mapping is described in Trochim (1989). What concept mapping does -- as can also be done with data collected via qualitative methods such as interviews -- is sort, or factor analyze, the statements into groups, which then provide a foundation for defining a construct as multi-dimensional. If we were to gather a group of experts and conduct a concept mapping session, we would hope that their conceptualization of emotional intelligence would consist of the five dimensions suggested by Mayer and Salovey, thus lending support to their theoretical dimensions.

Regardless of whether a deductive or inductive approach to item generation is undertaken, the main issue is content validity, specifically domain sampling. In the case of a deductive procedure, items are generated theoretically from the literature, and these items may be assessed by experts in the area as to their content validity. In the case of emotional intelligence, we could develop items to cover the five dimensions and then ask a group of psychologists to sort the items into six categories: the five dimensions plus an "other" category. Items assigned to the proper category more than 80% or 85% of the time would be retained for use in the questionnaire; the "other" category and the items not meeting the cutoff would be discarded. This procedure is described as a best practice in Hinkin (1995).

Another way of tackling this would be, rather than giving the five dimensions to the experts, simply to ask them to sort the items into as many categories as they see fit. The results can be analyzed in the same manner used in concept mapping. If the experts come up with five dimensions like those theorized, then the researcher can be more confident in those dimensions. The fact that some people have theorized about the domain of a construct is no reason to rely uncritically on their conceptualization of it. By giving the experts the categories up front, you are, in essence, assuming that those categories -- that conceptualization of the construct -- are correct, and you are limiting the experts within those boundaries. Allowing the experts to sort into as many categories as they see fit lets the data speak for themselves, and if the categories coincide with the theorized categories, this is confirmatory evidence for the conceptualization of the domain.

If an inductive approach was taken, the same process can be followed. Experts may be used to sort the data. If interviews were conducted, the raw, qualitative data may be sorted, and items then generated for each category. Another way of sorting involves generating items from the raw data, using as much of the interviewees' wording as possible, and then sorting the items.
The raw data or items may be sorted either by telling the sorters the number of categories to sort into or by allowing them to create as many categories as they see fit (and each sorter may end up with a different number of categories!). Once again, allowing the sorters to determine the number of categories lets the data speak rather than forcing them into some preconceived notion of how many categories there should be. The main concern in generating items for a measure is content validity -- that is, the adequacy with which the measure assesses the domain of interest.
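As a minimal sketch of the fixed-category version of this check (Python; the experts, items, and assignments below are invented, and the 80% cutoff follows the rule of thumb above): each expert assigns every item to one of the five dimensions or to "other," and an item is retained only if enough assignments land in its intended category.

```python
import numpy as np

categories = ["self-awareness", "managing emotions", "motivating oneself",
              "empathy", "handling relationships", "other"]

# assignments[e][i] = category index that expert e gave to item i.
# Here: 10 hypothetical experts sorting 4 hypothetical items.
assignments = np.array([
    [0, 1, 3, 5],
    [0, 1, 3, 2],
    [0, 1, 3, 5],
    [0, 1, 4, 5],
    [0, 1, 3, 5],
    [0, 1, 3, 5],
    [0, 2, 3, 5],
    [0, 1, 3, 5],
    [0, 1, 3, 5],
    [0, 1, 3, 5],
])
intended = np.array([0, 1, 3, 2])  # the category each item was written for
CUTOFF = 0.80

for i, target in enumerate(intended):
    agreement = np.mean(assignments[:, i] == target)
    verdict = "retain" if agreement >= CUTOFF else "discard"
    print(f"item {i + 1}: {agreement:.0%} sorted into "
          f"'{categories[target]}' -> {verdict}")
```

In this toy run, item 4 was written to tap "motivating oneself" but the experts overwhelmingly file it under "other," so it is discarded.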

The content validity of a measure should be assessed as soon as the items have been developed. This way, if items need revision, it can be done before the researcher has invested heavily in the preparation and administration of the questionnaire (Schriesheim, et al., 1993).

Step 2: Scale Development

There are three stages within this step: design of the developmental study, scale construction, and reliability assessment (Hinkin, 1995).

A. Developmental study

At this stage in the process, the researcher has a potential set of items for the questionnaire measuring the intended construct. However, at this point we don't know whether the items measure the construct; we only know that they seem to break down (via the sorting) into categories that seem to reflect the underlying dimensions of the construct. Next, the researcher has to administer the items to see how well they conform to the expected and theorized structure of the construct. Five important measurement issues need to be addressed in the developmental study phase of scale development.

The Sample

Who the questionnaire or items are given to makes a difference. The sample of individuals should be selected to reflect or represent the population the researcher intends to study and make inferences about in the future.

Reverse-scored Items

Negatively worded items (items worded so that a positive response indicates a "lack" of the construct) are mainly used to eliminate or attenuate response pattern bias, or response set. Response pattern bias occurs when a respondent simply goes down the page without really reading the questions thoroughly, circling "4" in response to every question. With reverse-scored items, the thought is that the respondent will have to think about the response because the answer is "reversed." In recent years, however, reverse-scored items have come under attack: they were found to reduce the validity of questionnaire responses (Schriesheim & Hill, 1981) and may in fact introduce systematic error to the scale (Jackson, Wall, Martin, & Davids, 1993). In a factor analysis (a sorting of the items into underlying categories or dimensions) of negatively and positively worded items, the loadings of the negatively worded items were lower than those of the positively worded items loading on the same factor (Hinkin, 1995). Alternatives for attenuating response pattern bias should be sought before automatically turning to reverse-scored items; keeping the scales shorter rather than longer can help.
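If reverse-scored items are used despite these cautions, the responses must be recoded before scale scores are computed. A tiny sketch, assuming a 1-to-5 Likert response format:

```python
import numpy as np

# Raw 1-5 Likert responses to a negatively worded item.
raw = np.array([1, 2, 4, 5, 3])

# For a 1-to-k scale, reverse-coding is (k + 1) - response,
# so 1 <-> 5, 2 <-> 4, and the midpoint 3 stays 3.
reversed_scores = 6 - raw
print(reversed_scores)  # [5 4 2 1 3]
```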

Number of Items

The measure of a construct should include enough items to adequately sample the domain while remaining as parsimonious as possible, in order to obtain content and construct validity (Cronbach and Meehl, 1955). The number of items in a scale can affect responses in different ways. Scales with too many items can become excessively lengthy and induce fatigue and response pattern bias (Anastasi, 1976); keeping the number of items to a minimum can reduce response pattern bias (Schmitt & Stults, 1985). However, if too few items are used, then the content and construct validity and reliability of the measure may be at risk (Kenny, 1979; Nunnally, 1976). Single-item scales (scales that ask just one question to measure a construct) are most susceptible to these problems (Hinkin & Schriesheim, 1989). Adequate internal consistency reliability can be obtained with as few as three items (Cook, Hepworth, Wall, & Warr, 1981), and each item added beyond that has progressively less impact on the scale's reliability (Carmines & Zeller, 1979).

Scaling of Items

The scaling of items refers to the choice of responses given for each item. Examples include Likert-type scales, such as choosing from 1 to 5, where the numbers refer to strongly agree, agree, neither agree nor disagree, disagree, and strongly disagree, respectively. Semantic differential scales use word pairs such as "happy" and "sad": the respondent chooses a response on a scale of 1 to 5 or 1 to 7, with "1" referring to "happy," "5" or "7" referring to "sad," and the numbers in between referring to states between the two. The important issue at this point is achieving sufficient variance, or variability, among respondents. A researcher would not want a measure with a Likert-type scale of 1 to 3 on which most respondents choose "3"; such a measure is not capable of differentiating responses, and giving choices from 1 to 5 might alleviate the problem. The reliability of Likert-type scales increases as the number of response choices increases up to five, but then levels off (Lissitz & Green, 1975).

Sample Size

In terms of confidence in the results, the larger the sample size the better. That is, if the researcher has generated items and is looking to conduct a developmental study to check their validity and reliability, then the larger the sample of individuals administered the items, the better: the larger the sample, the more likely the results will be statistically significant. When conducting factor analysis of the items to check the underlying structure of the construct, the results may be susceptible to sample size effects (Hinkin, 1995). Rummel (1970) recommends an item-to-response ratio of 1:4, and Schwab (1980) recommends a ratio of 1:10. For example, if a researcher is analyzing 20 items, then the sample should include anywhere from 80 to 200 respondents. Newer research in this area has found that a sample of 150 respondents should be adequate to obtain an accurate exploratory factor analysis solution, provided the internal consistency reliability is reasonably strong (Guadagnoli & Velicer, 1988). An exploratory factor analysis is used when there is no a priori conceptualization of the construct; a confirmatory factor analysis is used when the researcher is attempting to confirm the theoretical conceptualization put forth in the literature. In the case of emotional intelligence, a confirmatory factor analysis would be conducted to see whether the items "break down," or "sort," into five factors or "dimensions" similar to those suggested by Mayer and Salovey. Recent research suggests that a minimum sample size of 200 is necessary for an accurate confirmatory factor solution (Hoelter, 1983).
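Before moving on, it is also worth checking empirically that each item produces the variability among respondents discussed under Scaling of Items above. A small sketch (Python; the responses are simulated, and the standard-deviation threshold used to flag items is an arbitrary illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated 1-5 Likert responses: 150 respondents x 3 items.
# Item 3 is deliberately built with almost no variability.
items = np.column_stack([
    rng.integers(1, 6, 150),
    rng.integers(1, 6, 150),
    np.where(rng.random(150) < 0.95, 3, 4),  # nearly everyone answers "3"
])

for i in range(items.shape[1]):
    sd = items[:, i].std()
    flag = "  <- too little variance?" if sd < 0.5 else ""
    print(f"item {i + 1}: sd = {sd:.2f}{flag}")
```

An item flagged this way cannot differentiate respondents, and widening its response scale (e.g., from 1-3 to 1-5) is one possible remedy.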
B. Scale construction

At this point in the process, the researcher has generated items and administered them to a sample (one hoped to be representative of the population of interest). The researcher has taken into consideration reverse-scored items, the number of items needed to both adequately sample the domain and remain parsimonious, the scaling of the items to ensure sufficient variance among respondents, and an adequate sample size. Now comes the process of constructing the scale, or measure of the construct, through the reduction of the number of items and the refinement of the construct.

The most common technique for doing this is factor analysis (Ford, MacCallum & Tait, 1986). Items that do not load sufficiently on a factor should be discarded or revised; minimum item loadings of .40 are the most commonly mentioned criterion (Hinkin, 1995). The purpose of the factor analysis in the construction of the scale is to "examine the stability of the factor structure and provide information that will facilitate the refinement of a new measure" (Hinkin, 1995: 977). The researcher is trying to establish the factor structure, or dimensionality, of the construct. Administering the items to a couple of different independent samples and then factor analyzing the results of each will help provide evidence (or lack of evidence!) of a stable factor structure. If the researcher finds a different factor structure for each sample, then the researcher has some work to do to uncover a stable (the same for all samples) factor structure. Although either an exploratory or confirmatory factor analysis can be conducted, Hinkin (1995: 977) recommends using a confirmatory approach at this point in scale development: "...because of the objective of the task of scale development, it is recommended that a confirmatory approach be utilized ... [because] it allows the researcher more precision in evaluating the measurement model."

Although the confirmatory factor analysis will tell the researcher whether the items load on the same factor, it does not tell the researcher whether the factor measures the intended construct. For example, in the case of emotional intelligence, if I administered the items to a sample and the items loaded on five factors, I might be tempted to jump to conclusions and say my items measure the same five dimensions as those outlined by Mayer and Salovey. This would be a big mistake. All I really know at this point is that the items appear to measure five factors or dimensions of "something." I still don't know what that something is. I hope that it is emotional intelligence, but I won't gather the evidence until Step 3: Scale Evaluation (see below).

C. Reliability assessment

Two basic issues are to be dealt with at this point: internal consistency and the stability of the scale over time. As mentioned previously, internal consistency reliability measures whether or not the items "hang together" -- that is, whether the items all measure the same phenomenon. The internal consistency reliability of measures is commonly assessed using Cronbach's alpha; an alpha of .70 will be considered the minimum acceptable level for this measure. The stability of the measure over time will be assessed by the test-retest reliability of the measure, since emotional intelligence is not expected to change over time (Stone, 1978).
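The following sketch ties together scale construction and reliability assessment (Python, using scikit-learn's FactorAnalysis; the data are simulated and the two-factor structure is invented purely for illustration): fit a factor analysis, drop items whose largest loading falls below the .40 criterion, and then compute Cronbach's alpha for one factor's items against the .70 minimum.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(3)
n = 300

# Simulated responses to 6 items driven by two latent factors,
# plus one noise item that belongs to neither factor.
f1, f2 = rng.normal(size=(2, n))
items = np.column_stack([
    f1 + rng.normal(0, .6, n), f1 + rng.normal(0, .6, n), f1 + rng.normal(0, .6, n),
    f2 + rng.normal(0, .6, n), f2 + rng.normal(0, .6, n),
    rng.normal(0, 1, n),                       # pure noise item
])

# Step 2B: factor analyze and apply the .40 minimum-loading criterion.
fa = FactorAnalysis(n_components=2, rotation="varimax").fit(items)
loadings = fa.components_.T                    # rows = items, cols = factors
keep = np.abs(loadings).max(axis=1) >= 0.40
print("retained items:", np.where(keep)[0] + 1)

# Step 2C: Cronbach's alpha for the items written for factor 1.
def cronbach_alpha(x):
    k = x.shape[1]
    return k / (k - 1) * (1 - x.var(axis=0, ddof=1).sum()
                          / x.sum(axis=1).var(ddof=1))

print(f"alpha (factor 1 items) = {cronbach_alpha(items[:, :3]):.2f}")
```

In this toy run, the noise item fails the .40 cutoff and drops out, and the three factor-1 items yield an alpha comfortably above .70; real scale data would, of course, be far messier.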
Step 3: Scale Evaluation

At this point in the process, a measure of a psychological construct has been developed that is both reliable and valid. Construct validity was demonstrated via concept mapping, factor analysis, internal consistency, and test-retest reliability. However, as suggested by Hinkin (1995: 979-980), demonstrating the existence of a nomological network of relationships with other variables through criterion-related validity, assessing two groups who would be expected to differ on the measure, and demonstrating discriminant and convergent validity using a method such as the multitrait-multimethod matrix developed by Campbell and Fiske (1959) would all provide further evidence of the construct validity of the new measure.

Criterion-related Validity

Criterion-related validity is an indicator of the extent to which scores on the measure of the construct of interest can be related to a criterion. A criterion is some behavior or cognitive skill of interest that one wants to predict using the test scores of the construct of interest. For instance, in the case of emotional intelligence, people who score higher on the measure would be predicted to demonstrate more sensitivity to others' problems, be better able to control their impulses, and be able to label their emotions more easily than someone who scores lower. Evidence of criterion-related validity would usually be demonstrated by the correlation between the test scores and the scores on a criterion performance.

For emotional intelligence, the criterion performance could be showing sensitivity to others' problems, being able to label one's feelings, and so on, as judged by an expert. One way of doing this would be to have the facilitators of a sensitivity training group (T-group) judge a sample of T-group participants on the performance of these criteria. "The training or T-group is an approach to human relations training which, broadly speaking, provides participants with an opportunity to learn more about themselves and their impact on others and, in particular, to learn how to function more effectively in face-to-face situations" (Cooper & Mangham, 1971: v). As such, it is a rich environment for observing the display of emotional intelligence. The facilitators of each T-group would supply subjective ratings of each group member's level of emotional intelligence, and these would be correlated with each member's observed scores on the emotional intelligence instrument, providing further evidence of the measure's validity.

Construct Validity

Construct validity includes face, content, criterion-related, predictive, concurrent, convergent and discriminant validity, as well as internal consistency. Issues concerning face, content, predictive and concurrent validity have already been addressed in previous sections. As mentioned previously, construct validity is often examined using the multitrait-multimethod matrix, a wonderful method for addressing issues of convergent and discriminant validity (see Campbell and Fiske (1959) or the web pages by Trochim and Jabs for details). Convergent validity is demonstrated by the extent to which the measure correlates with other measures designed to assess similar constructs. Discriminant validity refers to the degree to which the scale does not correlate with other measures designed to assess dissimilar constructs. In the case of emotional intelligence, the newly developed measure could be correlated with Gist's (1995) Social Intelligence measure, Riggio's (1986) Social Skills Inventory, Hogan's (1969) Empathy Scale, Snyder's (1986) Self-monitoring Scale, Eysenck's (1977) I.7 Impulsiveness Questionnaire, and Watson and Greer's (1983) Courtauld Emotional Control Scale. Correlations of these scales with specific dimensions of the emotional intelligence measure would provide evidence for convergent validity. Specifically,

Hogan's Empathy Scale should converge with the empathy subscale of the emotional intelligence instrument; Eysenck's I.7 Impulsiveness Questionnaire should correlate negatively, and Watson and Greer's Courtauld Emotional Control Scale positively, with the motivating-oneself subscale; Riggio's Social Skills Inventory should converge with the handling-relationships subscale; and Gist's Social Intelligence measure should correlate positively with the self-awareness and handling-relationships subscales.

The correlations of these other scales with specific subscales of the emotional intelligence measure would be predicted to be stronger than their correlations with the entire measure, thus providing evidence of discriminant validity. In addition, the discriminant validity of any measure of emotional intelligence would have to address how emotional intelligence differs from other intelligences. As with any measure of a psychological construct, social desirability should also be assessed; one of the most popular measures of social desirability is the Crowne and Marlowe (1964) measure. Finally, a different independent sample should be used at each stage in the development of any measure of a psychological construct, thus attenuating the possibility of "sample-specific" findings and increasing the generalizability of the measure.
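A sketch of the kind of convergent/discriminant check described above (Python; every score is simulated, and the variable names merely echo the instruments listed in the text): the new measure's empathy subscale should correlate strongly with an established empathy scale and weakly with a dissimilar scale.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 250

# Simulated scores: the new EI empathy subscale, an established
# empathy scale (built to share variance with it), and an
# impulsiveness scale (built to be unrelated to empathy).
empathy_trait = rng.normal(size=n)
ei_empathy_subscale = empathy_trait + rng.normal(0, .5, n)
established_empathy = empathy_trait + rng.normal(0, .5, n)
impulsiveness = rng.normal(size=n)

convergent_r = np.corrcoef(ei_empathy_subscale, established_empathy)[0, 1]
discriminant_r = np.corrcoef(ei_empathy_subscale, impulsiveness)[0, 1]

# Convergent: correlation with the similar construct should be high.
# Discriminant: correlation with the dissimilar construct should be low.
print(f"convergent r (empathy scale)   = {convergent_r:.2f}")
print(f"discriminant r (impulsiveness) = {discriminant_r:.2f}")
```

A full multitrait-multimethod analysis would extend this to every subscale-by-instrument pairing, as Campbell and Fiske (1959) describe.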

REFERENCES

Anastasi, A. (1976). Psychological testing, 4th ed. New York: Macmillan.
Ashforth, B.E. & Humphrey, R.H. (1995). Emotion in the workplace: A reappraisal. Human Relations, 48(2), 97-125.
Campbell, D.T. & Fiske, D.W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.
Carmines, E.G. & Zeller, R.A. (1979). Reliability and validity assessment. Beverly Hills: Sage.
Cook, T.D. & Campbell, D.T. (1979). Quasi-experimentation. Boston: Houghton Mifflin Company.
Cook, J.D., Hepworth, S.J., Wall, T.D. & Warr, P.B. (1981). The experience of work. San Diego: Academic Press.
Cooper, C.L. & Mangham, I.L. (1971). T-groups: A survey of research. London: Wiley-Interscience.
Cronbach, L.J. & Meehl, P.E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281-302.
Crowne, D. & Marlowe, D. (1964). The approval motive: Studies in evaluative dependence. New York: Wiley.
Eysenck, S.B., Pearson, P.R., Easting, G. & Allsopp, J.F. (1985). Age norms for impulsiveness, venturesomeness and empathy in adults. Personality and Individual Differences, 6(5), 613-619.
Ford, J.K., MacCallum, R.C. & Tait, M. (1986). The application of exploratory factor analysis in applied psychology: A critical review and analysis. Personnel Psychology, 39, 291-314.
Gardner, H. (1993). Multiple intelligences. New York: BasicBooks.
Gist, M.E. (1995). The Social Intelligence measure.

Goleman, D. (1995). Emotional intelligence. New York: Bantam Books.
Guadagnoli, E. & Velicer, W.F. (1988). Relation of sample size to the stability of component patterns. Psychological Bulletin, 103, 265-275.
Hinkin, T.R. (1995). A review of scale development practices in the study of organizations. Journal of Management, 21(5), 967-988.
Hinkin, T.R. & Schriesheim, C.A. (1989). Development and application of new scales to measure the French and Raven (1959) bases of social power. Journal of Applied Psychology, 74(4), 561-567.
Hoelter, J.W. (1983). The analysis of covariance structures: Goodness-of-fit indices. Sociological Methods and Research, 11, 325-344.
Hogan, R. (1969). Development of an empathy scale. Journal of Consulting and Clinical Psychology, 33, 307-316.
Jackson, P.R., Wall, T.D., Martin, R. & Davids, K. (1993). New measures of job control, cognitive demand and production responsibility. Journal of Applied Psychology, 78, 753-762.
Kenny, D.A. (1979). Correlations and causality. New York: Wiley.
Lissitz, R.W. & Green, S.B. (1975). Effect of the number of scale points on reliability: A Monte Carlo approach. Journal of Applied Psychology, 60, 10-13.
Mayer, J.D. & Salovey, P. (1993). The intelligence of emotional intelligence. Intelligence, 17, 433-442.
Nunnally, J.C. (1976). Psychometric theory, 2nd ed. New York: McGraw-Hill.
Riggio, R. (1986). Assessment of basic social skills. Journal of Personality and Social Psychology, 51(3), 649-660.
Ruisel, I. (1992). Social intelligence: Conception and methodological problems. Studia Psychologica, 34(4-5), 281-296.
Rummel, R.J. (1970). Applied factor analysis. Evanston, IL: Northwestern University Press.
Salovey, P. & Mayer, J.D. (1990). Emotional intelligence. Imagination, Cognition, and Personality, 9, 185-211.
Schmitt, N.W. & Klimoski, R.J. (1991). Research methods in human resources management. Cincinnati: South-Western Publishing.
Schmitt, N.W. & Stults, D.M. (1985). Factors defined by negatively keyed items: The results of careless respondents? Applied Psychological Measurement, 9, 367-373.
Schoenfeldt, L.F. (1984). Psychometric properties of organizational research instruments. In T.S. Bateman & G.R. Ferris (Eds.), Method and analysis in organizational research. Reston, VA: Reston Publishing.

Schriesheim, C.A. & Hill, K. (1981). Controlling acquiescence response bias by item reversal: The effect on questionnaire validity. Educational and Psychological Measurement, 41, 1101-1114.
Schriesheim, C.A., Powers, K.J., Scandura, T.A., Gardiner, C.C. & Lankau, M.J. (1993). Improving construct measurement in management research: Comments and a quantitative approach for assessing the theoretical content adequacy of paper-and-pencil survey-type instruments. Journal of Management, 19, 385-417.
Schwab, D.P. (1980). Construct validity in organization behavior. In B.M. Staw & L.L. Cummings (Eds.), Research in organizational behavior, Vol. 2. Greenwich, CT: JAI Press.
Snyder, M. (1986). On the nature of self-monitoring: Matters of assessment, matters of validity. Journal of Personality and Social Psychology, 51(1), 125-139.
Stone, E. (1978). Research methods in organizational behavior. Glenview, IL: Scott, Foresman.
Thorndike, E.L. (1920). Intelligence and its uses. Harper's Magazine, 140, 227-235.
Trochim, W.M. (1991). Developing an evaluation culture for international agricultural research. In D.R. Lee, S. Kearl & N. Uphoff (Eds.), Assessing the impact of international agricultural research for sustainable development: Proceedings from a symposium at Cornell University, Ithaca, NY, June 16-19. Ithaca, NY: Cornell Institute for Food, Agriculture and Development.
Trochim, W.M. (1989). An introduction to concept mapping for planning and evaluation. Evaluation and Program Planning, 12, 1-16.
Trochim, W.M. (1985). Pattern matching, validity, and conceptualization in program evaluation. Evaluation Review, 9(5), 575-604.
Watson, M. & Greer, S. (1983). Development of a questionnaire measure of emotional control. Journal of Psychosomatic Research, 27(4), 299-305.
Williams, W.M. & Sternberg, R.J. (1988). Group intelligence: Why some groups are better than others. Intelligence, 12, 351-377.
