Psychological Assessment: A Journal of Consulting and Clinical Psychology
© 1989 by the American Psychological Association, Inc.
March 1989 Vol. 1, No. 1, 18-22
For personal use only--not for distribution.
University of Minnesota
University of Minnesota
University of Minnesota
ABSTRACT
A real-data simulation of computerized adaptive administration of the MMPI was
conducted with data obtained from two personnel-selection samples and two
clinical samples. A modification of the countdown method was tested to
determine the usefulness, in terms of item administration savings, of several
different test administration procedures. Substantial item administration
savings were achieved for all four samples, though the clinical samples
required administration of more items to achieve accurate classification and/or
full-scale scores than did the personnel-selection samples. The use of
normative item endorsement frequencies was found to be as effective as
sample-specific frequencies for the determination of item administration order.
The role of computerized adaptive testing in the future of personality
assessment is discussed.
Computers are performing an ever-increasing variety of tasks in psychological testing. In ability testing, most efforts to date have been aimed toward developing models and software for test administration and scoring (for a review, see Weiss & Vale, 1987). These endeavors have mainly involved the development of computerized adaptive testing based on item response theory (Weiss, 1985; Weiss & Vale, 1987). In contrast, very few efforts have been made to develop computerized adaptive testing administration and scoring techniques in the domain of personality assessment. Instead, emphasis has been placed on the development of computer-based test interpretation systems (cf. Ben-Porath & Butcher, 1986; Butcher, 1987a).
This difference in emphasis can likely be attributed to two sources. First, test interpretation has historically received far greater attention in personality assessment than it has in ability testing. Second, most clinically used personality inventories, most notably the Minnesota Multiphasic Personality Inventory (MMPI), likely do not meet the requirements or assumptions necessary to employ administration techniques based on item response theory, which is currently the most widely used method in computerized adaptive testing.
Adaptive testing involves the administration of a psychological test so that the optimal amount of information is obtained at a minimal cost. Information, as used here, pertains to obtaining an answer to an assessment question. Optimality is achieved by administering no additional items after the assessment question has been answered. Minimal cost is achieved by administering only those items needed to achieve the fastest answer to the assessment question.
In this study we applied the countdown method in order to determine whether the goals of adaptive testing can be achieved with the MMPI. The countdown method was introduced by Butcher, Keller, and Bacon (1985) as a means for classifying individuals into one of two groups based on whether or not they exceed a cutoff criterion on a given scale. If, for example, a psychiatric screening scale contains 30 items and the cutoff for the classification of deviance on that scale is 20 items endorsed in the keyed direction, then as soon as the individual has answered 11 items in the nonkeyed direction he or she can no longer reach the criterion and classification has been accomplished with perfect accuracy. Alternatively, as soon as the respondent has endorsed 20 items in the keyed direction, classification has also been achieved and item administration can be terminated. In both cases the goal of optimization is achieved; that is, test administration is terminated as soon as the assessment question (classification, in this case) has been answered.
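The two stopping rules in the 30-item example can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software; the function name `countdown_classify` and the representation of responses as booleans (True for an answer in the keyed direction) are assumptions made for the example.

```python
from typing import Iterable, Optional, Tuple


def countdown_classify(
    responses: Iterable[bool], n_items: int, cutoff: int
) -> Tuple[Optional[bool], int]:
    """Classify a respondent with the original countdown rules.

    responses yields True for an answer in the keyed direction.
    cutoff is the number of keyed answers that defines an elevated score.
    Returns (elevated, items_administered); elevated is None only if the
    responses run out before either stopping rule is met.
    """
    keyed = 0
    nonkeyed = 0
    # Largest number of nonkeyed answers still compatible with reaching the cutoff.
    allowed_nonkeyed = n_items - cutoff
    for administered, answer in enumerate(responses, start=1):
        if answer:
            keyed += 1
        else:
            nonkeyed += 1
        if keyed >= cutoff:
            return True, administered  # elevation established
        if nonkeyed > allowed_nonkeyed:
            return False, administered  # elevation ruled out
    return None, keyed + nonkeyed


# The example from the text: 30 items, cutoff of 20 keyed answers.
print(countdown_classify([False] * 30, 30, 20))  # stops after 11 nonkeyed answers
print(countdown_classify([True] * 30, 30, 20))   # stops after 20 keyed answers
```
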
There are two main limitations of the countdown method just described. First, the goal of cost minimization is not fully addressed, and second, the method is limited only to classification purposes. To address these difficulties we introduced two modifications to the countdown method. First, we enhanced cost minimization by administering items in order of the frequency of their endorsement in normative and specific samples. For instance, by first administering those items that are less likely to be endorsed in the keyed direction, we could rule out MMPI scale elevations (exceeding a cutoff of T score > 69) more quickly.
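The effect of item ordering can be illustrated with a toy scale. The sketch below is hypothetical: the six item labels, their endorsement frequencies, and the respondent's answers are invented for illustration and do not come from the MMPI. It only counts how many items are needed before an elevation is ruled out.

```python
def order_by_endorsement(freq, ascending=True):
    """Return item labels sorted by keyed-direction endorsement frequency."""
    return sorted(freq, key=freq.get, reverse=not ascending)


def items_until_ruled_out(order, answers, cutoff):
    """Count items administered before the cutoff becomes unreachable.

    Returns len(order) if the cutoff is never ruled out.
    """
    allowed_nonkeyed = len(order) - cutoff
    nonkeyed = 0
    for administered, item in enumerate(order, start=1):
        if not answers[item]:
            nonkeyed += 1
            if nonkeyed > allowed_nonkeyed:
                return administered
    return len(order)


# Hypothetical six-item scale; a cutoff of 5 keyed answers defines elevation.
freq = {"a": 0.05, "b": 0.10, "c": 0.20, "d": 0.60, "e": 0.80, "f": 0.90}
# A respondent who endorses only the commonly endorsed items.
answers = {"a": False, "b": False, "c": False, "d": True, "e": True, "f": True}

ascending = order_by_endorsement(freq, ascending=True)
descending = order_by_endorsement(freq, ascending=False)
print(items_until_ruled_out(ascending, answers, cutoff=5))   # rarely endorsed items first
print(items_until_ruled_out(descending, answers, cutoff=5))  # commonly endorsed items first
```

For this respondent, starting with the rarely endorsed items rules out an elevation after fewer items than the reverse order does, which is the intuition behind the first modification.
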
We addressed the second problem, restriction of the countdown method to classification questions, in the following manner. In simply classifying deviance versus nondeviance, administration may be terminated as soon as deviance has been ruled out or as soon as it has been established. However, in many practical applications of tests such as the MMPI, users want information about the actual scores of individuals with elevated scales, not simply the knowledge that they scored "somewhere above 69." By modifying the countdown method so that only the first half of this termination rule is applied, a full-scale score can be obtained on all deviant scales. If items on a particular scale are no longer administered after it has been established that the score on that scale will not exceed a T score of 69, we will still obtain full scores on all elevated scales, yielding fully interpretable code types.
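A minimal sketch of this modified rule follows. It assumes, for simplicity, that the cutoff is expressed as a raw count of keyed answers; converting raw scores to T scores and applying the K-correction are outside the sketch. The function name `modified_countdown` is hypothetical.

```python
from typing import List, Optional, Tuple


def modified_countdown(responses: List[bool], cutoff: int) -> Tuple[Optional[int], int]:
    """Apply only the 'rule out' half of the countdown termination rule.

    responses lists the answers to one scale in administration order
    (True means keyed direction). Returns (raw_score, items_administered);
    raw_score is None when elevation was ruled out and administration
    stopped early, and the full raw score otherwise.
    """
    n_items = len(responses)
    allowed_nonkeyed = n_items - cutoff
    keyed = 0
    nonkeyed = 0
    for administered, answer in enumerate(responses, start=1):
        if answer:
            keyed += 1
        else:
            nonkeyed += 1
            if nonkeyed > allowed_nonkeyed:
                return None, administered  # scale cannot be elevated; stop
    # Never ruled out: every item was administered, so the score is complete.
    return keyed, n_items


print(modified_countdown([True] * 25 + [False] * 5, cutoff=20))   # full score on an elevated scale
print(modified_countdown([False] * 11 + [True] * 19, cutoff=20))  # stopped early
```
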
In the present study, we conducted real-data simulations with data obtained from two personnel-selection samples and two clinical samples. Subjects' responses to MMPI items administered in the conventional pencil-and-paper format were used to simulate responses to adaptive administration of the MMPI.
Subjects
All of the subjects in this study were males. The data were collected by the third author in connection with other studies (Butcher, 1987b; Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, in press; Graham & Butcher, 1988; McKenna & Butcher, 1987). Real-data simulations were run with data from two personnel-selection samples and two clinical samples. The personnel samples consisted of candidates for pilot positions at two major airlines. The first sample included 470 subjects with ages ranging from 22 to 49 (M = 29.8). The second sample contained 289 pilot candidates with ages ranging from 23 to 49 (M = 30.9). One of the clinical samples consisted of psychiatric inpatients tested at four different settings in Minnesota and Ohio. This sample included 232 subjects with ages ranging from 16 to 61 (M = 31.3). The second clinical sample contained 566 patients at a chemical dependency treatment facility in Minnesota, with ages ranging from 18 to 72 (M = 39.5). The subjects whose responses were used to obtain normative item endorsement frequencies were 1,138 males who made up the new national normative sample for the recent revision of the MMPI, the MMPI-2 (Butcher et al., in press). Their average age was 41.7 years with a range from 18 to 84.
Full Scores on Elevated Scales
The results of this study are summarized in Table 1. The five rows under the Full Scores heading present the mean, standard deviation, and range of the number of items that were administered in order to derive a full-scale score on any of the 13 scales (L, F, K, and the 10 clinical scales) on which a subject's T score was 70 or higher.
The procedure in row 1 was to administer the 30 K items first (in order to derive the K-correction for Scales 1, 4, 7, 8, and 9) and then to administer the items for each scale in order from least to most frequently endorsed¹ (based on the normative sample), until it was determined that the subject's T score would not exceed 69. Where subjects' T scores were 70 or higher, all items for those scales were administered to derive full-scale scores. The two clinical samples required considerably more items than the personnel-selection samples because they contained many more cases where full scales had to be administered.
In row 2, the same administration procedure was followed, except that in this case items were administered in descending order, from most to least frequently endorsed. This change resulted in a substantial increase in the number of items administered to the personnel-selection samples and a somewhat smaller increase in the number of items administered to the clinical samples. Thus, regardless of the sample tested, administering items in ascending order from least to most frequently endorsed was the most efficient procedure when full scores were derived on elevated scales.
Row 3 contains the results of applying the same procedure used in row 1 except that sample-specific (rather than normative sample) frequencies were used to determine the order of item administration. We did this by randomly dividing each sample into two subsamples, using each subsample to compute item endorsement frequencies and determine the order of item administration for the other subsample, and then averaging the results for the two subsamples. Our goal in this analysis was to determine whether the use of sample-specific frequencies would be more efficient in terms of number of items administered.
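The split-sample procedure can be sketched as follows. This is an illustrative reading of the description above, not the authors' code: the data layout (one dict per respondent mapping item labels to 1 for a keyed answer and 0 otherwise), the function names, and the fixed random seed are all assumptions.

```python
import random


def endorsement_frequencies(rows):
    """Proportion of respondents answering each item in the keyed direction."""
    items = list(rows[0])
    return {item: sum(row[item] for row in rows) / len(rows) for item in items}


def cross_subsample_orders(rows, seed=0):
    """Randomly halve a sample and derive each half's item order from the other half.

    Returns ((first_half, order_for_first), (second_half, order_for_second)),
    where each order runs from least to most frequently endorsed.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    first, second = shuffled[:half], shuffled[half:]
    freq_first = endorsement_frequencies(first)
    freq_second = endorsement_frequencies(second)
    # Frequencies from one subsample set the administration order for the other.
    order_for_first = sorted(freq_second, key=freq_second.get)
    order_for_second = sorted(freq_first, key=freq_first.get)
    return (first, order_for_first), (second, order_for_second)
```

The adaptive simulation would then be run on each half with its cross-derived order, and the item counts for the two halves averaged, as the text describes.
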
A comparison of rows 1 and 3 shows only a very slight reduction in the number of items administered to the personnel-selection samples (an average of three and four items, respectively) and a slight increase in the number of items administered to the clinical samples (an average of one and two items, respectively). We thus concluded that there is no appreciable advantage to be gained by relying on sample-specific frequencies to determine the order of item administration, and the remainder of our administration procedures were all based on normative frequencies.
Rows 4 and 5 contain the results of applying a slight modification to the administration procedure just described. Rather than administering all 30 K items first, followed by each scale's items separately, we administered the 150 least (row 4) or 150 most (row 5) frequently endorsed items first, inserting a K item after every fifth item. This yielded a total of 180 items administered during the first step. The same procedure used in rows 1 and 2 was followed for the remaining items. Two considerations led to attempting this procedure. First, we suspected that administering all of the K items at once could induce a response set that would in some way distort a subject's score on the K scale compared to pencil-and-paper administration. Second, due to the effect of overlapping items, the least or most frequently endorsed items were not being administered first on some of the later scales, resulting in a slight departure from the minimization procedure. The decision to administer the 150 least or most frequently endorsed items was based on a number of trials in which different numbers of items were administered, and this number was found to be the most efficient in terms of items savings.
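The first step of the K-insertion procedure amounts to a simple interleaving. The sketch below assumes that `ordered_items` is the list of non-K items already sorted by endorsement frequency and that `k_items` holds the 30 K items; both names, and the placeholder item labels, are hypothetical.

```python
def interleave_k_items(ordered_items, k_items, first_block=150, every=5):
    """Build the first administration step: a K item after every fifth item.

    ordered_items: non-K items already sorted by endorsement frequency.
    k_items: the K-scale items, in the order they should be inserted.
    """
    sequence = []
    remaining_k = iter(k_items)
    for position, item in enumerate(ordered_items[:first_block], start=1):
        sequence.append(item)
        if position % every == 0:
            k_item = next(remaining_k, None)
            if k_item is not None:
                sequence.append(k_item)
    return sequence


# Placeholder labels: 200 sorted non-K items and 30 K items.
ordered = [f"item{i}" for i in range(1, 201)]
k_scale = [f"K{i}" for i in range(1, 31)]
first_step = interleave_k_items(ordered, k_scale)
print(len(first_step))  # 150 items plus 30 inserted K items
```

With 150 items and an insertion after every fifth, exactly 30 K items are placed, which matches the total of 180 items reported for the first step.
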
As seen in rows 4 and 5, the K-insertion method was superior to the original procedure (rows 1 and 2, respectively) in both the clinical and personnel-selection samples. Here too, administering items in ascending order, from the least to most frequently endorsed, was more efficient than the opposite procedure.
Of the five administration procedures that were tested, administering the 150 least frequently endorsed items first, with a K item administered after every fifth item, and then applying the regular countdown technique to the remaining items on Scales L, F, and the 10 clinical scales was the most efficient procedure for obtaining full scores on all elevated scales in all four samples. It resulted in average savings of 119, 117, 51, and 71 items, respectively.
Score Classification
In some cases, where assessment is conducted for quick screening purposes, the primary question of interest is, On which (if any) scales does the subject have elevated scores? The test user is not interested in how elevated a scale score is, only in whether or not it is elevated above a critical level. To answer this question we need not administer all of the items on an elevated scale, only the ones necessary to tell us whether or not the score exceeds a set cutoff. Rows 6, 7, and 8 in Table 1 present the number of items that were administered to the four samples to obtain such classifications.
The savings incurred by using the classification technique over the full-score procedure were substantial only in the clinical samples, because almost all of the personnel-selection subjects' profiles were within normal limits. The more elevation in a subject's scales, the greater the item savings from a classification procedure as opposed to full-score administration. Using the two variations of the K-insertion procedure, we found that administering items in ascending order, from least to most frequently endorsed, was the most cost-effective approach in the personnel-selection samples. We had anticipated that administering items in descending order, from most to least frequently endorsed, might be more efficient for classification in the clinical samples. This was true to a small degree in the psychiatric sample, but not in the chemical dependency sample.
As was the case for the full-score procedure, the K-insertion technique with items administered in ascending order from least to most frequently endorsed was the most useful procedure for scale classification.² For the personnel-selection samples, the savings in time are similar to those reported above for the full-score procedure. For the two clinical samples they are substantially greater.
In the item and time savings reported, the comparisons were made to the administration of only the 383 items needed to score the basic validity and clinical scales. If these comparisons were made to the administration of the full set of 566 items (as is most often the case in pencil-and-paper as well as current online administrations), savings would be considerably greater.
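As a worked illustration of this point, the average full-score savings reported earlier (119, 117, 51, and 71 items for the four samples) can be re-expressed against the 566-item form. The figures printed below are derived arithmetic, not results reported in the article, and the function name is hypothetical.

```python
BASIC_SCALE_ITEMS = 383  # items needed to score the basic validity and clinical scales
FULL_FORM_ITEMS = 566    # full item set of the conventional form

# Average full-score savings reported in the text for the four samples.
reported_savings = [119, 117, 51, 71]


def savings_against_full_form(savings_vs_basic):
    """Return (items administered, savings relative to the 566-item form)."""
    administered = BASIC_SCALE_ITEMS - savings_vs_basic
    return administered, FULL_FORM_ITEMS - administered


for sample_number, savings in enumerate(reported_savings, start=1):
    administered, vs_full = savings_against_full_form(savings)
    print(f"sample {sample_number}: {administered} items administered, "
          f"{vs_full} fewer than the full form")
```
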
This study was designed to determine whether applying the principles of computerized adaptive testing is (a) viable and (b) worthwhile, when assessing personality with the MMPI. The answer to both questions is a qualified yes. We must, at this point, qualify our answer because the current study was based on a simulation of computerized adaptive testing.
One potential use for the modified countdown method might be as an alternative to the various short forms of the MMPI. As noted by Hoffman and Butcher (1975), the chief limitation of currently available MMPI short forms is that even when short-form to standard-form scale correlations are high, the success in actually predicting standard-form code types is quite low. Clavelle and Butcher (1977) developed an adaptive technique for developing MMPI short forms that predicted code types more accurately than conventional short forms, using a very small number of items. However, their technique did not enable perfect classification and was restricted to only nine possible code types. The modified countdown method, while requiring more items than Clavelle and Butcher's (1977) technique, yields the exact standard-form code type for elevated scales. Thus, it offers a procedure for achieving the aims of short forms with no loss of information relevant to profile interpretation. In this respect, it is important to note that item savings are not achieved at the expense of scale validity, which might be expected to drop as the result of relying on fewer items. This is because the items that are not administered are not needed to answer the assessment question, and their administration would in no way change the result.
A number of limitations in the present study need to be pointed out and considered. First and foremost is the assumption that in actual (as opposed to simulated) adaptive administration of the MMPI, subjects would respond the same way as they would in the pencil-and-paper mode. This assumption can be further broken down into two more basic assumptions: first, that subjects would respond the same to computerized administration as to pencil-and-paper administration, and second, that changing the order of item administration such that less frequently endorsed items are administered first would not influence subjects' responses. We will address each of these assumptions separately.
The question of whether computerized and conventional test administration yield comparable results has been addressed in several studies (e.g., Biskin & Kolotkin, 1977; Bresolin, 1984; Katz & Dalby, 1981; Lukin, Dowd, Plake, & Kraft, 1985; Lushene, O'Neil, & Dunn, 1984; Skinner & Allen, 1983). The results of these investigations have been mixed, with some authors reporting comparability (Katz & Dalby, 1981; Lukin et al., 1985; Lushene et al., 1984; Skinner & Allen, 1983) and others finding differences (Biskin & Kolotkin, 1977; Bresolin, 1984). In both cases where statistical differences were found, they did not appear to be of great psychological import (i.e., did not alter test interpretation). Thus we may at this point assume that administering the MMPI by computer will not have a substantial effect on subjects' responses.
Will changing the order of item administration and administering less frequently endorsed items first have an effect on subjects' responses? Future studies in which subjects will be administered the MMPI both adaptively and in the conventional mode will provide empirical answers to these questions. Such studies must be conducted before adaptive MMPI administration can be considered for clinical use. At this point we may note that the countdown method does not require that items necessarily be administered scale by scale, and in future studies items from different scales should be intermixed.
As noted throughout this article, the current study represents a first step. The next step will involve a direct comparison between adaptive and conventional administration of the MMPI to the same subjects. Subsequently we envision the development of this methodology so that users will be able to determine, based on their particular assessment needs, which scales to administer, which procedure (classification versus full-score) to use for each scale, and what cutoffs to use for each scale (T score = 70 is, after all, rather arbitrary).
The development of flexible and effective item administration procedures fits well with Fowler's (1988) recent observation of the parallels between computerized personality assessment and the development of expert systems based on the methods of artificial intelligence. In this respect, advances in adaptive personality testing are central to the development of fully interactive expert systems that will enable clinicians and other test users to obtain the most efficient assessment and accurate evaluation of individuals' personalities.
Ben-Porath, Y. S. & Butcher, J. N. (1986). Computers in
personality assessment: A brief past, an ebullient present, and an expanding
future. Computers in Human Behavior, 2, 167-182.
Biskin, B. H. & Kolotkin, R. L. (1977). Effect of
computerized administration on scores for the MMPI. Applied Psychological
Measurement, 1, 543-549.
Bresolin, M. (1984). A comparative study of computer
administration of the Minnesota Multiphasic Personality Inventory in an
inpatient psychiatric setting. (Unpublished doctoral dissertation, Loyola
University of Chicago)
Butcher, J. N. (1987a). Computerized psychological
assessment: A practitioner's guide. (New York: Basic Books)
Butcher, J. N. (1987b, March). Personality factors in pilot
screening. (Paper presented at the 22nd annual Symposium on Recent
Developments in the MMPI, Seattle, WA)
Butcher, J. N., Dahlstrom, W. G., Graham, J. R., Tellegen, A.
& Kaemmer, B. (in press). Manual for the restandardized Minnesota
Multiphasic Personality Inventory: MMPI-2. (Minneapolis: University of
Minnesota Press)
Butcher, J. N., Keller, L. S. & Bacon, S. F. (1985). Current
developments and future directions in computerized personality assessment. Journal
of Consulting and Clinical Psychology, 53, 803-815.
Clavelle, P. R. & Butcher, J. N. (1977). An adaptive
typological approach to psychiatric screening. Journal of Consulting and
Clinical Psychology, 45, 851-859.
Fowler, R. D. (1988, March). Future directions in personality
assessment. (Paper presented at the 23rd annual Symposium on Recent
Developments in the MMPI, St. Petersburg, FL)
Graham, J. R. & Butcher, J. N. (1988, March). Differentiating
schizophrenics and major affective disorder inpatients with the revised form of
the MMPI. (Paper presented at the 23rd annual Symposium on Recent
Developments in the MMPI, St. Petersburg, FL)
Hoffman, N. G. & Butcher, J. N. (1975). Clinical
limitations of three Minnesota Multiphasic Personality Inventory short forms. Journal
of Consulting and Clinical Psychology, 43, 32-39.
Katz, L. & Dalby, J. T. (1981). Computer and manual
administration of the Eysenck Personality Inventory. Journal of Clinical
Psychology, 37, 586-588.
Lukin, M. E., Dowd, E. T., Plake, B. S. & Kraft, R. G.
(1985). Comparing computerized versus traditional psychological assessment. Computers
in Human Behavior, 1, 49-58.
Lushene, R. E., O'Neil, H. F. & Dunn, T. (1984). Equivalent
validity of a completely computerized MMPI. Journal of Personality
Assessment, 38, 353-361.
McKenna, T. & Butcher, J. N. (1987, March). Continuity
of the MMPI with alcoholics. (Paper presented at the 22nd annual Symposium
on Recent Developments in the MMPI, Seattle, WA)
Skinner, H. A. & Allen, B. A. (1983). Does the computer
make a difference? Computerized vs. face to face vs. self-report assessment of
alcohol, drug, and tobacco use. Journal of Consulting and Clinical
Psychology, 51, 267-275.
Weiss, D. J. (1985). Adaptive testing by computer. Journal
of Consulting and Clinical Psychology, 53, 774-789.
Weiss, D. J. & Vale, C. D. (1987). Computerized adaptive
testing for measuring abilities and other psychological variables. (In J. N.
Butcher (Ed.), Computerized psychological assessment: A practitioner's guide
(pp. 325-343). New York: Basic Books.)
¹ Throughout the Results and Discussion sections, endorsement always refers to endorsement in the keyed direction.
² In fact, this procedure resulted in a slight
increase in the number of items administered to the psychiatric sample.
However, given the importance of spreading out the administration of K items,
rather than administering them first and all at once, we determined that this
procedure would likely be more useful for this sample as well. This
question will be studied empirically in future studies.
Data analyses were supported by a grant from the University of Minnesota's
Academic Computing Services and Systems to Wendy S. Slutske.
Portions of the findings reported in this article were presented at the 23rd
annual Symposium on Recent Developments in the MMPI, Saint Petersburg, Florida,
March 1988.
We wish to thank Laura S. Keller and Niels G. Waller for their helpful comments
on previous versions of this article.
Correspondence may be addressed to
Received:
Revised: September 27, 1988
Accepted: October 12, 1988
Table 1. Number of Items Administered Under Various Scoring Procedures