| Psychological Assessment | © 2000 by the American Psychological Association |
September 2000 Vol. 12, No. 3, 263-271 | For personal use only--not for distribution. |
Some types of psychological tests become dated and require more frequent and more extensive revision than others. Because of the formidable effort that is required in a test revision, the goals and scope of the revision need to be carefully staked out before a revision is undertaken. The revision team needs to develop a generally agreed-upon guiding philosophy for the test revision in the beginning of the project and incorporate broad input into the changes that are likely to be required. Factors important to consider in a test revision are discussed, and the parameters of personality test revision illustrated from the extensive program to revise the Minnesota Multiphasic Personality Inventory (MMPI) are included. Recommendations for gauging acceptance of the revision are suggested along with steps that revisers and publishers might take to make a test revision both more research based and more acceptable to test users.
When is a psychological test in need of revision? Not everything in life becomes functionally ineffective at the same rate. Some types of psychological tests become dated and require more frequent or more extensive revision over time than others. Achievement tests, cognitive measures, or interest scales that rely upon current information or recent events need revision more frequently than measures such as personality tests that rely upon stimulus material that remains more constant over time. Some measures, such as the Rorschach Inkblot Technique, which is made up of a series of ambiguous inkblots, are not greatly influenced by changing meaning and may never need revision at the stimulus level. Revisions at the level of interpretation, however, may be required as information on the interpretation of the technique increases. Other projective techniques, such as the Thematic Apperception Test (TAT), which includes a number of pictures involving people engaging in various activities, however, are influenced by changing times and styles, and the stimulus cards (still in use in their original form) may appear to people as dated.
Test revisions can come in all sizes and shapes. One might simply make some minor changes in the booklet and test manual, leave all else the same, and call it a revision or go a bit further and drop some nonworking items, add new ones, or even develop new norms. However, most widely used measures require more extensive alterations to keep the instrument viable. Making even small item or norm changes requires that many, more substantial tasks be undertaken.
The more successful a test, the more likely it becomes that inertia and change resistance may develop over time. Thus, broad usage of a test may extend its use beyond the point that problems of obsolescence begin to emerge. If a test is in wide use, there are likely to be more arguments against its revision and more resistance to change even though everything else around it has changed, resulting in an instrument that becomes even more out of date. Such was the case with the Minnesota Multiphasic Personality Inventory (MMPI): Because it was so widely used and seemed to work well enough (or else people tended to overlook its problems), a revision was long overdue by the time the moon and stars and the test publisher were in the proper phase that a revision could be mounted.
The goals and scope of possible modifications need to be carefully evaluated before a revision of a widely used test is initiated. Revisions of major psychological instruments are typically more substantial undertakings than revisers initially conceive them to be. In an insightful discussion of his revision of the Strong Vocational Interest Blank (SVIB), Campbell (1969, 1972), more than a decade before the MMPI revision began, warned potential revisers of the MMPI at a national symposium devoted to the topic of an MMPI revision:
I emphasize that it was about twenty years ago that Strong first started thinking about these revisions and ten years ago that the work actually started. Those have to be sobering figures to anyone thinking about beginning to revise the MMPI. (1972, p. 118)
Campbell's sage advice to potential revisers of the MMPI back in the 1970s is clearly appropriate today for anyone revising a major psychological test. Many of his experiences in revising the SVIB occurred in the MMPI revision project as well. The issues surrounding the MMPI underwent considerable scrutiny (Dahlstrom, 1972; Hathaway, 1972; Meehl, 1972) and debate (Loevinger, 1972; Norman, 1972) before a revision was finally undertaken. Debate continued to take place a decade after the opening bell calling for a revision was sounded, and another decade passed before work began. The guiding philosophy for the MMPI revision was widely discussed, and several publications heralded the project startup in the 1980s (Butcher, 1972; Butcher & Owen, 1978), with self-study continuing over the ten years that the MMPI project took to complete, culminating in the publication of the MMPI-2 for adults (Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer) and MMPI-A for adolescents (Butcher et al., 1992). Thus, as Campbell had envisioned, the MMPI-2 was published a full 20 years after the initial discussions aimed at developing a revision program for the test had begun.
Experiences from the revisions of the SVIB and MMPI suggest that it takes a great deal of time and research effort to effect a successful revision and gain broad acceptance by the professional community. The present article was written to provide a practical framework for conducting a test revision of a major personality test and to examine ramifications that alterations might have on test usage. Examples are drawn largely from the extensive MMPI revision program. Anyone seriously contemplating a revision of a major instrument should also read David Campbell's (1972) discussion of the SVIB revision.
A generally agreed-upon guiding philosophy for the test revision should be developed in the beginning of the project. The best tack to take at startup is to develop a plan that stakes out the major outlines of the alterations that are to be made and that minimizes adverse effects on professional use. The major issues a test reviser will face are summarized as Principles of Test Revision and are shown in Table 1.
The revision planners did not view the MMPI revision as being designed to accomplish a magical metamorphosis into an entirely new test but rather to make an on-course correction along a clearly predetermined path. One element of the guiding philosophy for the MMPI revision that served as a framework was that the original instrument was considered to have many elements that should be maintained in the revised form in order for the test to be considered a revision and not an altogether new test:
The traditional clinical scales needed to be nearly identical to those of the original instrument in terms of item structure and general configuration.
Although many elements in the test needed to be identical in the revision, some changes to the booklet were necessary: Unused and dated items were deleted, some of the items were rewritten, some objectionable items were dropped, and new content was added in order to measure contemporary clinical problems and make the MMPI a more effective instrument in the future.
A more relevant, contemporary normative population study needed to be conducted to provide new norms for the traditional validity and clinical scales, even though the items were continuous with the original version.
The inclusion of new item content to replace out-of-date items would allow for the development of new scales to address contemporary problems.
The committee agreed that a substantial number of clinical studies needed to be developed in order to provide for well-defined samples to "test out" new scales that were to be developed or to provide a new validation of the traditional scales.
Early in the MMPI revision it was concluded that the use of the MMPI with adolescents required a separate form (with expansion of the item pool to include more adolescent problems) and the development of separate adolescent norms. The adolescent MMPI development program became a separate program of research, with a shorter but more relevant content domain than the adult version. This plan required the development of a separate set of norms and extensive clinical data collection before the test could be published.
In sum, the research philosophy guiding the MMPI Revision Committee was to make the necessary changes to improve the item content coverage, to change the traditional clinical scales as little as possible, to develop new contemporary measures to broaden the test's assessment base, and to employ new normative and new clinical data to test empirically any changes that were eventually undertaken. The guiding philosophy for the MMPI revision included the collection of a substantial database to allow for validation of the revised instrument.
The Scope of the Test Revision
How much change is required in a test revision? This is a complicated question, and the answer would, of course, differ depending upon the type of test, the societal changes that have occurred, and new improvements in the science and technology of test construction that are available (for example, new psychometric procedures or strategies that have evolved since the test was developed or last revised). The amount of revision needs to be determined in terms of the following possible parameters to change.
In test stimuli. The revision might call for the incorporation of new stimulus material or new tasks into the instrument. Passing years tend to make some test stimuli or interpretive procedures appear quaint, antiquated, or completely irrelevant. In revising the SVIB, Campbell (1972) found, for example, that the instrument was so dated that it contained items that referred to magazines that were no longer published.
In administrative procedures. New or alternative methods of administering the test might be desirable to consider (e.g., computer-administered item presentation using adaptive testing technology or computer-based voice-activated response recording technology as it becomes available).
In scales or units of measure. New personality constructs or new ways of assessing important dimensions may be available since the earlier publication.
In norms or psychometric features. Newer psychometric approaches, for example Item Response Theory (Embretson, 1996), might be employed to enhance or replace traditional psychometric procedures.
In applications. For example, instruments that were developed for one type of application might require substantial revision and altered norming if the test is to be expanded into other applications.
Each of the elements, as noted above, would vary depending upon the type of assessment measure involved and the length of time since the test was revised or was originally published.
In formulating the plan to revise the MMPI, the Revision Committee set out to maintain continuity with the original MMPI while at the same time implementing important changes and expanding the items and scales in the test. The major reason for maintaining the continuity between MMPI-2 and the original MMPI was to allow clinicians and researchers alike to be able to use the revised MMPI as the original was used, in an uninterrupted manner. Those with longitudinal research, for example, could still understand the meaning of scale scores for persons who were followed up over time with a different version of the test. Yet, it was also considered important to incorporate new item content to address contemporary problems.
Resolve Key Prerevision Issues
It is critical to settle issues such as arrangement of credit, work responsibility, and royalties before the project gets underway. Issues of authorship and financial arrangements, if not mutually agreed upon prior to the revision (between the test publisher and revision team), can create ill feelings. Because most psychological tests are developed and marketed by commercial or academic publishers, revision arrangements should be spelled out in a contract, particularly if the revisers of the test are different from the initial developers of the instrument.
David Campbell (1972) pointed out clear obstacles to a test revision:
The third major obstacle for the MMPI reviser concerns the necessary practical arrangements. These can be subdivided into three main areas: first, the establishment of some administrative structure so that the work will get done. Essentially this means deciding who is going to be responsible and then giving him enough authority to carry out that responsibility; second, the provision of the necessary funds to support the activity; and third, the assignment of credit for doing the work to include both authorship listing and royalties. (p. 124)
Some aspects of the MMPI revision may not provide much useful guidance for potential test revisers in that the MMPI-2 committee essentially revised the MMPI with the primary goal of developing a stronger instrument for their ongoing research, without participating in possible financial rewards for conducting the revision. That is, the Revision Committee chose to forgo royalties from the revised instrument.
Each potential reviser and each test publisher will have different motives and expectations from a test revision. It behooves the participants to air those expectations and responsibilities lest difficulties of a pecuniary nature entangle and disrupt the endeavor. One issue that Campbell addressed, and that was underscored by the MMPI revision, is that it is important that one person be given the responsibility of making decisions when there are disagreements about the way to proceed. Although theoretical and practical disagreements did occur on the MMPI project, no disagreements occurred about financial arrangements. Two requirements were stipulated by the MMPI Revision Committee in the beginning of the revision: (a) It was important that the MMPI-2 scoring keys be open and public. (There had been some precedent in the testing industry for test publishers to keep the scoring keys for instruments secret in order to protect them from being copied.) The MMPI committee believed that the instrument was a research tool "for the field" and as such it was considered important that open and available access to the scoring keys be maintained. (b) The MMPI committee strongly recommended (and the test publisher, the University of Minnesota Press, has since followed this recommendation) that a portion of the test revenues be allocated for future research on the instrument.
Gauge Potential Reaction to Alterations
It is important that test revisers try to foresee problems that might result from any proposed alterations so that ultimate professional acceptance of the revision can be better assured. Alterations, even minor ones, can have an impact that was originally unintended.
Campbell (1972) related the following problem that occurred with changes made to the number of items on the SVIB:
The first example is the number of items on the test. When a test is changed, the new form should have a different number of items so that it can quickly be distinguished from the older form, especially when answer sheets are involved. In the initial planning of the men's revision of the SVIB, we added 5 items (for a total of 405) to make the new form unique. After those plans became public, I received an anguished letter from a user stating that if the new test had 405 items, it could no longer be scored on the IBM-805 scoring machine because that answer sheet takes a maximum of 400 items. That did not seem to be a major problem since most people do not use the IBM 805 anymore and, anyway, answer sheets are flexible so that a few more items can always be added. The problem loomed larger when I found that 80,000 IBM answer sheets had been sold the preceding year, so clearly someone was using them, and the problem became hopelessly complicated after I spent an afternoon trying to cram five more items onto the answer sheet and finally concluded that it was impossible. (I found out much later that the reason that the original SVIB had 400 items was because that was the maximum number that could be fitted onto that answer sheet.) Since we were at a point where we could still make changes, we dropped one item, rather than adding five. (pp. 121-122)
A similar problem with an unexpected result from one small change in the MMPI booklet was unearthed early in the test revision. One of the changes that the MMPI committee considered a "no brainer" that would immediately improve the instrument was to shorten it by dropping the 16 repeated items from the original booklet. These items had been included in the original test booklet in order to facilitate scoring on an earlier version of the instrument and are not used in clinical interpretation. The intention to drop these items as a space-saving move was announced early in the revision at a national meeting of the Symposium on Recent Developments in the Use of the MMPI. We immediately received two protests to dropping the repeated items because they were used in the test-retest (TR) index as a measure of response consistency. However, the fact that the 16 repeated items were used by a small number of researchers was not very persuasive, particularly since plans for the revision included developing stronger consistency measures. Eliminating the repeated items improved the instrument and did not adversely affect the development of response consistency measures because there still remained in the item pool a sufficient number of items with similar or opposite meaning. This change, in the end, did not result in a loss of a significant measure because two more useful consistency scales (VRIN and TRIN) developed by Auke Tellegen (1988) were published in the revision. Moreover, dropping the 16 repeated items eliminated a frequent complaint from test takers that we were trying to "trip them up" by repeating some items.
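The general idea of a consistency measure built from items with similar or opposite meaning can be sketched as follows. This is a minimal illustration only: the item numbers, the pairs, and the function name are hypothetical, and the sketch is not the scoring procedure of the TR index, VRIN, or TRIN.

```python
def count_inconsistent_pairs(responses, pairs):
    """Count item pairs that were answered inconsistently.

    responses: dict mapping item id -> True/False answer.
    pairs: list of (item_a, item_b, same_expected) tuples, where
           same_expected is True for items of similar meaning and
           False for items of opposite meaning.
    """
    count = 0
    for item_a, item_b, same_expected in pairs:
        if item_a not in responses or item_b not in responses:
            continue  # a pair with an omitted item cannot be evaluated
        answered_same = responses[item_a] == responses[item_b]
        if answered_same != same_expected:
            count += 1
    return count


# Hypothetical item ids and pairs, for illustration only.
pairs = [(1, 2, True), (3, 4, False), (5, 6, True)]
responses = {1: True, 2: False, 3: True, 4: True, 5: False, 6: False}
inconsistent = count_inconsistent_pairs(responses, pairs)
```

A high count would suggest that the respondent was not attending to item content; where to set a cutoff is an empirical question that this sketch does not address.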
This situation did serve to alert the MMPI Revision Committee to the need to gauge the potential impact of changes on research or clinical practice, where possible, and to secure clear empirical support to justify any changes that were to be implemented. Assuring research justification is crucial if major shifts or broad changes in interpretation are likely to come from the alterations that are proposed. Before the MMPI revision was undertaken, a number of "focus group discussions" were held with many researchers to determine the impact of possible changes. In addition, the test publisher (University of Minnesota Press) conducted several surveys of test users and researchers, before and after the revised version was published, in order to gauge the impact of some specific changes that were undertaken, such as the elimination of the so-called subtle-obvious scales and the timing of the phase-out of the original version of the test.
Commit to Necessary Changes and Modernize the Instrument
Fix all the problems that are fixable even though they may seem unimportant at first glance. If major problems persist after the revision is published, there will surely be an adverse reaction and potential stumbling blocks to the broader acceptance of the updated form. An instrument like the MMPI that had been around so long and had so many contributors to its research base had a number of unanticipated nuisance problems that came to light during the revision. Problems surfaced with the instrument that, while not fatal, did result in a less effective instrument. For example, toward the end of the MMPI revision, after the new norms had been generated and during the development of a set of subscales for the Si scale (Ben-Porath, Hostetler, Butcher, & Graham, 1989), it was discovered that the original Si scale contained two items that were traditionally misscored, that is, they were being scored all along in the wrong direction! It was considered necessary to fix this problem even though it meant disturbing some original scoring keys and, of course, recomputing the T scores for the new norms.
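Why a corrected scoring key forces the T scores to be recomputed can be seen in a small sketch. It assumes a linear T score (mean 50, standard deviation 10 in the normative sample); the procedure actually used for the MMPI-2 norms is not described here, and the three-item scale, the keys, and the four-person "normative sample" below are hypothetical.

```python
from statistics import mean, stdev


def raw_score(responses, key):
    """Count the items answered in the keyed (scored) direction."""
    return sum(1 for item, keyed in key.items() if responses.get(item) == keyed)


def linear_t(raw, norm_mean, norm_sd):
    """Linear T score: 50 plus 10 times the z score in the normative sample."""
    return 50 + 10 * (raw - norm_mean) / norm_sd


# Hypothetical keys: item 2 was scored in the wrong direction in old_key.
old_key = {1: True, 2: True, 3: True}
new_key = {1: True, 2: False, 3: True}

# Hypothetical normative sample (item id -> True/False answer).
sample = [
    {1: True, 2: True, 3: False},
    {1: True, 2: False, 3: True},
    {1: False, 2: False, 3: False},
    {1: True, 2: True, 3: True},
]

old_raws = [raw_score(person, old_key) for person in sample]
new_raws = [raw_score(person, new_key) for person in sample]

# The same person receives a different raw score and is compared with a
# different normative mean and SD, so the T score changes as well.
old_t = linear_t(old_raws[0], mean(old_raws), stdev(old_raws))
new_t = linear_t(new_raws[0], mean(new_raws), stdev(new_raws))
```

Because the key change alters every respondent's raw score, the normative mean and standard deviation shift too; correcting the key without regenerating the norms would leave the two out of step.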
Not all problems in a complex instrument are, however, fixable. It is important to realize that, because of the nature of the test, there may be some lingering troublesome elements in the instrument that cannot be addressed in a revision because they would require a "new test" rather than a revised version. For example, in the MMPI revision research some questions were raised about the value of the K correction that had been added to "improve" the assessment of five of the MMPI clinical scales since the mid-1940s (Weed, 1992). Empirical studies showed that the K correction developed by Meehl and Hathaway (1946) did not actually improve the validity of the predictions from the scales that were routinely K-adjusted. Clearly, this finding was troublesome for the MMPI committee because the potential impact of simply doing away with this traditional scoring correction in the MMPI-2 could have greater consequences. The fact that the K correction had been applied to the vast majority of the traditional validity studies and, if deleted, might negate the results of the revised scales was a sobering thought. However, the use of K did not appear to make the predictions substantially less valid. It was therefore concluded that since one goal of the revision was an on-course correction, rather than a major revamping of the clinical scales, the traditional clinical scales needed to remain as close as possible to the original versions. However, to promote future research on the problem, the committee decided to encourage the development of non-K corrected validity studies that could be applied in future research by making available in the test manual non-K corrected T scores. In addition, a non-K corrected profile form (based on norms developed without K corrections) was provided for psychologists to examine profiles without the K correction being added.
Changes in test stimuli are often required in any revision. All one has to do is to look at the TAT cards or read through some of the items on the original MMPI to get a feel for the importance of updating psychological test stimuli periodically. Many assessment stimuli are time bound and experience a deterioration of meaning or relevance over time because of language, living styles, or social practice changes making the original stimuli appear quaint. People being asked to respond to a lot of questions on a psychological test that are out of date may not take the task very seriously.
The item-level improvements that were implemented with the MMPI had several important effects. The items in the revised form became more readable and less objectionable, and the content became more appropriate to a broader population. For example, dropping items like "I believe in the second coming of Christ" or "I enjoyed reading Alice in Wonderland" made the instrument less idiosyncratic. Another item on the original form that was dropped in MMPI-2 was "I like Lincoln better than Washington." This item was problematic in that it was very difficult to translate into other languages and showed biased results in various samples in the United States. Erdberg (1970), for instance, found that this item completely separated African Americans from Whites in a study of Black-White differences on the test in Alabama. However, the item was not very effective at detecting personality differences, and it was not scored on any major MMPI scale! Dropping the item from the inventory did not affect any major scale.
Many of the item-level improvements implemented in the revised version produced a more appropriate instrument for diverse populations. Another interesting consequence of improving the wording and reducing the culture-bound elements in the test was that the MMPI-2 item pool became much easier to translate into other languages (Butcher, 1996).
Test New or Altered Stimuli in Pretests
If items are changed or substituted on the revised version of an instrument, then it is crucial to empirically explore the impact of these changes before the revised version is released. For example, the MMPI Revision Committee conducted an evaluation of wording changes on the revised MMPI by administering the instrument to samples of known clients (airline pilot applicants) to determine if the revised booklet produced different results than the original item wordings. Results indicated that wording changes did not alter meaning, only the ease of reading the items. In the case of the MMPI-A, item changes were evaluated in a field study. Adolescents from a private school served as "project consultants": They were given alternative wording of cumbersome items and asked to determine the readability and acceptability of the item changes (Williams, Ben-Porath, & Hevern, 1991).
Choose the Most Generalizable Normative Approach
Applications of a psychological test are limited by the normative basis of the instrument. Developing the instrument according to the broadest normative populations is important for user acceptance and to assure utility across a range of applications. Some instruments have been developed for specialized purposes (e.g., the Millon Clinical Multiaxial Inventory [MCMI]; Millon, 1994) and contain "norms" that are based on responses from a narrow normative sample (i.e., from a psychiatric sample rather than on responses from the general population). If the instrument's use is expanded for broader assessment into wider applications (such as with personnel selection or forensic assessment), then the norms for the instrument would not apply. The MCMI has limited utility when used with nonpsychiatric samples or for broad assessment questions. For example, normal persons are found to be "pathological" on the MCMI norms because these norms do not distinguish normals from nonnormals; they assume that everyone taking the test is a psychiatric patient. If one is developing an instrument that will be used in a broad range of applications, it is advisable to employ norming procedures that do not unduly limit the generality of the instrument.
In developing a normative sample, potential test revisers should avoid shortcuts that might make data collection easier but produce a weaker normative base for the instrument or result in a measure that requires limited interpretation or use. For example, one recent personality scale, the Basic Personality Inventory (Jackson, 1989), was developed using procedures that substantially limit the generalizability and reduce the confidence in the utility of the test. The norms for the Basic Personality Inventory were collected by mailing the test booklets out to a sample of possible normative test subjects, with $1 attached, requesting that they fill them out. No effort was made to sample diverse ethnic group membership or to provide a controlled testing environment for the administration of the test. Moreover, each participant was asked to respond to only one third of the items in the total item pool. Because no participant responded to the entire item pool, scale statistics such as alpha coefficients are more difficult to interpret.
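The difficulty with alpha coefficients when no participant answers every item follows from how coefficient alpha is computed: it compares the sum of the item variances with the variance of the total score, and a total score exists only for respondents who answered all items on the scale. A minimal sketch, with hypothetical data and a hypothetical function name:

```python
from statistics import variance


def cronbach_alpha(data):
    """Coefficient alpha for complete item-response data.

    data: list of respondents, each a list of k item scores.
    """
    k = len(data[0])
    for row in data:
        if len(row) != k or any(score is None for score in row):
            raise ValueError("alpha requires every respondent's answer to every item")
    item_variances = [variance([row[i] for row in data]) for i in range(k)]
    total_variance = variance([sum(row) for row in data])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses of four people to a three-item scale (1 = keyed answer).
data = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 1]]
alpha = cronbach_alpha(data)
```

With responses to only a subset of the item pool, the total-score variance cannot be computed directly for each scale, so any reported alpha has to rest on additional assumptions or estimation steps.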
The MMPI Revision Committee considered it important in the revision to develop norms that were nationally based, randomly drawn from the community, and obtained in well-controlled testing sessions. The sample was balanced for ethnic group membership and well represented the national census. The MMPI-2 normative sample was drawn from a broadly diverse sample that clearly approximated characteristics of the national census (Schinka & LaLone, 1997; Shaaffer, Erdberg, & Haroian, 1998). Interestingly, the norms for the MMPI-2 have been found to apply well with diverse ethnic samples in the United States (Ben-Porath, Shondrick, & Stafford, 1994; Ellertsen, Havik, & Skavhellen, 1996; McNulty, Graham, Ben-Porath, & Stein, 1997; Timbrook & Graham, 1994; Velasquez et al., 1997) and with international populations.
Evaluate the Measuring Instruments
It is crucial in a test revision to accumulate a broad variety of data on the populations for which the instrument is intended to be used. There can never be enough field testing of the revised instrument. It is absolutely essential to provide basic psychometric information, such as scale reliability, factor structure, and so forth, for relevant samples, in order to anchor the instrument in psychometric "space." Test users expect to be provided basic psychometric statistics on the instrument as well as evidence of congruence (or lack) with the earlier instrument. As with any new test, information about the external validity of the revised scale needs to be provided. For example, in the MMPI revision a large sample of couples (over 800 couples) from the normative population were also asked to rate each other on 110 personality characteristics. These served as one source of external validity against which the clinical scales and new scales could be tested out (Butcher et al., 1989).
The MMPI Revision Committee approached the matter of justifying changes by an extensive data collection on many samples in addition to the normative population. In all, over 15,000 persons from clinical and normal range populations were evaluated with the MMPI-2 or MMPI-A during the decade of redevelopment of the instrument and before the revision was released. For example, empirical studies were conducted on pain patients (Keller & Butcher, 1991), psychiatric inpatients (Ben-Porath, Butcher, & Graham, 1991), likely child abusing mothers (Egeland, Erickson, Butcher, & Ben-Porath, 1991), alcoholics (Weed, Butcher, Ben-Porath, & McKenna, 1992), couples in marital therapy (Hjemboe & Butcher, 1991), airline pilot applicants (Butcher, 1994), older individuals (Butcher et al., 1991), military personnel (Butcher, Jeffrey, et al., 1990), and college students (Butcher, Graham, Dahlstrom, & Bowman, 1990).
Develop the New Scales
As noted earlier, newer effective methods might be employed in a test revision to supplement measures in the original instrument. Empirical scale construction methods, used in the development of the original MMPI clinical scales, came under criticism because the scales developed tend to be psychometrically complex and heterogeneous (Loevinger, 1972; Norman, 1972). Therefore, during the MMPI revision several scales were developed following an entirely different strategy for the MMPI-2 item pool. Over the past 30 years a substantial amount of research (Wiggins, 1966) and theoretical discussion (Burisch, 1984) have provided a strong impetus for the development of content-based scales. The MMPI-2 content scales (Butcher, Graham, Williams, & Ben-Porath, 1990) were published following a combined rational-empirical scale construction strategy. These scales have been found subsequently to perform well as empirical predictors of external behavior and to provide important content themes (Ben-Porath, Butcher, & Graham, 1991; Ben-Porath, McCully, & Almagor, 1993).
As long as the client is cooperative with the evaluation, content-based scales are effective predictors of criterion behavior (equal or better, in terms of predictive power, than the empirical scales of the MMPI-2). Therefore, it is important to have a clear picture of the client's test taking attitudes and cooperativeness with the evaluation when the instrument is used in clinical assessment (or in the development of scales in the first place, for that matter).
Evaluate Response Sets
Given that not all people who are selected to serve as participants in a standardization study are cooperative, it is critical for test revisers to have an effective means of assuring participant cooperation. Clear and unambiguous directions and well-controlled and monitored testing sessions can go a long way toward assuring quality standardization data.
Some psychological tests also incorporate measures for assessing protocol validity. The MMPI-2 contains a number of validity scales to detect invalidating conditions. For example, the F(B) scale was developed in the MMPI Revision Project as a means of detecting random responses or "mixed-up responding" toward the end of the booklet. A number of new validity scales were developed for the MMPI-2 that cover an expanded range of possible invalidating conditions. The availability of an expanded array of invalidity indices (Arbisi & Ben-Porath, 1995; Butcher & Han, 1995; Tellegen, 1988) had the effect of stimulating a large number of studies to explore the utility of the MMPI-2 to detect invalidating conditions in personality assessment.
Develop a Detailed Test Manual
Key to gaining acceptance of the revised instrument is the publication of a comprehensive, detailed test manual. The test manual for a revised version of a psychological test has multiple purposes. As for any psychological test, the manual needs to provide evidence with respect to the delineation of scale constructs, administration and scoring procedures, psychometric properties and internal scale relationships, evidence of test validity, and examples of how the instrument is used and interpreted.
The manual for a revised instrument also needs to be tied to the original test that it is replacing. Commonalities between the two measures need to be identified so that the test user can relate his or her past experiences with the test to the present version. Deviations from the "old way" of doing things also need to be spelled out. For example, any variations in administration and scoring, alterations in test stimuli, changes in interpretive strategies, and so forth need to be highlighted so that current users can visualize the operation of the new version.
Some people will, for a time, hold on to the earlier standard even though improvements make the revised instrument much better than the original and even when most people have adopted the new version. Psychologists who are placed in the role of "revisionists" for any widely used psychological test need to be aware that not everyone likes change. Some people, by nature, abhor change or modifications in their external world if they have to alter their practice or research substantially. Changes in a beloved and relied-upon test can be an imposing event for some professionals.
Odell Shepard (1929) pointed out in Joys of Forgetting:
There are people who not only strive to remain static themselves but strive to keep everything else so, and weep like Heraclitus to find that nothing ever stands still to be studied, understood, and described. Their grievance against the world is that it insists upon changing at every moment and destroying all their categories. Who that has lived at all has not sympathized with them at one time or another? And yet their position is almost laughably hopeless. (p. 146)
As noted earlier, even the most substantial and high-quality test revision will have its detractors because of the resistance some people have to change. Critical detraction can arise for several reasons: blind loyalty to the earlier version of the instrument, financial ties to the earlier version, resistance to change or novelty, or heavy commitment to an earlier version (e.g., "I have my file cabinets filled with original test booklets and answer sheets"). Such criticisms of a revised test are neither easy to predict nor possible to circumvent, because they may not be based on a careful evaluation of the revision itself but on more subjective, idiosyncratic reasons. It may not be possible to eliminate all such problems, and it is important to distinguish between well-founded criticism and simple "grousing" over the situation.
Yet criticisms of the revision do not have to be substantiated in order to be aired and to call the revision into question. Some critics might find problems with the revision or voice criticisms that have little or no basis in fact. For example, one vocal critic of the MMPI-2 complained that "the validity of the Pd scale was less in MMPI-2 than in the original MMPI." This criticism was clearly unfounded: the composition of the Pd scale is exactly the same in the revised MMPI (no items were dropped or added), so the ability of the new version to predict external behavior would be exactly the same as that of the original version of the instrument.
Opinions, even groundless ones, can sway perceptions on a temporary basis. However, a well-conceived and constructed test revision will win out over the long term. It is important in a test revision to do the most careful job possible, and to have the conviction to make necessary changes even though some criticism may follow.
Some psychologists who have revised well-known tests have been criticized for changing the test. People handle such criticism in different ways. Our view on the MMPI project was that we would "let the data speak for themselves." If the revision is a strong one, then the criticisms will abate as people become aware that the changes are backed by substantial information.
Develop a Critical Phase-Out Period for the Superseded Version
The test revision process is not complete until the earlier or original version of the instrument has receded into history. There were very clear reasons for continuing the original MMPI for a period of time along with the revised instrument. The MMPI-2 Revision Committee held the view that, because of its extensive use, the original MMPI would likely be used for some time because it was so tied to clinical practice. In addition, the MMPI-2 (for adults) was published in 1989 while the MMPI revision for adolescents was still in progress and did not get published until 1992. The MMPI-2 did not include adolescents in the norms (and therefore was not recommended for teens), so the original version (particularly since adolescent norms were available) was still recommended for this population. The MMPI committee held the general view that this period would be about 5 years, or roughly 3 years after the adolescent version of the MMPI was published.
Having two "standards" for the same instrument can be problematic for both clinical and research applications. For example, if the instrument is widely used in court-related evaluations, then both sides of the court case in the adversarial legal system could employ different versions of the standard in their assessment and produce results that might appear to differ substantially. The existence of both the original and revised forms of the MMPI has created some confusion, particularly for applications involving forensic evaluations. The test publisher, with substantial review of existing research by its psychological consultants, determined that it is in the best interest of the field of psychological assessment to phase out the original version.
What is the professional standard for determining when the old test should recede into history? There are no clear guidelines for determining when a test has been superseded. However, psychologists are encouraged to use the most current version of a psychological test. The American Psychological Association suggested the following:
A test should be amended or revised when new research data, significant changes in the domain represented, or new conditions of test use and interpretation make the test inappropriate for its intended uses. An apparently old test that remains useful need not be withdrawn or revised simply because of the passage of time. But it is the responsibility of the test publishers to monitor changing conditions and to amend, revise, or withdraw the test as indicated. (American Psychological Association, 1996, Standard 3.18)
Although most MMPI users shifted over to the MMPI-2 in 1989 and the MMPI-A in 1992, a smaller number of psychologists (fewer than 5%) continued to use the original version 9 years after the MMPI-2 became available. A few somewhat vocal dissenters continued to employ the original norms in critical assessment situations such as court cases, a practice that created confusion as to what the "true" MMPI standard should be. This confusing situation led the test publisher to withdraw the original MMPI from service as of September 1, 1999.
Provide Educational Training or Briefings on Changes
One of the most effective means of assuring that practitioners and researchers will transition to the revised version of a test is for the revision team to conduct practical workshops or briefing sessions for test users to explain the changes in the revised version and its continuities with the original instrument. Even before the MMPI-2 was published, an extensive series of workshops and briefing sessions was conducted to inform test users about the new form. These educational programs were influential in informing test users of the important adjustments that needed to be made, as indicated by the fact that within 6 months of publication over 80 percent of MMPI users had switched to the revised MMPI.
Another way in which test publishers can facilitate the transition to a revised version of a psychological test is to implement an exchange or "buy back" program to help practitioners obtain the newer version without undue costs. This program involves replacing stock of the earlier version with the revised form at a lower cost.
Many psychological tests require updating if their timeliness and effectiveness are to be maintained. Revising a widely used psychological test can be a daunting task, and one that can take great effort and resources if it is to be done properly. Key elements in any revision are to (a) develop a practical program and collect an ample base of supporting data to justify the changes that are made in the revision, (b) communicate clearly to test users what changes have been implemented and what the continuities with the original instrument are, and (c) conduct a series of accessible continuing education programs across the country to inform practitioners of the changes and modifications made.