Psychological Assessment: A Journal of Consulting and Clinical Psychology © 1990 by the American Psychological Association
September 1990 Vol. 2, No. 3, 281-285

Failure of Wiener and Harmon Minnesota Multiphasic Personality Inventory (MMPI) Subtle Scales as Personality Descriptors and as Validity Indicators

Nathan C. Weed
University of Minnesota
Yossef S. Ben-Porath
University of Minnesota
James N. Butcher
University of Minnesota
ABSTRACT

Prototypical Minnesota Multiphasic Personality Inventory (MMPI) correlates in the form of spouse ratings were used as criteria to evaluate the validity of the Wiener and Harmon MMPI subtle scales for describing personality and for indicating profile validity. Results from a normative sample (n = 1,682) and a marital counseling sample (n = 369) indicated that the addition of the subtle scales to the obvious scales attenuates validity to the same degree as the addition of a random variable. Likewise, results did not support the use of an index based on MMPI subtle scales designed to detect overreporting or underreporting of psychopathology. These findings are discussed in terms of their relevance to clinical assessment.

Many practitioners and researchers in personality assessment have been intrigued by "subtle" items. In recent years, there have been several theoretical formulations of item subtlety (Christian, Burkhart, & Gynther, 1978; Holden & Jackson, 1979; Ward, 1986) and a commensurate interest in the use of MMPI subtle scales in clinical practice as descriptors of personality, which are believed to be less susceptible to response bias because of their subtle nature (Greene, 1980). The chief reason for developing or using subtle measures is the belief that the assessment will be accomplished without the subject knowing that he or she is being assessed on the construct in question. An important assumption underlying the use of subtle scales is that the subtle measure assesses the construct in question as well as other, more obvious items do, yet does not require the individual's cooperation in the process of self-report.

The most frequently used subtle scales are those constructed by Wiener and Harmon. The Wiener and Harmon scales were developed by dividing the MMPI items into two groups (Wiener, 1948): one consisting of items that are relatively easy to detect as indicating disturbance (obvious) and the other consisting of items that are ostensibly difficult to detect as indicating disturbance (subtle). From this grouping, Wiener and Harmon constructed subtle and obvious scoring keys for five of the MMPI clinical scales: Depression (D), Hysteria (Hy), Psychopathic Deviate (Pd), Paranoia (Pa), and Hypomania (Ma). In comparison to the full scales, intercorrelations among scores on the obvious scales tend to be higher, and intercorrelations among scores on the subtle scales tend to be lower (Wiener, 1948). Also, estimates of internal consistency tend to be highest for the obvious scales, somewhat lower for the full scales, and lower still for the subtle scales (Butcher, 1989).

Despite the popular use of these scales in clinical practice, the bulk of the evidence from empirical studies of their validity fails to show that MMPI subtle items contribute much validity to the full MMPI scale over and above the contribution of the obvious items (e.g., Burkhart, Gynther, & Fromuth, 1980; Duff, 1965; McCall, 1958; Nelson, 1987; Wrobel & Lachar, 1982). Studies with mixed results have been reported by Hovanitz, Gynther, and Marks (1983) and Hovanitz and Jordan-Brown (1986).

However, research on the efficacy of subtle items has tended to have limited applicability for a number of reasons, including (a) characteristically small samples; (b) validity criteria that are often irrelevant to the clinical task (i.e., other inventories); or (c) validity coefficients computed between MMPI scales and self-report criteria, which capitalize on method covariance (see Ward, 1986).

In this study, we avoided these problems by using larger contemporary samples of "normals" and patients and by incorporating appropriate behavioral correlates as external criteria. Spousal ratings of prototypic MMPI correlates were used as criteria against which subjects' Wiener and Harmon subtle- and obvious-scale scores were compared. The first goal of this study was to evaluate the Wiener and Harmon subtle scales as personality descriptors. If subtle scales are to be used for this purpose, scores on these scales should not only correlate well with the spouse ratings but should also contribute to the full scale above and beyond the contribution of the obvious-scale scores. Furthermore, the validity coefficients for the full-scale scores should exceed validity coefficients obtained by correlating spouse ratings with a composite of obvious-scale scores and a random variable.

The second goal of the study was to evaluate the Wiener and Harmon subtle scales as validity indicators. Greene (1980) described an index that uses the subtle scales as profile validity indicators. By subtracting the sum of the Wiener and Harmon subtle-scale T scores from the sum of the obvious-scale T scores, one obtains a score that at high-positive levels is thought to indicate the overreporting of psychopathology and at low-negative levels is thought to indicate the underreporting of psychopathology. If this index is sensitive to such response bias, then we would expect high correlations between spouse ratings and MMPI scales at intermediate levels of this index and lower correlations (due to inaccurate reporting) at extreme ends of the distribution of this index.

Method

Subjects

The two samples used in the study were an MMPI normative sample and a marital counseling sample.

MMPI-2 normative sample.

As part of the MMPI-2 normative study, 841 couples were included in the sample (Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989). These couples were randomly solicited and tested in small groups. In one 3-hr session, they were administered the MMPI (Form AX, used in the MMPI Restandardization Project) and a spouse-rating form. Each couple was paid $50.00 for its participation. The majority of this sample (57%) was between the ages of 25 and 44 years, with 35% being 45 years or older. The median annual family income was between $30,000 and $35,000, and the median level of education was 15 years. The racial composition of the sample was 87% White, 8% Black, 2% Native American, 1% Asian, and 1% Hispanic.

Marital counseling sample.

Hjemboe and Butcher (1989) obtained this group (N = 369) from clinics in Minnesota and Arizona, where the subjects were being seen in marital therapy. The couples took the MMPI and provided ratings of their spouses as part of their treatment program. This sample was somewhat younger and less ethnically diverse than the normative sample of couples, with more education and higher incomes. Only 20% of the sample was 45 years or older, with 73% being between the ages of 25 and 44. The median annual family income was between $45,000 and $50,000, and the median level of education was 16 years. The racial composition of the sample was 94% White, 3% Native American, 2% Black, and less than 1% each Asian and Hispanic.

For the first part of the study (evaluating the subtle scales as personality descriptors), the following exclusion criteria were used (expressed in raw scores): ? > 30, or L > 10, or F > 21, or K > 26. In the normative couples sample (which had been subjected to some initial exclusion criteria; Butcher et al., 1989), 13 subjects were further excluded by these criteria. In the marital counseling sample, 7 subjects were excluded by these criteria. Because the purpose of the second part of our study was to test the utility of a potential validity indicator, the only further exclusion criterion used was ? > 30, eliminating just 1 subject from the normative couples sample. (The marital counseling sample was not used for this part of the study, because we reasoned that division of it into subsamples would yield inadequate sample sizes.)
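The exclusion rules above can be written down compactly. The following is only an illustrative sketch, not the authors' procedure as implemented: the `ValidityScores` record and both function names are hypothetical, and only the raw-score cutoffs come from the text.

```python
from dataclasses import dataclass


@dataclass
class ValidityScores:
    """Raw scores on the MMPI validity scales (hypothetical record type)."""
    cannot_say: int  # the "?" (Cannot Say) raw score
    L: int
    F: int
    K: int


def excluded_part_one(s: ValidityScores) -> bool:
    """Part 1 rule from the text: ? > 30, or L > 10, or F > 21, or K > 26."""
    return s.cannot_say > 30 or s.L > 10 or s.F > 21 or s.K > 26


def excluded_part_two(s: ValidityScores) -> bool:
    """Part 2 rule from the text: only ? > 30 leads to exclusion."""
    return s.cannot_say > 30
```

A protocol with L = 11 would therefore be dropped from the first analysis but retained in the second, which is consistent with the second analysis being a test of a validity indicator.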

Instruments

In addition to the MMPI Form AX (the experimental form from which MMPI-2 emerged), each subject in both samples completed a spouse-rating form. This form consisted of 110 four-point items on which the subject was asked to rate the degree to which a descriptive statement applied to her or his spouse. From these 110 statements, 43 were selected as prototypic correlates of the MMPI scales with Wiener and Harmon subtle scales (10, 2, 10, 11, and 10 items for scales 2, 3, 4, 6, and 9, respectively). Examples of these items appear in Appendix A.

Procedure

For the first part of the study, for both samples, ratings of each of the 43 items were correlated with scores on four versions of the relevant MMPI scale: the raw score, the K-corrected T score, the raw score on the Wiener and Harmon subtle scale, and the raw score on the Wiener and Harmon obvious scale. In addition, a parallel analysis was conducted, comparing the performance of the subtle scales to the performance of random variables with characteristics similar to the subtle scales. For each subtle scale, three random variables with means and variances similar to those of the subtle scales in the normative couples sample were generated by computer. Actual means and standard deviations of subtle and "pseudosubtle" scales are presented in Table 1. Individual "scores" on these pseudosubtle scales were rounded to the nearest non-negative whole number and added to scores on the Wiener obvious scales to create three "pseudoraw" scores for each MMPI scale. Each set of raw scores was then correlated with scores on the appropriate rating items.
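A minimal sketch of how such pseudoraw scores could be built is given below. Several details are assumptions rather than facts from the study: the text does not say which distribution the random variables were drawn from (a normal distribution is assumed here), the function names are hypothetical, and any mean and standard deviation passed in would have to come from Table 1, which is not reproduced in this text.

```python
import random


def pseudosubtle_scores(n, mean, sd, rng):
    """Draw n random "scores" with roughly the given mean and SD.

    Assumption: normal draws. Each draw is rounded to the nearest whole
    number and clipped at zero so that it resembles a raw scale score.
    """
    return [max(0, round(rng.gauss(mean, sd))) for _ in range(n)]


def pseudoraw_scores(obvious, mean, sd, rng):
    """Add a pseudosubtle score to each subject's obvious-scale raw score."""
    noise = pseudosubtle_scores(len(obvious), mean, sd, rng)
    return [o + p for o, p in zip(obvious, noise)]


if __name__ == "__main__":
    rng = random.Random(1990)      # fixed seed so the sketch is repeatable
    obvious_raw = [12, 7, 15, 9]   # made-up obvious-scale raw scores
    # Mean and SD here are placeholders, not values from Table 1.
    print(pseudoraw_scores(obvious_raw, mean=10.0, sd=3.0, rng=rng))
```

Repeating the call with three different seeds would give three pseudoraw sets per scale, mirroring the three random variables described above.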

For the second part of the study, the normative couples sample was divided into three groups (lowest quartile, highest quartile, and the remainder) on the basis of their scores on a reporting index described by Greene (1980). This index was computed for each individual by subtracting the sum of the Wiener and Harmon subtle-scale T scores from the sum of the obvious-scale T scores, with low-negative scores presumably indicating underreporting of psychopathology and high-positive scores indicating overreporting. For each group, MMPI raw scores were correlated with the appropriate spouse ratings.
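The index and the three-way split can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, and the text does not specify how quartile cut points were computed or how ties at a cut point were handled, so the default method of Python's `statistics.quantiles` and inclusive comparisons are used here.

```python
import statistics


def reporting_index(obvious_t, subtle_t):
    """Sum of obvious-scale T scores minus sum of subtle-scale T scores."""
    return sum(obvious_t) - sum(subtle_t)


def split_by_quartiles(indices):
    """Return subject positions grouped as lowest quartile, middle, highest.

    Assumption: cut points come from statistics.quantiles (default method),
    and scores equal to a cut point fall into the extreme group.
    """
    q1, _, q3 = statistics.quantiles(indices, n=4)
    groups = {"low": [], "middle": [], "high": []}
    for position, ri in enumerate(indices):
        if ri <= q1:
            groups["low"].append(position)     # presumed underreporters
        elif ri >= q3:
            groups["high"].append(position)    # presumed overreporters
        else:
            groups["middle"].append(position)  # presumed "normal" reporting
    return groups


if __name__ == "__main__":
    # Made-up T scores on the five obvious and five subtle scales.
    print(reporting_index([70, 65, 60, 55, 50], [40, 45, 50, 55, 60]))
```

Within each resulting group, the raw MMPI scores would then be correlated with the relevant spouse ratings, as described above.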

Results

Subtle Scales as Personality Descriptors

Our first step in examining the utility of the subtle scales in describing personality was to compare their relative validity with the validity demonstrated by other scales (i.e., the MMPI raw scores, the K-corrected T scores, and the obvious-scale scores). In the normative sample, of the 43 spouse rating items, scores on 35 items correlated in the expected direction with one or more of the versions of the relevant MMPI scale (26 with raw, 23 with K-corrected T, 3 with subtle, and 33 with obvious). An arbitrary validity coefficient (correlation between MMPI scale and rating item) of .10 was selected as a minimum value for consideration. The mean strengths of correlations between these 35 rating items and MMPI scales were .13 (raw), .11 (K-corrected T), .01 (subtle), and .16 (obvious). Thus, for these prototypic MMPI correlates, the obvious scales outperformed not only the subtle scales but the full-scale raw scores, of which the subtle items are only a subset, t_r'(34) = 3.54, p < .001 (Fisher, 1921).
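The text reports this comparison only as a t statistic on transformed correlations with 34 degrees of freedom (one fewer than the 35 items) and a citation to Fisher (1921). One plausible reading, offered here purely as an assumption, is a paired t test across rating items on Fisher r-to-z transformed validity coefficients. The sketch below implements that reading; the function names are hypothetical, and it should not be taken as the authors' exact computation.

```python
import math
import statistics


def fisher_z(r):
    """Fisher r-to-z transformation: z = atanh(r)."""
    return math.atanh(r)


def paired_t_on_fisher_z(rs_a, rs_b):
    """Paired t statistic over items, computed on z-transformed correlations.

    rs_a and rs_b hold one validity coefficient per rating item for two
    versions of a scale (for example, obvious and full-scale raw scores).
    Returns (t, degrees_of_freedom).
    """
    diffs = [fisher_z(a) - fisher_z(b) for a, b in zip(rs_a, rs_b)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the item-level differences
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


if __name__ == "__main__":
    # Made-up coefficients for four items, not data from the study.
    t, df = paired_t_on_fisher_z([0.20, 0.30, 0.25, 0.10],
                                 [0.10, 0.20, 0.20, 0.05])
    print(f"t({df}) = {t:.2f}")
```

With 35 items per scale version, this procedure would yield 34 degrees of freedom, which matches the value reported above.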

The same pattern was found in the marital counseling sample. Of the 43 rating items, 36 were substantially correlated in the expected direction with one of the four scales (25 with raw, 23 with K-corrected T, 7 with subtle, and 32 with obvious). The mean strengths of correlations between the 36 ratings and the four versions of the MMPI scales were .14 (raw), .12 (K-corrected T), .01 (subtle), and .15 (obvious). Again, the obvious items alone performed significantly better than the full scale, t_r'(35) = 1.76, p < .05 (see Footnote 1).

Our second step in examining the utility of the subtle scales was to compare the validity of the raw scores with that of bogus scales we call "pseudoraw" scores, composed of the sum of the obvious scale items and random variables with characteristics similar to the subtle scales. Mean correlations over the 35 items used in the normative sample were .13, .13, and .12 for the three sets of these scales. For each set of the pseudoscales, the validity coefficients were not significantly different from those associated with the actual full scale, t_r'(34) = 0.84, 0.12, and 1.51, respectively (all were nonsignificant).

Subtle Scales as Validity Indicators

The goal of the second part of our study was to examine the utility of the subtle scales as validity indicators. A simple test of the efficacy of any validity scale or indicator is to compare the external validity of scores judged valid by the indicator with those that are judged invalid. For the group scoring lowest on the reporting index described above (n = 422, mean reporting index [RI] = -93), the mean correlation between spouse ratings and the appropriate MMPI raw score was .08. For the group in the middle half of the sample (n = 845, mean RI = -33), the mean validity was .10. For those scoring highest on the reporting index (n = 414, mean RI = 46), the mean validity was .15.

If the reporting index were an effective indicator of validity, and assuming the spouse ratings are not subject to these same reporting biases, one would expect the mean validities in the groups that scored at the extremes on this measure to be significantly lower than in the group with normal reporting. This appeared to be true for the underreporting group. However, in the overreporting group, the mean relationship between criterion and MMPI score was higher than in the normal reporting group.

Because it remained possible that the overreporting group would still show poor validity when only the most extreme subjects were examined, the overreporting group was split further into three groups: high (n = 138), medium (n = 136), and low (n = 140) overreporters. Validity coefficients for these three groups were .16, .13, and .15, respectively. Thus, even the most extreme 8% of the sample on this reporting index had MMPI raw scores that were more valid (by virtue of their relationship to prototypic MMPI correlates) than those of subjects who reported "normally" as defined by this reporting index.

Discussion

Our findings suggest several conclusions about the utility of the Wiener and Harmon MMPI subtle scales. First, these scales by themselves appear to show little or no association with prototypic MMPI correlates. This result is consistent with the findings of other researchers who used different subtle scales and a variety of external criteria (e.g., Duff, 1965; Burkhart, Gynther, & Fromuth, 1980; Nelson, 1987). Furthermore, for both a marital-counseling and a normal sample, the MMPI obvious scales performed better than the full scales. This suggests that the subtle scales, which are subsets of the full scales, actually attenuate the validity of the full scale by introducing variance that is not related to variance in these criterion measures. This notion is corroborated by the finding that when random variance is added to the obvious scales, the resulting scales perform nearly as well as the full scales themselves.

Considering the results of the initial phase of this study, it appears unlikely that an index based on the subtle scales could be useful as a validity indicator. Because the reporting index (described earlier) compares individual performance on the subtle scales with that on the obvious scales, it assumes that they are equally valid (or that the subtle scales are more valid because they are not subject to response bias) in predicting the presence or absence of typical MMPI correlates. It seems likely that the violation of this assumption accounts for the results from the second phase of the study, which show that there is greater validity of the MMPI scales among overreporters than among normal reporters. The finding that the correlation between spouse ratings and the relevant MMPI scales increases with higher scores on the reporting index may suggest that higher scores on the reporting index represent the presence of more variance due to items on the obvious scales (valid variance).

Finally, the failure of the MMPI T scores, which include K correction, to improve the validity of the full scales is quite surprising, especially in the normative subjects. Such a finding certainly warrants further inquiry, because the K scale almost universally is used as a correction for response bias.

Although the results from this study leave little room for recommending the clinical use of the Wiener and Harmon subtle scales, some limitations should be identified. First, this study used the MMPI Form AX, which was an experimental form of the MMPI-2 containing modifications of antiquated or poorly worded items (Butcher et al., 1989). Thus, strictly speaking, this study represents an evaluation of the MMPI-2 Wiener and Harmon subtle and obvious scales. There is, however, evidence that suggests that the psychometric properties of the MMPI-2 items have not changed from those of their predecessors (Ben-Porath & Butcher, 1989).

Second, the samples used are not likely to contain as much extreme psychopathology as one might desire in attempting to generalize these results to other exclusively clinical populations. Although it might be argued that a difference in magnitude of the validity coefficients would exist in a sample with more psychopathology, there seems to be no a priori reason to expect an interaction between performance of subtle versus obvious scales and sample or between reporting index and sample. That is, a new sample is not likely to improve the validity coefficients associated with the use of subtle items without also improving the validity coefficients associated with the use of obvious items. Nevertheless, a replication using a more pathological sample would be prudent.

Third, because this was an attempt to evaluate these scales in the way they are likely to be used in practice, the subtle scales were evaluated together. There is some possibility that certain of the Wiener and Harmon subtle scales are more related to external criteria than others (e.g., Hypomania [Ma-S]; Wrobel & Lachar, 1982). This is also the case for MMPI subtle scales developed by other researchers (e.g., Hovanitz & Jordan-Brown, 1986). However, such a claim could only be made after more research on individual subtle scales has demonstrated this utility.

Fourth, because the purpose of this study was to evaluate specific scales that are currently in use, no attempt was made to operationalize current theoretical notions of item subtlety (e.g., Christian, Burkhart, & Gynther, 1978). Thus, no theoretical statement based on these data can be made about the inherent utility of subtle items for test construction.

Finally, as in all clinical applications of assessment research, the task required of the clinician should be considered. Although this study uses prototypical MMPI correlates often used in clinical description, if such description is not the focus of assessment, these results may not apply. Clinicians or counselors with special prediction problems may wish to conduct further research using appropriate criteria.

APPENDIX A
Examples of Spouse-Rating Items: Scale 9 Correlates




References


Ben-Porath, Y. S. & Butcher, J. N. (1989). Psychometric stability of rewritten MMPI items. Journal of Personality Assessment, 53, 645-653.
Burkhart, B. R., Gynther, M. D. & Fromuth, M. E. (1980). The relative predictive validity of subtle vs. obvious items on the MMPI depression scale. Journal of Clinical Psychology, 36, 748-751.
Butcher, J. N. (1989, March). MMPI subtle items: Is there an empirical basis for their use? Paper presented at the 24th Annual Symposium on Recent Developments in the Use of the MMPI, Honolulu, HI.
Butcher, J. N., Dahlstrom, W. G., Graham, J. R., Tellegen, A. & Kaemmer, B. (1989). Manual for the restandardized Minnesota Multiphasic Personality Inventory: MMPI-2. An administrative and interpretive guide. Minneapolis: University of Minnesota Press.
Christian, W. L., Burkhart, B. R. & Gynther, M. D. (1978). Subtle-obvious ratings of MMPI items: New interest in an old concept. Journal of Consulting and Clinical Psychology, 46, 1178-1186.
Duff, F. L. (1965). Item subtlety in personality inventory scales. Journal of Consulting Psychology, 29, 565-570.
Fisher, R. A. (1921). On the probable error of coefficient of correlation deduced from a small sample. Metron, 1, 3-32.
Greene, R. L. (1980). The MMPI: An interpretive manual. New York: Grune & Stratton.
Hjemboe, S. & Butcher, J. N. (1989). Use of the MMPI-2 with couples in distress. Manuscript in preparation, University of Minnesota, Minneapolis.
Holden, R. R. & Jackson, D. N. (1979). Item subtlety and face validity in personality assessment. Journal of Consulting and Clinical Psychology, 47, 459-468.
Hovanitz, C. A., Gynther, M. D. & Marks, P. A. (1983). The prediction of paranoid behavior: Comparative validities of obvious versus subtle MMPI paranoia (Pa) items. Journal of Clinical Psychology, 39, 407-411.
Hovanitz, C. A. & Jordan-Brown, L. (1986). The validity of MMPI subtle and obvious items in psychiatric patients. Journal of Clinical Psychology, 42, 100-108.
McCall, R. J. (1958). Face validity in the D scale of the MMPI. Journal of Clinical Psychology, 14, 77-80.
Nelson, L. D. (1987). Measuring depression in a clinical population using the MMPI. Journal of Consulting and Clinical Psychology, 55, 788-790.
Ward, L. C. (1986). MMPI item subtlety research: Current issues and directions. Journal of Personality Assessment, 50, 73-79.
Wiener, D. N. (1948). Subtle and obvious keys for the MMPI. Journal of Consulting Psychology, 12, 164-170.
Wrobel, T. A. & Lachar, D. (1982). Validity of the Wiener subtle and obvious scales for the MMPI: Another example of the importance of inventory-item content. Journal of Consulting and Clinical Psychology, 50, 469-470.


Footnote 1. It should be noted that the same pattern of results is found when the two samples are combined. Mean correlations are .15, .13, .02, and .17 for raw scores, K-corrected T scores, subtle-scale scores, and obvious-scale scores, respectively. The obvious items alone once again perform significantly better than the full scale, t_r'(35) = 2.78, p < .005. Although most personality assessment is carried out in a within-group context, it is important to realize that this effect is neither limited to, nor an artifact of, narrow-range sampling. Rather, the superiority of the Wiener and Harmon obvious scales is likely to be found within and across a wide variety of samples.



We thank Steve Hjemboe for permitting us to use a large portion of his dissertation data and Wendy Slutske for her assistance with some of the data analysis.
Correspondence may be addressed to Nathan C. Weed, Department of Psychology, University of Minnesota, Minneapolis, Minnesota, 55414.
Received: August 30, 1989
Revised: December 1, 1989
Accepted: December 15, 1989

Table 1.