The historical depth of fraud has been documented (though still quite incompletely) by journalists William Broad and Nicholas Wade in their 1982 Betrayers of the Truth (which includes an appendix summarizing 34 cases). But careful examination of these cases can also pose some provocative questions about "proper" science. Consider, for example, the classic case of Gregor Mendel, whose published data on inheritance in pea plants, according to statistician Fisher, were too good to be true. Mendel's results were a one-in-a-million chance. Some defend Mendel, though, saying that he followed contemporary practice: to repeat experiments, refine one's technique, and then use only the best results as the most representative ones. If that is not legitimate now, why not? What does this reveal about how we evaluate evidence? It is worth noting for students, in fact, that the standards themselves have changed: why?
A question worth posing for student discussion (here, and below) is:
- If a scientist gets the "right" answer, does it matter if the data were "tweaked," "massaged," distorted, or even wholly fabricated?
The case of Robert Millikan, whose renowned oil-drop experiment established the value of the fundamental unit charge, e--and earned him the Nobel Prize in 1923--is far more provocative.
Millikan, of course, kept detailed notebooks of his laboratory activities, data and assessments of results. Several years ago, an effort to reconstruct Millikan's "exemplary" experimental thinking revealed serious discrepancies between Millikan's notebooks and his published "raw" data (Holton, 1978). The numerous notes which are scattered across the pages cast further doubt on Millikan's integrity:
This is almost exactly right & the best one I ever had!!! [20 December 1911]
Exactly right [3 February 1912]
Publish this Beautiful one [24 February 1912]
Publish this surely / Beautiful !! [15 March 1912, #1]
Error high will not use [15 March 1912, #2]
Perfect Publish [11 April 1912]
Won't work [16 April 1912, #2]
Too high by 1½% [16 April 1912, #3]
Too high e by 1¼%
Millikan had apparently been calculating the values of e for each set of observations as he went along, and comparing them with his expected value. Further, he seemed to use the match with the theory that he was supposedly testing as a basis for including or excluding results as the very evidence for that theory! As Franklin (1986) has noted, "we are left with the disquieting notion that Millikan selectively analyzed his data to support his preconceptions" (p. 141; echoing Holton 1978). Are we to conclude that Millikan's analysis, which is laden with theoretical bias and seems to treat experimental facts so casually, reflects the nature of scientific "genius"?
The notebooks reveal that, indeed, substantial data are missing from Millikan's published reports. Of 175 total drops documented in the notebooks, only 58 (barely one-third) appear in the final paper. By contrast, Millikan had announced in his 1913 paper that "It is to be remarked, too, that this is not a selected group of drops but represents all of the drops experimented on during 60 consecutive days, during which time the apparatus was taken down several times and set up anew" [his own emphasis!]. In his 1917 book, The Electron, he repeats this statement and then adds, "These drops represent all of those studied for 60 consecutive days, no single drop being omitted."
At first blush, this outrageous violation of scientific integrity would seem to discredit Millikan's findings. Even if one assumes that standards of reporting data earlier in the century were less rigorous, Millikan clearly misrepresented the extent of his data. One may caution students, however, that we may not want to conclude that therefore there was no good, "scientific" basis for his selective use of data. A more complete analysis of Millikan's notebooks, in fact, and of the nature of the experimental task that they crudely document, reveals more tellingly the reasons that Millikan included some drops and excluded others.
Physicist-philosopher Allan Franklin has addressed the problem by using Millikan's original data to recalculate the value of e. Even when one uses various constellations of the raw data, Millikan's results do not change substantially. That is, their accuracy was not severely affected by Millikan's choice of only a subset of the observations. Millikan's selectivity, at most, gave a false impression of the variation in values or the range of "error" in the data and, therefore, of the statistical precision of the computed value.
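The statistical point here, that selection can leave the central value nearly unchanged while shrinking the apparent error, can be illustrated for students with a small simulation. The sketch below is only an illustration under stated assumptions: the measurements are synthetic (they are not Millikan's data), the "true" value of 1.0 and the spread of 0.02 are arbitrary, and the selection rule (keep the 58 of 175 runs closest to the overall mean) is a hypothetical stand-in for whatever judgements Millikan actually made.

```python
import random
import statistics


def summarize(values):
    """Return (mean, standard error of the mean) for a list of measurements."""
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / len(values) ** 0.5
    return mean, sem


# Synthetic, illustrative measurements only (NOT Millikan's data):
# 175 simulated runs scattered around an arbitrary "true" value of 1.0.
rng = random.Random(0)
all_runs = [rng.gauss(1.0, 0.02) for _ in range(175)]

# Hypothetical selection rule: keep the 58 runs closest to the overall mean,
# which discards high and low outliers alike.
centre = statistics.fmean(all_runs)
selected = sorted(all_runs, key=lambda v: abs(v - centre))[:58]

mean_all, sem_all = summarize(all_runs)
mean_sel, sem_sel = summarize(selected)

print(f"all 175 runs : mean = {mean_all:.4f}, standard error = {sem_all:.5f}")
print(f"selected 58  : mean = {mean_sel:.4f}, standard error = {sem_sel:.5f}")
```

In a run of this sketch the two means should agree closely, while the standard error of the selected subset comes out smaller than that of the full set, even though it rests on fewer observations. Students might be asked which of the two error figures honestly describes the experiment.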
In fact, Franklin notes, Millikan threw out data that was "favorable" as well as "unfavorable" to his expectations. Clearly, Millikan's results were over-determined. That is, he had more data than he needed to be confident about his value for the electron's charge. Here, the redundancy of data was an implicit method for safeguarding against error. Thus, what appears as fraud from one perspective becomes, from an experimental perspective, a pattern of good technique.
One may examine further specifically when the observations that Millikan excluded occurred. The first 68 observations, for instance, were omitted entirely. Why? Following February 13, 1912 (which marks the first published data), one may also note, the number of excluded results decreases as the series of experiments proceeds. Apparently, Millikan became more skilled as time went on at producing stable, reproducible data. Prior to February 13th, one may infer, he was still working the "bugs" out of the apparatus and gaining confidence in how to produce trustworthy results. That is, he was testing his equipment, not any theory of the electron or its charge. Here, the notebooks help focus our attention on the apparatus and the material conditions for producing evidence, not the role of the evidence itself.
Millikan's comments in the notebooks highlight the significance of experimental judgements, especially in excluding particular observations. For example, "Beauty Publish," on April 10, 1912, is crossed out and replaced by "Brownian came in"; here, the way the drop had moved meant that his measurements did not reflect the values Millikan needed for his calculations--those which the apparatus, of course, was specifically designed to produce. Millikan's judgements about other aspects of the experimental set-up are revealed elsewhere:
This work on a very slow drop was done to see whether there were appreciable convection currents. The results indicate that there were. Must look more carefully henceforth to tem[perature] of room. [19 December 1911]
Conditions today were particularly good and results should be more than usually reliable. We kept tem very constant with fan, a precaution not heretofore taken in room 12 but found yesterday to be quite essential [20 December 1911]
Possibly a double drop [26 January 1912]
This seems to show clearly that the [electric] field is not exactly uniform, being stronger at the ends than in the middle [27 January 1912]
This is good for so little a one but on these very small ones I must avoid convection still better [9 February 1912]
This drop flickered as tho unsymmetrical [2 March 1912]
This is OK but volts are a little uncertain and tem also bad. It comes close to lower line. [7 March 1912, #1]
Millikan had thus been concerned about several parameters critical for obtaining "good" or "clean" results, consistent with the design of the experiment: the size and symmetry of the drop; convection currents (temperature of room); smoothness of movement of the drop; and (elsewhere) dust, pressure and voltage regularity (Franklin, pp. 149-50).
Even where he could not pinpoint the problem, he might sense that "something the matter . . ." [13 February 1912]. Millikan's confidence in his judgement meant that in some cases he did not even go on to calculate e, excluding those observations even before seeing the "results." In other cases, he recognized the "beauty" of the run:
Beauty. Tem & cond's perfect. no convection. Publish [8 April 1912]
Millikan's decisions to publish data (or not) based on their "beauty" (above), therefore, probably reflected his assessments of the particular experimental conditions. His striking comment on February 27, 1912, "Beauty one of the very best," may thus refer, not to the value of e itself, but to the quality of his own technique.
Millikan excluded other events based on the methods of calculation. For example, the formula used a substituted value based on certain theoretical assumptions in Stokes's Law (relating pressure, air viscosity and drop radius). While Millikan tolerated the first-order "corrections" for the values, in 12 cases where unusual data required him to rely on less certain second-order corrections, he simply omitted the events. In other words, not all data was "user-friendly"--that is, tailored to the framework for drawing legitimate conclusions.
Millikan was also able to exploit the fact that the value of e could be calculated in two ways, each using slightly different measurements of the same event. He allowed the two methods to cross-check each other. In some cases, he noted:
Agreement poor. Will not work out. [17 February 1912, #3]
Error high will not use. . . . Can work this up and prob is OK but point is not important. Will work if have time Aug. 22 [15 March 1912, #2]
Again, where he found discrepancies, he was better off avoiding the possible uncertainties by simply sidestepping the "unworkable" events. By the end of the experimental period, one can sense that Millikan, having more than enough data, was continuing his work merely to build confidence about all his safeguards. Three days before he stopped taking observations, he satisfied himself, "Best one yet for all purposes" [13 April 1912]. Two days later, the very day before ending, he recorded:
Beauty to show agreement between the two methods of getting v1 + v2 Publish surely [15 April 1912]
An aim of internal consistency, rather than agreement between theory and data, clearly guided Millikan's work.
Even the final values of the calculations could themselves be clues or signals that something was amiss. One erratic value of e--clearly outside the boundary of typical or "reasonable" values, or of anything else he had found to date--prompted Millikan to decide: "could not have been an oil drop" [20 December 1911 #3], and to conclude apparently that it was a dust particle. Millikan excluded two other important drops that gave anomalous values of e, even though one, by Millikan's own judgement, was a model of consistency. Having begun with some confidence:
Publish. Fine for showing methods of getting v [16 April 1912, #2]
he later marked in the corner of the page (without further accounting), "Won't work." In retrospect, Millikan's intuition seemed to have served him well: we know from data in Millikan's notebook that these two drops had unusually high total charges and that such drops (as we have learned since 1912) are not reliable using the method that Millikan used. Here, again, Millikan's primary reasoning concerned whether to trust the apparatus and his experimental measurements--not (yet) whether the theory or value of e itself was correct.
The use of Millikan's oil drop experiment in class labs can easily suggest to students that it was quite trivial--what with a novice being able to reproduce the work of a Nobel Prize winner, after all! The current standardization of the experiment disguises, though, the complexity of the context in which it developed. Conceptually, the task in the early 1900s was relatively clear. Indeed, Millikan's experimental strategy in 1910-1912--to observe drops of fluid, each laden with charge, moving in an electric field--had been tried by many researchers before. The chief difficulties at the time lay in the mechanics of constructing the situation idealized by theory. Millikan's ultimately successful strategy differed from others by focusing on single drops and by substituting oil for water; oil did not evaporate so easily and thus made more sustained observations possible. That is, Millikan's achievement, marked by the Nobel Prize, was largely technical.
An analysis of Millikan's notebooks, therefore, highlights a grey zone between outright misrepresentation of data and skilled experimental "micro-reasoning." Was Millikan's selective use of data "good" science? One may contrast Millikan and his success, in this case, with his critic, Felix Ehrenhaft, who stubbornly resisted discarding the results of any run. Was Ehrenhaft's experimental posture appropriately conservative or unduly myopic? Was Millikan, likewise, inexcusably dishonest or justifiably pragmatic?
The question of whether editing of data can represent good science is obviously aggravated by cases where it has failed to yield reliable conclusions. Stephen Jay Gould (1981, pp. 56-60) notes that in studying the relative cranial capacity of Caucasians and "Indians," a 19th-century investigator excluded many Hindu skulls...but for "good" reasons? The "Hindoo" braincases were too small and, because they were "clearly" unrepresentative of the Caucasian population he wanted to sample, they would "bias" his results. Here the effect of the selection was probably not even conscious. Likewise, anthropologists in the same era, evaluating women's skulls, relied on their "intuitions" to disregard types of measurements that suggested that women (or elephants, whales or bear-rats) were more intelligent than men. So, can one know where selection is legitimate, and where not?
The cases of Millikan and Mendel illustrate, in particular, that in answering such a question, we must focus on experimental skills and judgement (and on apparatus) as much as on the concepts themselves. While this is the potential "lesson," though, the problem that sparks the inquiry may be the spectre that fraud is the very tool of genius.
The SHiPS Teachers' Network helps teachers share experiences and resources for integrating history, philosophy and sociology of science in the science classroom.