This chapter will describe the research conducted by cognitive scientists who study scientific thinking and reasoning. Surprisingly few of these studies have been conducted using actual scientists. In this chapter, we will find out why. We will also learn more about terms like 'mental model' and 'heuristic' that were used in Chapter 1. Finally, we will try to show, by way of contrast, how science can be studied from other perspectives, particularly the wide range of views loosely identified with the sociology of scientific knowledge (SSK).
History, philosophy and sociology have strong sub-disciplines associated with the study of science. There is no equivalent in psychology--instead, there are scattered practitioners who do psychology research relevant to science, but very few take on a professional identity as psychologists of science. As a consequence, psychology's contributions to the study of science--and invention, as we will see in subsequent chapters--are not as great as those of sociology and philosophy, because these disciplines have carved up the study of science in a way that appears to exclude cognitive psychology. It is worth providing a thumbnail sketch of how this state of affairs came to pass--call it a mythical reconstruction, in the best Campbellian sense, because any account like this is a great oversimplification of a much more complicated story.
Philosophy of science before Kuhn assumed that the underlying logic of science is what is really important, not the mental processes of individual scientists. The information is out there in the world; all we have to do is see it, and while there may be interesting stories to tell about how perception works, physiologically, and why it and other psychological processes lead to occasional errors, basically, science is a rational progression towards truth. For example, Karl Popper saw little rationality in the way in which scientists arrived at their conjectures about the universe and its laws, but the way in which their ideas were subjected to testing by the scientific community had to be rational. According to Popper, science advances by discarding false hypotheses and replacing them with ones that are better approximations to the truth, a kind of asymptotic progression where science never arrives at a final, absolute truth but does come gradually closer and closer (Popper, 1959; Popper, 1992). This distinction between discovery and justification allowed philosophers like Popper to bracket psychological processes and ignore them.
The historian of science Thomas Kuhn, in contrast, put the way a scientist represented the universe of possible choices at the center of his philosophy of science (Kuhn, 1962). This emphasis on mental representations is also central to cognitive psychology, which in many ways is the study of how we represent the world.
According to Kuhn, he was working on a dissertation in theoretical physics when he took a course on physics for the non-scientist that included a strong component of history of science.
To my complete surprise, that exposure to out-of-date scientific theory and practice radically undermined some of my basic conceptions about the nature of science and the reasons for its special success.
These conceptions were ones I had previously drawn partly from scientific training itself and partly from a long-standing vocational interest in the philosophy of science. Somehow, whatever their pedagogic utility and their abstract plausibility, those notions did not at all fit the enterprise that historical study displayed. Yet they were and are fundamental to many discussions of science, and their failures of verisimilitude therefore seemed thoroughly worth pursuing. The result was a drastic shift in my career plans, a shift from physics to history of science and then, gradually, from relatively straightforward historical problems back to the more philosophical concerns that had initially led me to history (Kuhn, 1962, p. 5).
Kuhn had heard a hero's call to a journey that ended up transforming science studies. The textbook accounts of science he read in graduate school and the rational reconstructions by philosophers of science did not square with historical accounts of how scientific discoveries actually occurred. Kuhn decided that most of the time, scientists do what he called normal science--they work within the framework of a paradigm, which suggests what kinds of experiments are most promising and how to interpret the results. Philosophers have criticized Kuhn for the vagueness of this concept (Masterman, 1970), but its vagueness is its strength.
Kuhn felt scientists learned their paradigm through exemplars.
It was Kuhn's crucial insight that the fundamental units of scientific knowledge are not theories, nor even theories and associated observations, but solved problems. Problem solutions are the irreducible units of resource in scientific research: they are the models on the basis of which further problems are solved. Some of these problem solutions come to be recognized as holding a special promise for future research, and become accepted as authoritative bases for its practice in specific disciplines or specialties in science. These are exemplary achievements or paradigms. In them, theoretical discourse, practice and instrumentation are linked together and grasped in operation: they are understood in use in a way that they could not be understood by abstract consideration (Barnes, Bloor, & Henry, 1996, pp. 101-2).
To use the language of situated cognition, the kind of scientific knowledge represented by exemplars is embodied in devices and fine-grained experimental procedures which shape the sorts of experiments and observations scientists in a particular field conduct, and how they interpret them.
Those who give Kuhn's ideas a radical interpretation hold that the notion of a paradigm is consonant with the cultural beliefs and ritual practices of primitive tribes (Pinch, 1997). Scientists operating within a paradigm don't see it as a hypothesis, subject to test and change; the paradigm corresponds to the way the world is. Before Kepler, the planets moved in perfect circles. Before Lavoisier, burning produced phlogiston. Lavoisier not only created a new theoretical framework; he also provided exemplary experimental procedures. Faraday's portable electromagnetic motor is another exemplar. Eventually, scientists like these generate enough anomalous results to precipitate a paradigm shift, where an anomalous result is something that does not fit within the existing paradigm--like the orbital data for Mars generated by Tycho Brahe.
Kuhn used research in Gestalt psychology to explain what happens when a paradigm shift occurs. Gestalt psychologists thought that in perception, the whole was greater than the sum of its parts. If one changed a small element of a scene, it could suddenly change the way the whole scene was perceived. Kuhn cited an experiment in which the psychologists Bruner and Postman showed participants ordinary playing cards at brief exposures. This kind of brief exposure design is often used in perception experiments to test more automatic perceptual processes. But, as the Gestaltists often showed in their experiments, even brief, 'automatic' processes depend on expectations. In this experiment, some of the playing cards were anomalous, e.g., a black four of hearts. Participants took much longer to recognize these cards. Kuhn quoted one comment, "I can't make the suit out, whatever it is. It didn't even look like a card that time. I don't know what color it is now or whether it's a spade or heart. I'm not sure I even know what a spade looks like. My God!" (Kuhn, 1962, pp. 63-4).
To Kuhn, this kind of dramatic shift in representation is at the core of scientific revolutions. Kepler certainly experienced it when he abandoned the universe of perfect circles. Empirical evidence is still central to Kuhn's view. Anomalous results have to pile up before a paradigm shift can occur. But there is still room for psychological explanations of why the anomalies trigger a representational crisis in a Kepler or an Einstein and not others, of how others then become convinced to abandon the old paradigm.
According to our mythically-oversimplified account, sociologists of science before Kuhn were more concerned with accounting for the kinds of norms that governed scientific conduct, and also for the way in which non-scientific political and ideological interests accounted for the errors that scientists made (Barnes, Bloor, & Henry, 1996). In other words, reality was a sufficient explanation for why scientists eventually discovered aspects of the structure of the universe. Robert K. Merton, one of the fathers of sociology of science, argued that "specific discoveries and inventions belong to the internal history of science and are largely independent of factors other than the purely scientific" (Merton, 1970).
Sociology might be used to explain why Kepler took so long to discard the perfect circle dogma and also why not all scientists immediately embraced his view. The Catholic church's resistance to heliocentric models would be such a factor. Here we have sociology as a way of explaining why the right answer, scientifically speaking, wasn't immediately obvious to everyone. This takes us back to BACON--if a computer can find Kepler's laws in a few minutes, and a group of college students in a few hours, why did it take civilization so long? Must be sociology.
This kind of sociology is certainly very important, but more recently, sociologists of science have tried to understand how scientific knowledge is created, regarding reality as an insufficient explanation. Kuhn drew an explicit analogy between scientific and political revolutions, thus opening the door to a consideration of how sociology shapes scientific knowledge. A new Sociology of Scientific Knowledge (SSK) gradually emerged as a kind of paradigm for studying science, at least among certain sociologists and anthropologists. I use the word 'paradigm' here loosely--SSK is a term which covers many different approaches, and there are sociologists who regard SSK as anathema. But from the standpoint of our almost mythically-oversimplified reconstruction, SSK made the radical assumption that the creation and dissemination of scientific knowledge were proper objects of sociological study. The corollary assumption was that scientific knowledge was at least in part constructed through social negotiations.
These ideas have been controversial, to say the least. To some critics, it sounded as if SSK were undermining the idea of scientist as discoverer of truth--instead, we might substitute scientist as skilled manipulator of social networks, with the end result having no more absolute validity than any other socially-accepted custom. Let us consider this issue in more detail.
When an anthropologist starts to question members of a culture about why they believe what they believe--why, for example, they believe that there are spirits in rivers or trees--the 'natives' will be baffled, if not annoyed. "Because that's the way the world is" might be a typical (friendly) response. The typical anthropologist would not regard this statement as a sufficient explanation; she would look for more evidence as to why this system of beliefs was adopted.
If our anthropologist were then to enter a scientific laboratory and press a practitioner to explain why she believed in quarks, or genes, the answer might be: "Because they exist." One of the methodological principles of the new sociology of scientific knowledge (SSK) is symmetry: "the sociological explanation of beliefs in science should be pursued equivalently for both true and false beliefs" (Pinch, 1993, p. 363). In other words, appeals to reality are not sufficient to explain belief systems. True and false is an evaluation all cultures make, relative to their belief systems; an outside observer of a culture should not privilege these accounts.
Consider an example, which we will borrow from the new sociology of scientific knowledge (SSK) (Latour, 1986). Suppose we were studying the ritual practices and beliefs of a tribe called the Azande. Take a rain dance, for instance. The Azande shaman, or skilled practitioner of the dance, would be able to explain every success and failure and show that it did, indeed, achieve the desired effect, when done properly.
If we studied this Azande practice from an outside perspective, we could explain every success and every failure in terms alien to the Azande and ultimately show, to our satisfaction, that the dance had no effect on the rain. In contrast, if we adopted an Azande framework to study the Azande system of beliefs, we might come up with some interesting new interpretations that improved ritual practices, but we would be unlikely to decide that the whole Azande system was worthless as a general method for improving human relationships with nature.
One can use powerful techniques like ethnography to study scientific laboratories--the same sorts of techniques that might be used to study a tribe in the Amazon. One should also take the same attitude studying these natives, keeping a kind of anthropological distance. One should not go into a study of the scientific laboratory or of the Azande village by assuming, at the outset, that the practices one observes do lead to the truth, if done properly. Nor should one assume the converse--that the sets of rituals one observes are simply primitive mumbo-jumbo which cannot possibly lead to a greater understanding of nature and the universe. In the course of study of any culture, one may uncover great truths.
To summarize our argument so far, when one studies scientific practice, one should keep a bit of anthropological distance and not take all the scientists' accounts of their own behavior at face value. Nor should one ignore those accounts. There are two kinds of anthropological distance one must try to maintain:
1) Practitioner: Paul Gross and Norman Levitt, in a recent stinging attack on much of social studies of science, imply that in order to study science, one has to have professional training equivalent to a scientist: "We are saying, in effect, that a scholar devoted to a project of this kind must be, inter alia, a scientist of professional standing or nearly so" (Gross & Levitt, 1994, p. 235). Certainly, deep knowledge of the subject matter is important, but being a scientist can imply adherence to a view of the world in which certain practices lead unquestionably to a kind of truth. Not all scientists hold such views (Wise, 1996), but those who do would have to be able to bracket their beliefs in order to study them critically. Charging such a scientist with studying her practice would be akin to asking the Azande shaman to evaluate her beliefs--it is a rare shaman or scientist that could attain this sort of distance. William Keith imagines the scientist responding to the sociologist or anthropologist: "We thought (science) was about seeking truth, while you think it's about social arrangements" (Keith, 1995, p. 321).
Let us try to make this difficult point clearer by considering Kuhn's views once again. During a period of crisis, or revolution, scientists in an area become aware that they are operating within a paradigm and that other views are possible. Kuhn argued that holders of an existing paradigm could not even understand a new point of view, because the two belief systems were incommensurable. An example is Barbara McClintock's discovery of genetic transposition (Keller, 1983). She was working on corn in the mid-1940s at a time when most geneticists were working on Drosophila, and she generated a set of anomalies by looking closely at the way in which mutations occurred. After six years of hard, relatively isolated study, she concluded that whole sections of the chromosome could be transposed to another location, and that this was a process that illustrated the normal functioning of the whole genetic system, not an odd or unusual event.
Her initial attempts to communicate this discovery failed almost totally; other geneticists literally did not know what she was talking about, both because what she described was theoretically at odds with the dominant paradigm, and because her methods and the organism she studied were unfamiliar to most. "Central to neo-Darwinian theory was the premise that whatever genetic variation does occur is random, and McClintock reported genetic changes that are under the control of the organism. Such results just did not fit in the standard frame of analysis.
"But it was not only the ideas themselves that were foreign, and hence difficult to grasp for most geneticists; the very kinds of evidence she presented, or rather the patterns it formed were also difficult to follow...Her knowledge of maize was more intimate and more thorough than that of anyone else in the audience" (Keller, 1983, pp. 144-5). This is the kind of combined theoretical and methodological incommensurability that could occur at times of paradigm shift, according to Kuhn. McClintock's work was eventually recognized in the mid-1970s when similar conclusions emerged from work on bacteria (Keller, 1983). This thirty-year hiatus is an example of how long a period of incommensurability can last, but also how it can eventually be overcome.
Kuhn later reduced the importance of incommensurabilities, but those who take a radical view of Kuhn have continued to emphasize them (Pinch, 1997). If this radical view of Kuhn were right, then it would be very difficult to find a practicing scientist who could study her area of science. Consider a 'normal scientist' in genetics looking at what McClintock was doing in the 1950s. Joshua Lederberg concluded she was "either mad or a genius" after visiting her lab (Keller, 1983, p. 142). To study science, one would need a scientist who could bracket what he 'knew' about the realities in his field.
2) Methodological: Kuhn argues that science does not lead to truths about the universe, but rather makes progress by solving puzzles. "I do not doubt, for example, that Newton's mechanics improves on Aristotle's and that Einstein's improves on Newton's as instruments for puzzle-solving. But I can see in their succession no coherent direction of ontological development. On the contrary, in some important respects, though by no means in all, Einstein's general theory of relativity is closer to Aristotle's than either of them is to Newton's" (Kuhn, 1962, pp. 206-7).
An anthropologist or psychologist studying science should be careful not to rely on the puzzle-solving practices of a particular field of science to evaluate that area. We discussed this problem above; it is easily solved by noting that our anthropologist will use methods derived from her specialty, perhaps ethnographic techniques, and a psychologist will use tools appropriate to her discipline (see below). Gross and Levitt's professional scientist who studies science would have to receive special training in social sciences as well.
Kuhn's emphasis on puzzle-solving over ontological truth is also prone to a radical interpretation: that science does not lead to absolute truths. This point of view is often referred to as ideological relativism. It implies that the beliefs of the Azande are just as true as the claims of science--'truth' is a relative notion, and truths vary from culture to culture. Perhaps we can never escape our cultural assumptions, and science is simply an outgrowth of a particular culture's assumptions that the observer can be separated from the observed. There is no absolute, rational boundary between science and pseudo-science, therefore work in areas like ESP and astrology and even Creationism can fairly be labeled science by their proponents (Collins, 1982).
The four discoverers in the last chapter would have been astonished if an observer had argued that they were only contributing to a culturally-bound world-view, even if they were given credit for helping to solve puzzles. Rightly or wrongly, they believed they were after eternal truths--just as the Azande see their beliefs as truths.
To Richard Feynman, the real distinguishing characteristic of science resembles Popper's falsification--a willingness to criticize one's beliefs. In a commencement address at Caltech in 1974, he talked about 'cargo cult science', after an unnamed group of South Sea Islanders who wanted the planes that had come full of cargo during World War II to return. So they built something akin to a runway, put fires along its sides, made a wooden hut and put a man in it with wooden pieces on his ears and bamboo bars sticking out from them like antennas. The planes didn't land, of course. Feynman (1974, http://www.astro.washington.edu/ingram/edu/a101.sp94.cargocult) argued that the central scientific idea is missing in cargo cult sciences:
It's a kind of scientific integrity, a principle of scientific thought that corresponds to a kind of utter honesty--a kind of leaning over backwards. For example, if you're doing an experiment, you should report everything that you think might make it invalid--not only what you think is right about it: other causes that could possibly explain your results; and things you thought of that you've eliminated by some other experiment, and how they worked--to make sure the other fellow can tell they have been eliminated.
Details that could throw doubt on your interpretation must be given, if you know them. You must do the best you can--if you know anything at all wrong, or possibly wrong--to explain it. If you make a theory, for example, and advertise it, or put it out, then you must also put down all the facts that disagree with it, as well as those that agree with it. There is also a more subtle problem. When you have put a lot of ideas together to make an elaborate theory, you want to make sure, when explaining what it fits, that those things it fits are not just the things that gave you the idea for the theory; but that the finished theory makes something else come out right, in addition.
In summary, the idea is to give all of the information to help others to judge the value of your contribution; not just the information that leads to judgment in one particular direction or another.
Feynman is sketching a kind of Campbellian hero who seeks the grail of truth without regard for mortal consequences. Feynman himself played a critical role in the development of the atomic bomb at Los Alamos, which does not contradict Feynman's emphasis on scientific integrity but does demonstrate that a scientist's choice of a problem may be determined by institutions. The point a sociologist might make is that these organizations do not lie outside of science--indeed, the Manhattan Project was created by scientists like Einstein and Szilard (see Chapter 4).
Furthermore, Feynman's view of the self-critical scientist stands in apparent contrast to Latour's observation that science advances by creating 'black boxes' (Latour, 1987): procedures or devices or equations or facts that are taken for granted by future generations of scientists and may be virtually incomprehensible to outsiders. Black boxes are an effort to close off the kind of constant questioning Feynman is calling for. Latour thinks of science studies as an effort to open these black boxes, particularly for a wider public.
The physicist Alan Sokal (http://www.nyu.edu/gsas/dept/physics/faculty/sokal/index.html) designed an experiment to test whether science's critics, many of them relativists, were cargo cultists or serious scholars. He submitted an article, "Transgressing the Boundaries: Towards a Transformative Hermeneutics of Quantum Gravity", for the special "Science Wars" issue of the postmodernist journal Social Text (Chapman, June 6, 1996). The article was a deliberate parody of the kind of language used by postmodern scholars, quoting extensively from philosophers like Derrida. Sokal's hypothesis was that, despite the fact that he deliberately made statements about physics that were wrong, the editors would accept it, because it accorded with their preconceptions. In other words, he bet that the editors would exhibit confirmation bias. Studies of the peer-review process suggest that reviewers are often biased towards results that agree with the dominant paradigm in their area (Cicchetti, 1991; Mahoney, 1977).
The editors of the journal accepted his manuscript and published it. When Sokal revealed the hoax, they apologized to their readers, but defended themselves on the grounds that Sokal's deception was itself ethically questionable--a journal editor assumes that an author is being honest (see Robbins and Ross, http://www.nyu.edu/pubs/socialtext/sokal.html).
This reminds me of the classic study in which graduate students pretended to be mental patients in order to gain admission to mental hospitals (Rosenhan, 1973). All were admitted and diagnosed, a fact which was used to critique the whole mental health system. But the hospitals defended themselves on the grounds that no sane person would ask to be admitted to a mental hospital! One wonders what the response would have been had a science studies scholar constructed a fake physics article, complete with a set of plausible references and fabricated data that supported the current paradigm in a domain. Chances are, the article would be accepted, if the referees were blind to the author's name and discipline. If so, the fictitious article would simply be dismissed as a fraud and not be seen as undermining the legitimacy of physics.
Deception aside, Sokal's piece was deliberately constructed to sound like nonsense. If he had submitted it to a refereed journal like Social Studies of Science, it would certainly have been rejected. Therefore, it is wrong to tar science studies with the mistakes made by the editors of Social Text.
Strong relativism of the sort advocated by a minority of science-studies scholars has produced an enraged response from some scientists and mathematicians (Gross & Levitt, 1994) and a more sympathetic critique from others (Labinger, 1995). Gross and Levitt in essence revive C.P. Snow's old two cultures argument (Snow, 1963) and report that the rift between the scientific and humanistic cultures in the academy is widening, with the humanists now claiming to have a unique perspective from which to view science, one that scientists like Sokal find bizarre and incomprehensible. Gross and Levitt argue that a realist perspective is essential to doing science:
Science is, above all else, a reality-driven enterprise. Every active investigator is inescapably aware of this. It creates the pain as well as much of the delight of research. Reality is the overseer at one's shoulder, ready to rap one's knuckles or to spring the trap into which one has been led by overconfidence, or by a too-complacent reliance on mere surmise. Science succeeds precisely because it has accepted a bargain in which even the boldest imagination stands hostage to reality (Gross & Levitt, 1994, p. 234).
From this perspective, even the adoption of methodological relativism would make it impossible to understand science, and strong relativism would simply be nonsense. At a recent meeting of the Society for Social Studies of Science, Donna Haraway (Haraway, October 20, 1995) put her finger on the central question--are most of the philosophical differences between scientists like Feynman and Gross and those relativists who study them the result of mutual incomprehensibility or deep-seated issues that cannot be resolved? Are the two sides condemned to talk past one another forever?
There is room for compromise on the realist/relativist debate. One approach is to take an agnostic view, to bracket the question of whether the entities and relations discovered by scientists really existed before they were brought to light:
...what does indeed come into existence, within, usually, a longer term process, when science 'discovers' a microbe or a subatomic particle, is a specific entity distinguished from other entities (other microbes, other particles) and furnished with a name, a set of descriptors, and a set of techniques in which it can be produced and handled. In other words, some part of a preexisting material world becomes specified and thereby real as something to be reckoned with, accounted for, and inserted in manifold ways into scientific and everyday life. This does not preclude the possibility that some physical correlate of this entity existed, unidentified, tangled up with other objects, before scientists turned their attention to this object (Knorr-Cetina, 1995, p. 161).
This agnostic position vis-à-vis realism works well with methodological relativism; it allows the scholar studying science to bracket the whole question of which objects or entities are real and study how scientists convince themselves that they are real.
The physicist Jay Labinger saw methodological relativism as a "perfectly sound scientific practice" if its intent were to isolate the role of social negotiations and other cultural phenomena by bracketing the effects of reality. In this case, of course, one would have to keep in mind "that the subject of study is now an approximate model, and that the excluded factors may well turn out to be at least as important as the ones being examined" (Labinger, 1995, p. 291). Labinger calls for collaboration among scientists and those studying science. This is a promising solution to the problem of expertise: how can one acquire sufficient knowledge both in a science and in history, philosophy, anthropology, sociology or psychology? Myers' work on writing in biology is an example of such a collaboration (Myers, 1990; Myers, 1995).
An alternate solution to the realist/relativist controversy was suggested by Donald Campbell, who pointed out that one can be both a realist and a sociologist of scientific knowledge by taking the view that reality plays only a small role in settling scientific debates. Indeed, I would go farther and argue that the role of reality varies among scientific controversies--some may be resolved easily by negotiations among participants, others may include hard, inescapable facts that resist efforts at premature closure. If one drops the ideological posturing, one can conduct empirical studies to determine in what sorts of situations nature resists and facilitates negotiations.
C.P. Snow had little to say about social sciences, which exist between sciences and humanities and could theoretically bridge the two cultures gap. Gross and Levitt have a great deal to say about cultural anthropology and the sociology of scientific knowledge, particularly in their more radical forms, but little to say about disciplines like cognitive psychology of science that do not adopt the strong relativist position.
In this section, we will look at what psychologists and cognitive scientists have to contribute to the study of scientific thinking. From the standpoint of traditional philosophers of science, the underlying logic of science is what is really important, not the mental processes of individual scientists. There may be interesting psychological stories to tell about how discoveries are made, but these are not relevant to how they are justified by the scientific community (Siegel, 1980). There are notable exceptions to this view. For example, philosophers like Steve Fuller and Ron Giere put psychology at the center of science studies in very different ways, the former linked more to Skinner's behaviorism and the latter to cognitive psychology (Fuller, 1989; Giere, 1988).
From the standpoint of at least some sociologists, "thinking is not something that happens inside heads or brains" (Restivo, October, 1995). The way to study scientists is to look at their interactions with each other and the inscriptions that they produce (Latour, 1987).
If science is either largely the product of an underlying logic or of social negotiations, then psychology is marginal--a way, perhaps, of accounting for aspects of discovery and knowledge transmission having to do with perceptual and physiological processes. Psychologists contributed to this marginalization, I think, by their reluctance to study the methods they were using to justify their existence. Most psychologists are very concerned with making their field more scientific. Kuhn thought textbooks gave a misleading image of science: "the aim of such books is persuasive and pedagogic; a concept of science drawn from them is no more likely to fit the enterprise that produced them than an image of a national culture drawn from a tourist brochure or a language text" (Kuhn, 1962, p. 1).
Almost every introductory psychology textbook I have seen opens with a statement about how psychology is a science. For example, the textbook I used when I taught introductory psychology states that "once psychologists have developed a theory, they proceed in much the same general ways, regardless of the exact content of the theory. They subject the theory to empirical tests. They predict in advance what sorts of observable actions should occur when certain variables are changed. By testing to see whether their predictions describe the actual research outcomes, they find out whether their laboriously constructed theories are correct" (Darley, Glucksberg, & Kinchla, 1988, p. 8). To a philosopher or sociologist, the statements about science would sound naive. Perhaps psychologists do not want to question the methods they are trying so hard to adopt. If so, they are following the pattern of textbooks in the 'harder' sciences.
Cognitive psychologists want to use the methods of science to study science. Therefore, cognitive scientists are faced with a paradox--can science be used to study science? This harks back to the point we made earlier about not relying solely on using Azande methods to study Azande beliefs. If by science we mean a method guaranteed to find orderly relationships in the natural world, and if the object of studying science was to verify that this Method did achieve its goal, then we would be on shaky ground.
But perhaps we are really talking about methods, where the small 'm' denotes the fact that different scientific disciplines have different practices. Making generalizations about discovery by looking closely at what scientists actually do potentially gets us out of the problem of using a Method to verify itself (Barker, 1989). The methods of psychology are not the same as those of physics, or sociology--and within psychology, there are differences in methods among cognitive, social and personality psychologists. For example, all may use experiments from time to time, but different specialties design different sorts of experiments. (We will encounter examples of these different types of experiments later in this chapter.)
To make this question more specific, can one use the experimental method to study the experimental method? Yes, if the kinds of experiments reflect the unique practices of a specialty different from the one being studied, and also if the experiments are triangulated with other methods like fine-grained case-studies--particularly if these methods are borrowed from other disciplines like anthropology and sociology. (Again, we will encounter examples of this triangulation later in this chapter.)
Most of the participants in the realist/relativist debate seem to assume cognitive studies will reproduce a kind of Feynman view of the scientist as rational seeker after objective truth. But in fact, one can study the cognitive processes of the Azande as easily as those of scientists.
For example, Edwin Hutchins has done a superb study of the cognitive processes involved in traditional Micronesian navigation (Hutchins, 1983). A small group of expert navigators from the Central Caroline Islands routinely embark on ocean voyages of several days out of sight of land; they belong to a pre-literate culture and use none of the Western technology of navigation, not even a compass. Their navigation begins from the assumption that the boat is stationary and islands gradually move by it; the passage of reference islands is marked by the position of the stars. In most cases, these reference islands are imaginary constructs. The Micronesian navigator has a very different mental model from his modern Western counterparts, but one that is no less amenable to cognitive analysis.
Cognition is usually seen as an individual activity, too, but as Hutchins has shown, modern Western navigation is a team activity that is still amenable to cognitive analysis (Hutchins, 1995). There is no clear demarcation between the cognitive and the social: one merges into the other. But in this book, we will begin from the cognitive end.
From a sociological standpoint, 'beginning from the cognitive end' is equivalent to saying we will apply the tools of one discipline or community (cognitive psychology) to studying and understanding the practices of others (sciences). The tools, of course, carry a framework with them, and therefore it is important for those applying the tools to be aware of the assumptions that 'come along for the ride'. Indeed, the tools, methods and assumptions ought to be applied to the studiers as well as those studied.
This is the 'reflexive turn' in the sociology of scientific knowledge--an effort to say the equivalent of "sociologist (or cognitive scientist) study thyself." A further implication is that there is no privileged viewpoint, no solid epistemological ground, from which one can study--the philosopher, the psychologist, the sociologist and the scientists themselves simply speak with different voices. As Latour has cried in mock-horror,
But where can we find the concepts, the words, the tools that will make our explanation independent of the science under study? I must admit that there is no established stock of such concepts, especially not in the so-called human sciences, particularly sociology. Invented at the same period and by the same people as scientism, sociology is powerless to understand the skills from which it has so long been separated. Of the sociology of the sciences I can therefore say, "Protect me from my friends, I shall deal with my enemies," for if we set out to explain the sciences, it may well be that the social sciences will suffer first (quoted in Lynch, 1992, p. 230).
One must be reflexively aware of the black boxes created by those studying science. In an earlier book, I tried to open the black boxes created by cognitive scientists doing experimental studies of scientific reasoning (Gorman, 1992a). This chapter will review some of that work, but will go substantially beyond it, to consider recent developments that move cognitive science in the direction of focusing more on practice and worrying less about Method. Where possible, I will adopt the 'meta-alternation' strategy advocated by Collins and Yearley, whose answer to the problem of reflexivity is to advocate alternating between perspectives (Collins & Yearley, 1992). Therefore, I will use aspects of the sociology of scientific knowledge to provide an occasional alternative to the cognitive account.
What we will find is that there is not one cognitive psychology community which applies its tools to science--there are disparate groups, with different definitions of what constitutes the appropriate focus of study. There is an even wider group of practitioners who call themselves cognitive scientists. Cognitive psychology is one of the areas that is usually subsumed under this interdisciplinary label; other practitioners include computer scientists interested in artificial intelligence and machine learning, philosophers of mind sympathetic to computational approaches, neuroscientists interested in functions like memory, and cognitive anthropologists like Edwin Hutchins (Gardner, 1985). Cognitive scientists are often reluctant to study science for the very reason we cited at the beginning of this chapter--many cognitive scientists want to be viewed as following the Scientific Method, and any attempt to apply their tools to studying science makes it sound like they are questioning the basis of their own beliefs.
One of the few orthodoxies adhered to by most cognitive scientists is the belief that a hypothesis ought to be expressible in computational form (Baars, 1986; Johnson-Laird, 1988). But this again raises the problem of using a method to validate itself--if all cognitive hypotheses have to be in computational form, then any aspects of scientific practice that could not be described by current computational techniques would be ignored. Similarly, the Azande might insist that all hypotheses about the weather be put in their cultural framework. What would have happened if Faraday had been told all his hypotheses about electricity and magnetism had to be put into equations? Faraday worked in a rigorous, geometric fashion, but he did not reduce his lines of force to equations--that was left to Maxwell.
There is a further complication to the computational work, one that illustrates the way in which cognitive science and cognitive psychology can diverge. Most of the computational simulations have dual goals--to model human problem-solving (cognitive psychology) and also to explore how and whether machines can discover (cognitive science). The traditional goal of Artificial Intelligence (AI) has been to understand human intelligence by building machines that mimic human mental processes. In contrast, the goal of expert system design is to create systems that will be as good as, or better than, human experts.
At first blush, it might seem that the two goals are the same. But in many domains, computers assist experts by doing what human beings cannot--calculating at speeds far in excess of the human nervous system, storing far more information in a literal form than any human brain could, providing sophisticated real-time visualizations, etc. The word 'computer' originally referred to a human being especially trained to do high-speed calculations. So computers are already taking over areas of human expertise, but in most of these, they are not functioning like humans. Consider, for example, the best chess programs, which work by brute force--literally calculating most of the possible combinations, whereas the human opponent relies on heuristics to narrow the search space.
This book is about understanding human discovery; therefore, the AI literature is more relevant than the literature on machine learning, though the boundary between the two is fuzzy. In particular, many expert systems using brute-force heuristics can become important aids in discovery. Already, most chess players own a computer which they practice with and which they may even be allowed to bring to tournaments. For those who take the view that cognition is often shared across a network of tools and actors, the computer is part of the process we refer to as mind (Gorman, 1997).
The examples in the last chapter illustrate the diversity of computational approaches to discovery: BACON and KEKADA emulated the sorts of heuristics used in scientific discovery, CLARITY allowed the user to explore different discovery paths and Thagard's connectionist system tried to show how scientific controversies could be resolved by explanatory coherence. There are quite a few other approaches as well (Cheng, 1992; Shrager, 1990).
This type of simulation fits the goal of applying the tools of cognitive science to the practice of science. The various computational techniques are tools used by individuals or groups that label themselves cognitive scientists, and in the last chapter, we saw examples of their application to actual cases of discovery.
We also noted that each technique carried an assumptive framework with it. KEKADA makes Krebs look like an exemplar of Herbert Simon's views regarding cognition, CLARITY supports Gooding's framework and Thagard's ECHO supports his philosophical notions. This does not mean these simulations are not capable of surprising their creators, just that the range of possible surprises is limited by the structure of the simulations.
We are back at the issue of using a method to validate itself. We cannot use KEKADA to validate a heuristic-based approach to an understanding of discovery because the heuristic approach is embedded in KEKADA. For example, KEKADA cannot decisively refute those who argue that heuristics are just post-hoc rationalizations that are used to explain what experts appear to do, but do not reflect the way they actually solve problems (Suchman, 1987). But we could use other methods to complement simulations like KEKADA, like detailed case-studies (Kulkarni, 1988) and experimental simulations (Qin, 1990). We could also create a counterfactual computational simulation of Krebs' process, perhaps based on a connectionist algorithm or on the kind of visual programming used in CLARITY, one that was not heuristic-based. If such a simulation compared well with an actual case-study, it would show that alternatives to heuristic-based models can give at least as good an account of scientific discovery. Unfortunately, multiple computational approaches have not, as far as I know, been applied to a single case--instead, each approach tackles its own problem.
A decisive and sufficient refutation of the 'strong programme' in the sociology of scientific knowledge (SSK) would be the demonstration of a case in which scientific discovery is totally isolated from all social or cultural factors whatever. I want to discuss examples where precisely this circumstance prevails concerning the discovery of fundamental laws of the first importance in science. The work I will describe involves computer programs being developed in the burgeoning interdisciplinary field of cognitive science, and specifically within 'artificial intelligence' (AI). The claim I wish to advance is that these programs constitute a 'pure' or socially uncontaminated instance of inductive inference, and are capable of autonomously deriving classical scientific laws from the raw observational data (Slezak, 1989).
Simply put, Slezak argued that, if programs like BACON can discover, then there is no need to invoke all these interests and negotiations the sociologists use to explain discovery. His claims sparked a vigorous debate in the journal Social Studies of Science (see the November, 1989 issue).
In contrast, Shrager and Langley argue that, "two important aspects of intellectual activity--embedding and embodiment--that have significant on science...have not been addressed by existing computational models. Briefly, science takes place in a world that is occupied by the scientist, by the physical system under study, and by other agents, and this world has indefinite richness of physical structure and constraint. Thus the scientist is an embodied agent embedded in a physical and social world" (Shrager, 1990, p. 224).
The embodiment issue is perhaps most clearly illustrated by the powerful KEKADA simulation, which still could not emulate the tissue-slicing procedures that were critical to Krebs' discovery. Robotic systems and neural nets that do pattern recognition may someday be able to simulate more of the embodied character of scientific procedures.
One important aspect of embodiment is visualization, and here computational simulations are making interesting progress. Cheng and Simon showed that it might have been easier for Huygens and Wren to have discovered the law of conservation of momentum using diagrams rather than deriving it from theory or by data-driven processes similar to those used by BACON (Cheng & Simon, 1992). Cheng then created HUYGENS, a more general computational simulation of discovery by one-dimensional diagrams. He noted that:
HUYGENS provides further computational evidence for the view that switching back and forth between representations is an effective way to enhance creativity. From given numerical data, HUYGENS switches to a space of diagrams in its search for regularities by looking for patterns in the diagrams. When patterns have been found, the regularities are simply transformed back into equations. The change to diagrammatic representation permits different operators, regularity spotters and heuristics to be employed that are more effective than those used in the direct search of a space of algebraic terms (Cheng & Simon, 1995, p. 224).
Cheng admits that we cannot be sure the real Huygens used this method--but it is plausible, historically, and HUYGENS demonstrates that it would have been more efficient than alternatives. Instead of claiming he developed a program that discovers, Cheng argued that he had provided computational evidence for the importance of using diagrams in scientific discovery, evidence that could be combined with material from other sources, e.g., fine-grained case-studies of the way diagrams are used in actual discoveries. Cheng's work is still a long way from being embodied, but it is a step in the right direction.
Another computational approach that has potential for addressing the problem of embodiment is the use of neural networks designed to simulate aspects of the human nervous system. While these networks are particularly valuable for modeling the sorts of sensory processes involved in recognizing and manipulating objects, they may also be able to provide insights into the kinds of connections among neurons that would promote creativity (Martindale, 1995).
But neither Cheng's diagrams nor neural networks can yet simulate the interplay between instrument, hand and eye--and the way in which the scientist is also an inventor. Indeed, one could argue that the computer is itself part of the process of embodiment--it is one of the tools modified by the scientist that facilitates discovery.
Shrager and Langley also make an important point about the fact that computational simulations fail to capture the way scientists are embedded in social networks. Ironically, all of these computer simulations are themselves embedded in a rich network of human negotiations. It is the humans who seek funding for them, supply them with their data and make the claim that they discover. That is why Slezak is wrong about discovery programs refuting the sociology of scientific knowledge: the programs are themselves embedded in the processes they are supposed to refute! Brannigan argued that Azande computer scientists could "write a program which, given the selective identification of the data observed by Azande experts, would rediscover witchcraft as a cause of illness" (Brannigan, 1989, p. 610). Such a program would teach us a great deal about the heuristics used by the Azande equivalent of witches, but would not establish that witchcraft constituted a 'socially uncontaminated' method of inference.
Computer simulations do not replace a sociology of discovery. They complement fine-grained studies of the discovery process, allowing us to model individual cognitive processes but also potentially aspects of the social negotiations involved. This is reflected in the movement towards case-based simulations of reasoning that reflect the kinds of embedded learning that occur through apprenticeship, and could allow better computational models of human creativity (Schank & Cleary, 1995). We will have more to say about case-based reasoning when we discuss situated cognition at the end of the chapter.
If intelligent machines do emerge in the future, they will form part of the scientists' network, assisting in those areas where humans are weak--high-speed computation, statistical corrections to diagnostic reasoning (Faust, 1984), three-dimensional simulation of processes that are difficult to visualize, etc. Future sociologists and psychologists will need to study how these intelligent programs fit into the discovery process. A good example is Feigenbaum's Dendral expert system, designed to infer candidate molecular structures from spectral data. Although Dendral was successful at its task, it was not adopted by working scientists; instead, its algorithms were transferred to databases that are currently used widely. As Dendral's creators commented, "As AI researchers we seriously underestimated the problems of technology transfer and the nature of the barriers to diffusion. 'Underestimate' is charitable: we really didn't have the foggiest idea" (Feigenbaum & Buchanan, 1993, p. 238). Further research is needed on how human and computer experts work together to make discoveries.
Kevin Dunbar (Dunbar, 1995) used a biological analogy to classify studies of scientific thinking. In vitro studies are experiments on scientific thinking, analogous to biology laboratory experiments. In vivo studies are case-studies of scientists and science students in their working environments, analogous to studies of biological organisms in their natural environments. Dunbar does not provide us a biological analogy for computational simulations, because such simulations can be used by biologists to model what goes on in both in vivo and in vitro studies: one can use laboratory or field data to test or create computational models of biological processes.
There are essentially three types of tasks used by cognitive psychologists in their in vitro studies of scientific thinking:
1) Abstract problems which model aspects of scientific reasoning.
2) Tasks that are designed to simulate actual scientific problems.
3) Actual scientific problems.
Examples of these problems will be found below.
Another distinction used by cognitive psychologists has to do with whether a study uses expert or novice participants, or both. An expert, in this case, would be an actual scientist. The novice category is mostly made up of college students of a variety of backgrounds, some of whom may have taken a few science courses but none of whom are practitioners. Most psychology experiments use undergraduates--often ones taking a psychology course. But note that there may be important differences in expertise within this ambiguous 'novice' category: occasional studies have referred to graduate students in a scientific field as novices when compared to expert scientists, so I lump all students in the novice category.
One special category of 'novice' is worth mentioning. Children have been used as participants in a number of studies of scientific reasoning, on the grounds that the kinds of conceptual changes they go through replicate the sorts of changes that occur in scientific revolutions. We will include a few of these studies in the novice category in the table.
This way of organizing the cognitive psychology of science is summarized in the following table:
| Participants \ Type of task | Abstract | Simulated Scientific Problem | Scientific Problem |
| --- | --- | --- | --- |
| Novices | | | |
| Children | | | |
| Experts | | | |
Note that an expert scientist could be used as a participant in an in vitro study, and a novice could be used in an in vivo study--say, a field study of how children or college students learn scientific concepts. We will employ this table iteratively, using it to summarize findings in each of the cells in which there has been significant research.
Imagine you are a participant in a psychology experiment. You are told that the three numbers '2,4,6' are an instance of a rule the psychologist has in mind. You are to discover the rule by proposing additional number triples; the psychologist will tell you whether each corresponds to the rule or not.
If you are like most participants, you will begin with numbers like '6,8,10'. When the psychologist says, "That's correct," you will continue the pattern, perhaps proposing '10,12,14'. You might at that point stop and ask if the rule were 'even numbers ascending by twos'.
This particular task, created by Peter Wason (Wason, 1960), has often been used to study scientific reasoning (Gorman, 1992a). At first blush, this seems absurd--what can a three-number problem have in common with the kinds of cases of discovery described in the last chapter? One could, however, view each of the number triples proposed by the participant as a kind of experiment, directed at finding an underlying law. 'Even numbers ascending by twos' is a hypothesis discovered by a participant, based both on previous evidence--the triple '2,4,6'--and on experimental triples proposed by the participant.
Note the resemblance between this task and the kind of data-driven discovery performed by BACON. Both participants in the 2,4,6 task and BACON are trying to find the rule that governs a set of numbers--except that BACON is given data to look at and the participant in the experiment has to generate it. The typical participant is a college student--who may or may not have a scientific or technical background.
Initially, experiments with the 2-4-6 task, as this number triple problem is called, were intended to investigate whether people could use a particular hypothesis-testing strategy favored by the philosopher-of-science Karl Popper (Popper, 1959). Popper emphasized that science progresses best if scientists propose bold, risky hypotheses that can potentially be falsified (Popper, 1963). Popper was not particularly interested in how scientists came up with the hypotheses; he focused more on the way in which the hypotheses were tested.
In other words, Popper supported the classic distinction philosophers make between discovery and justification (Reichenbach, 1938), which simply says that the way in which a hypothesis is discovered should have no effect on how it is evaluated. A scientist could have a dream, like Kekule is supposed to have done, and discover the structure of the benzene molecule. At the twenty-fifth anniversary of the publication of his discovery, Kekule gave the following account of it:
One fine summer evening, I was returning by the last omnibus. I fell into a reverie and lo, the atoms were gambolling before my eyes! Whenever hitherto these diminutive beings had appeared to me, they had always been in motion; but up to that time I had never been able to discern the nature of their motion. Now, however, I saw how, frequently, two smaller atoms united to form a pair; how a larger one embraced two smaller ones; how still larger ones kept hold of three or even four of the smaller; whilst the whole kept swirling in a giddy dance. I saw how the larger ones formed a chain, dragging the smaller ones after them, but only at the ends of the chain...The cry of the conductor: "Clapham Road," awakened me from my dreaming; but I spent part of the night in putting on paper at least sketches of these dream forms. This was the origin of structure theory (Schaffer, 1994, p. 23).
Here the voice of the muse comes to the hero who is prepared to listen, and he carries its words back to the world. This famous account is retrospective, and therefore may not be entirely accurate (Ericsson & Simon, 1984). Kekule also provided other accounts of his discovery, including one involving circling snakes that suggested the way the atoms might be linked. Kekule no doubt saw part of the solution in dreams and reveries, but the working out of the rest was time-consuming, difficult and involved frequent negotiations with others; these stories helped establish Kekule's priority and originality. Indeed, Kekule's address resulted, in part, from a deliberate effort by the organizers of the conference to establish him as a scientific hero (Schaffer, 1994). All of this is not to say that Kekule was lying. Human memory for complex events is reconstructive, and tends to reflect what we think ought to have happened, not what actually happened (Neisser, 1982).
Popper would not have cared about this story and the negotiations in which it was embedded. What mattered was whether Kekule formulated a falsifiable hypothesis. One could argue with Popper that stories of this sort play a role in determining who gets credit for a discovery, but he still would not be interested. His concern was how theories ought to be justified.
Popper's favorite example of a falsifiable hypothesis was Einstein's General Theory of Relativity, which included a specific prediction about the curvature of light in a gravitational field. Eddington set out to test this prediction by measuring to what extent light from selected stars was attracted by the sun's gravitational field during an eclipse. When Einstein was asked what he would do if Eddington's results did not agree with his predictions, he said, "Then I would feel sorry for the dear Lord (Eddington). The theory is right" (Holton, 1973, pp. 234-5). Similarly, when initial experimental results appeared to contradict Special Relativity, Einstein was not alarmed--he pointed out that the rivals to Special Relativity were ad-hoc theories and called for more replication. So, Einstein himself was not a Popperian. Eddington characterized his eclipse results as providing support for Einstein's General Relativity, but there were contradictions and ambiguities in the data (Collins & Pinch, 1993).
How could one determine if falsification would lead to more scientific progress? One could adopt an in vivo approach, looking at instances where falsification was deliberately applied to scientific problems. The problem with this kind of study is that any number of other factors could have affected progress, or the lack thereof, in an actual case.
An alternative is to look at whether falsification was an effective strategy on tasks that simulate scientific reasoning. If it were not an effective strategy in vitro, under ideal conditions, that would cast doubt on the usefulness of the strategy in general (Gorman, 1992b). Why? Consider this simple 2-4-6 task. It is exactly the sort of problem on which falsification should be effective--it eliminates all the confounding factors like error in the data and pressures to publish that may interfere with falsification in scientific practice (see Gorman, 1992). Abstract tasks create ideal, simple situations for exploring the heuristic value of the sorts of norms recommended by philosophers. Of course, just because a norm like falsification works in an ideal, abstract situation, there is no guarantee it will work in science. Conversely, if a norm fails to work even under the most ideal conditions, there are good reasons for doubting its effectiveness in a real-world situation.
Wason (Wason, 1960) initially found that participants did not falsify hypotheses like 'even numbers ascending by twos'--they proposed instances that agreed with that hypothesis, and no triples that should have been wrong if the hypothesis were right. A participant who proposed '1,2,3' would have been told it was correct. If '1,2,3' is an instance of the rule, the hypothesis 'even numbers ascending by twos' is false. (In fact, Wason's rule was 'ascending numbers'). Wason viewed this as evidence of a 'verification bias' on the part of his participants.
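The logic of the task can be sketched in a few lines of Python. This is only an illustration, not Wason's experimental procedure: the function names `target_rule`, `hypothesis` and `run_test` are invented here, and the participant is assumed to hold the hypothesis 'even numbers ascending by twos'.

```python
def target_rule(triple):
    """The experimenter's rule as reported in the text: ascending numbers."""
    a, b, c = triple
    return a < b < c


def hypothesis(triple):
    """Assumed participant hypothesis: even numbers ascending by twos."""
    a, b, c = triple
    return a % 2 == 0 and b == a + 2 and c == b + 2


def run_test(triple):
    """Return (what the hypothesis predicts, what the experimenter answers)."""
    return hypothesis(triple), target_rule(triple)


# Triples that agree with the hypothesis: prediction and feedback match,
# so the participant learns nothing that could overturn the hypothesis.
confirming = [(6, 8, 10), (10, 12, 14)]

# A triple that ought to be wrong if the hypothesis were right.
disconfirming = (1, 2, 3)

for triple in confirming + [disconfirming]:
    predicted, feedback = run_test(triple)
    verdict = "consistent" if predicted == feedback else "hypothesis falsified"
    print(triple, "predicted:", predicted, "feedback:", feedback, "->", verdict)
```

Only the triple chosen to violate the hypothesis produces a mismatch between prediction and feedback, which is the information the participant needs to abandon 'even numbers ascending by twos'.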
One advantage of in vitro studies is control. In Wason's case, he was able to create a task that controlled for participants' previous experiences--none had ever worked on this problem before--and set up the problem in a way that required participants to falsify the obvious, initial pattern if they were to discover the actual rule. This set-up factor illustrates the other powerful advantage of in vitro studies; they allow one to manipulate the conditions under which participants try to solve problems.
A group of psychologists at Bowling Green State University took this idea of manipulation a step further. They gave some participants instructions to falsify on the 2,4,6 task, and others instructions to try to verify or confirm their hypotheses (Tweney, 1980). Participants were asked to indicate which of their triples were intended as confirmations or disconfirmations; sure enough, participants given the disconfirmatory instructions did make more attempts to falsify. But their efforts to falsify did not make their performance better than those participants who tried to confirm. Worse, this lack of effect for falsification was a replication of an earlier study at Bowling Green, in which participants shot particles at objects on a computer screen in order to determine what rules govern particle deflection (Mynatt, Doherty, & Tweney, 1977; Mynatt, Doherty, & Tweney, 1978).
This seemed like a surprising result to me. If falsification played an important role in scientific progress, it seemed to me that it ought to improve performance on an in vitro simulation of scientific reasoning. I decided to follow up on this puzzling finding and tried my own version of such instructions on the 2-4-6 task and a related problem. I unwittingly made an important change in the original design. Whenever a participant made a guess about the rule, experimenters in previous studies had told them whether that guess was right or wrong. That amounted to a scientist's being able to ask God whether her rule was right. No need to falsify if you can find out in some other way.
So, participants in my experiment had to test their own hypotheses, and whenever they asked if they could guess the rule, I told them it was up to them to decide whether and when they knew they had solved the problem. Under these circumstances, instructions to falsify greatly improved subjects' ability to solve the 'ascending numbers' rule.
I appeared to be onto a minor discovery myself--that falsification was effective on at least two problems that simulated scientific reasoning (Gorman, 1992a). My disconfirmatory instructions emphasized trying triples that ought to be wrong if one's hypothesis were right. What I was doing was trying to teach participants a heuristic I thought would lead to falsification. A heuristic is a kind of 'rule of thumb' of the sort that experts use. Heuristics, unlike algorithms, do not guarantee results. Experts use them in situations where there is no algorithm.
An analysis by Klayman and Ha (Klayman, 1987) showed why my heuristic should have been successful, and why it wasn't the same as falsification. These two authors referred to strategies like my 'try to get triples wrong' as 'negative test heuristics'. When the participant's hypothesis is contained within the target rule, this sort of heuristic is most likely to lead to success.

Figure 5: H is the participant's hypothesis, T is the target rule. A negative test heuristic will focus the participant on the zone within T but outside of H.
Given that the rule was 'ascending numbers' and the initial triple suggested a hypothesis that was a sub-set of the actual rule, my 'try to get triples wrong' heuristic would be helpful in finding the actual rule--it would point participants to the outer 'T' ring in the above diagram.
However, if the problem space described by the hypothesis were broader than the target rule, a negative test heuristic would not be effective. Suppose one's hypothesis was 'numbers ascend by twos' and the actual rule was 'even numbers ascend by twos'. If one tried negative tests like '1,2,3' and '7,6,13', they would all be wrong, thereby confirming one's hypothesis.

Figure 6: In this case, participants need to propose positive instances of H like '3,5,7' in order to find T.
The only way to disconfirm it would be to try a positive test like '3,5,7', which would put one in the part of H that is outside T. My confirmatory instructions urged participants to propose triples they thought would be correct. But in some situations, these instructions would be more likely to lead to falsification, because when the rule is narrower than the hypothesis, some of the triples one thinks should be correct will be incorrect. My confirmatory instructions were really encouraging what Klayman and Ha called a positive test heuristic, which they regard as a good, all-purpose strategy for achieving either confirmation or disconfirmation.
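Klayman and Ha's point about the two set relationships can be checked by brute force. The sketch below is only an illustration: the universe of triples (integers from 1 to 12) and all function names are assumptions introduced here, while the three rules are the ones used as examples in the text.

```python
from itertools import product

def ascending(t):
    """'Ascending numbers' (the broad target rule)."""
    return t[0] < t[1] < t[2]

def by_twos(t):
    """'Numbers ascend by twos'."""
    return t[1] - t[0] == 2 and t[2] - t[1] == 2

def even_by_twos(t):
    """'Even numbers ascend by twos'."""
    return by_twos(t) and t[0] % 2 == 0

def falsifying_tests(hypothesis, target, universe):
    """Count the tests whose feedback would contradict the hypothesis."""
    # Positive test: a triple the hypothesis says is right, but the rule rejects.
    positive = sum(1 for t in universe if hypothesis(t) and not target(t))
    # Negative test: a triple the hypothesis says is wrong, but the rule accepts.
    negative = sum(1 for t in universe if not hypothesis(t) and target(t))
    return {"positive": positive, "negative": negative}

# Assumed universe: every triple of integers from 1 to 12.
universe = list(product(range(1, 13), repeat=3))

# Hypothesis narrower than the rule (H inside T, as in Figure 5).
narrow_h = falsifying_tests(even_by_twos, ascending, universe)
# Hypothesis broader than the rule (T inside H, as in Figure 6).
broad_h = falsifying_tests(by_twos, even_by_twos, universe)

print("H inside T:", narrow_h)
print("T inside H:", broad_h)
```

Under these assumptions, no positive test can falsify the narrow hypothesis and no negative test can falsify the broad one, which is why neither heuristic amounts to falsification in general.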
In summary, my experiments did not show that falsification works as a general strategy across a wide range of problems. I had only found evidence to support the idea that a negative test heuristic will falsify a hypothesis that is narrower than a target rule.
I falsified even this analysis in my next experiment. On an even more general rule, 'the three numbers must be different', I found that negative test instructions did not improve performance--participants were clearly trying to obtain negative evidence, but they did not know where to find it. Following Tweney et al. (1980), I changed the task from a search for a single rule which would determine which triples were right and wrong to a search for two rules arbitrarily labeled DAX and MED: the DAX rule was "the three numbers must be different" and the MED rule was "two or more numbers must be the same". As in Tweney et al.'s earlier study, this manipulation greatly improved performance, whereas simply giving subjects instructions to falsify did not.
I concluded that falsification depended at least in part on what Johnson-Laird (1983) has called a 'mental model' of the task. Subjects whose mental model was that they were trying to find a single rule with exceptions found little or no negative evidence. For example, participants who proposed the triple '0,0,0' and were told it was incorrect guessed rules like 'any number except zeroes'. In contrast, the DAX-MED instructions suggested a mental model involving a search for two complementary rules. Subjects who proposed the triple '0,0,0' in this situation realized this MED result was a clue to another rule, and pursued it by proposing other combinations in which two or more numbers were the same (for a recent series of experiments that supports this analysis, see Wharton, Cheng, & Wickens, 1993). These results suggested that the critical relationship in Klayman and Ha's analysis was between the subject's hypothesis and her representation of the target rule.
Farris and Revlin (1989; 1989a) argued that many subjects who appear to be trying to falsify are actually searching for positive instances of a counterfactual hypothesis. For example, a subject who thinks the rule is 'even numbers' may propose 'odd numbers' as a counterfactual hypothesis, then test that with a triple like '3,5,7' which is a negative test with respect to 'even numbers' but confirmatory with respect to the counterfactual hypothesis 'odd numbers'. A counterfactual heuristic may be a successful way of converting the standard version of the 2-4-6 task to a DAX-MED problem, because a counterfactual hypothesis is roughly equivalent to a hypothesis about the MED rule, and successful DAX-MED subjects pursue positive instances of the MED rule.
This kind of fine-grained analysis of hypothesis-testing highlights the strengths and weaknesses of in vitro studies. The in vitro work allows us to look at heuristics under highly controlled, artificial conditions, manipulating variables like the relationship between a participant's most likely representation of the task and the actual rule. This kind of manipulation and control is impossible in vivo. However, in vivo studies are needed to see if the in vitro results are ecologically valid, i.e., applicable to real-world situations.
There are almost no studies involving scientists trying to solve these abstract problems. Mahoney (1977) compared a small sample of scientists working on the traditional version of the 2-4-6 task to a sample of Protestant ministers and found that the former were less willing to abandon their hypotheses than the latter. Mahoney initially saw this as evidence of a confirmation bias on the part of scientists, but one could also argue that they were following a positive-test heuristic. If one is to draw any conclusions about the abilities of working scientists to solve abstract problems, more research with different rules and procedures is needed. For example, scientists should be run in a condition where they know they cannot ask the experimenter at any time whether their hypotheses are right.
One can gradually add realistic features to in vitro studies. One of the features that makes the 2-4-6 and related tasks so unrealistic is that every trial or mini-experiment produces results that are 100% reliable. In contrast, scientists are acutely aware of the possibility of error when they design and evaluate experiments. For example, Einstein's theory of special relativity was apparently falsified by the eminent physicist Kaufmann; Einstein himself remained undisturbed, however, and called for replication. Kaufmann's result was later found to be an error (see Gorman, 1992).
In May of 1795, Joseph J. F. Lalande recorded a new star in two different positions over a three-day period, and decided that at least one, if not both, of the observations were due to errors (Hoyt, 1980). This star was identified as the planet Neptune in 1846, and Lalande's original observations were used in computing its orbit. One scientist's error is another scientist's discovery.
To understand how error can be added to one of the problems that simulate scientific reasoning, let us once again use the 2-4-6 task as an example. In the usual version, every result is 100% reliable and unambiguous. I added the possibility of error by telling participants that anywhere from 0 to 20% of their results might be erroneous, i.e., a triple that was classified as incorrect might be correct and vice-versa. Error would occur at random, as determined by a random number generator on a calculator.
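A minimal sketch of this kind of error manipulation follows. It is not the procedure used in the experiments, which relied on a calculator's random number generator; the function names, the fixed seed, and the use of a single fixed error rate of 20% are assumptions made for illustration.

```python
import random

def ascending(triple):
    """Wason's rule: three ascending numbers."""
    a, b, c = triple
    return a < b < c

def noisy_feedback(triple, error_rate, rng):
    """Return the rule's verdict, reversed at random with probability error_rate."""
    verdict = ascending(triple)
    if rng.random() < error_rate:
        return not verdict   # an erroneous result: correct reported as incorrect, or vice versa
    return verdict

rng = random.Random(0)       # fixed seed so the run is repeatable
trials = [(2, 4, 6)] * 1000  # the same triple proposed many times
flipped = sum(1 for t in trials if noisy_feedback(t, 0.20, rng) != ascending(t))
print("share of erroneous results:", flipped / len(trials))
```

From the participant's side, a single reversed verdict is indistinguishable from a genuine disconfirmation, which is what makes replication necessary.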
I thought that this possibility of error might make it easier for individuals to engage in confirmation bias. A recent example is the cult Heaven's Gate, which was certain that an alien spaceship accompanied comet Hale-Bopp.
In January of 1997 several cultists, including their leader Applewhite, bought a computerized telescope with a 10-inch mirror. They used it to look at Comet Hale-Bopp, and search for the "companion object." They were following a scientific impulse--seeking direct observation of the vehicle that would rescue them from our doomed planet.
They saw the comet perfectly. They saw no spaceship.
And then they returned the telescope to the store and asked for their money back (Achenbach, 1997, F4).
This is a classic use of what Doherty & Tweney (Doherty, 1988) called System-Failure (SF) Error to immunize a hypothesis against falsification. If you don't like the evidence, blame the instrument. There was a spaceship--there had to be. The telescope wasn't working.
The saddest part of the story is that the group killed themselves in the belief that they had to leave their 'vehicles' (bodies) before they could be taken away by the spaceship accompanying the comet. By all accounts, they died with smiles on their faces, certain of the resurrection.
One scientific analogue of this kind of error is an experimental result which appears to confirm a hypothesis but actually disconfirms it and vice-versa. The controversy between Millikan and Ehrenhaft over the charge on the electron serves to illustrate. R.A. Millikan presupposed a unitary charge; in his famous oil-drop experiment, he discarded results that appeared to suggest a fractional charge. But these results, if true, would have supported the theory of his competitor, Felix Ehrenhaft. "If Ehrenhaft had had access to Millikan's notebook, he would have found precisely those runs most valuable for his purposes, which, for Millikan, were failed" (Holton, 1986, p. 12). Having a mental model of the kind of rule one is looking for helps one identify and discard errors.
But suppose one has a Heaven's Gate mental model, totally out of whack with reality? One way to check whether an apparent disconfirmation is an error is to tighten procedures. Another is replication. In the Heaven's Gate case, the group might have tried a larger telescope, and observed over a long period of time. In Millikan's case, he and his technician refined their technique until they could produce the desired effect more reliably; his notebooks record 'beautiful' results more frequently later in his series of experiments, though there are still errors (Holton, 1978, p. 71).
In Millikan's case, replication led to confirmation. It can also lead to falsification. Walter Alvarez recounted the day when he and his father thought they had discovered evidence that a supernova caused the extinction of the dinosaurs. The key empirical support for this hypothesis came from the presence of plutonium-244 in the KT boundary which marks the end of the Cretaceous and of the dinosaurs--a period of mass extinction. After an exhausting night of taking samples, two geochemists concluded that there was plutonium-244 in a sample of soil from the KT boundary--an apparent confirmation of the hypothesis. Luis Alvarez was ready to announce the discovery, but Walter tried the result and procedures on the Deputy Director of the Lawrence Berkeley Laboratory, who advised them to "Do it all over again. Repeat every single step from the very beginning, on a fresh sample, to be absolutely sure there really is plutonium-244 in that clay" (Alvarez, 1997, p. 74). They ran the whole set of procedures on a second sample and found no trace of plutonium-244. The heuristic in this domain, where the procedures are so difficult, is to trust the negative result. Replication had turned into falsification.
I wanted to simulate the effect of error on scientific reasoning in vitro, in order to find out how specific variables affected it. In my first series of experiments, I focused on the possibility of error by setting the error rate at 0. In other words, participants were told that as many as 20% of their results might be errors, but encountered no actual errors. Participants had to figure this out. Most used a heuristic I called 'replication plus extension', proposing triples that were similar to, but not exactly the same as, previous triples in an effort to replicate the current pattern and extend it slightly, e.g., following '2,4,6' by '4,6,8'. This looked much like the positive test heuristic recommended by Klayman and Ha; the difference is the goal--in addition to trying to confirm a hypothesis with positive tests, participants were trying to check for errors. Participants given possible-error instructions had to propose twice as many triples, but managed to discover Wason's rule as often as participants in a control condition.
But in an earlier study using the card game Eleusis, I had discovered that the possibility of error greatly interfered with subjects' abilities to solve a simple rule. One difference between the two tasks is that the cost of replication in Eleusis was much higher. One had to replicate not only a single card, but a sequence of cards. I experimented with giving subjects on the 2-4-6 task a similar rule, presenting it in a format that gave them results of previous trials. To get a feel for the task, try to do what participants did, and write down any guesses that you might have about a rule that could govern all five triples. Would your rule be any different if I told you it was possible one of these five results was an error, i.e., if it is a Y, it should be an N and vice-versa?
| Triple | Conforms to Rule |
| 1,2,3 | Y |
| 4,5,6 | Y |
| 4,5,6 | N |
| 5,10,15 | Y |
| 10,20,30 | N |
The key problem is what to do with the fact that 4,5,6 is right once, then wrong. Depending on one's hypothesis, one can label either of them an error. Then I gave participants five more triples. Consider whether these change your first hypothesis.
| Triple | Conforms to Rule |
| 10,33,12 | Y |
| 13,20,5 | Y |
| 14,9,14 | Y |
| 12,35,14 | N |
| 15,15,6 | N |
Let's consider an example. Suppose you hypothesized that the rule was that odds and evens alternate within each triple. If you covered the '4,5,6 N' in the first set of triples and the '12,35,14 N' in the second, the triples would fit this hypothesis. This is akin to looking carefully at results of previous experiments in a scientific domain, and using the current hypothesis or paradigm to decide which were likely to be errors.
I allowed participants to propose as many as five additional triples of their own, which meant they had the opportunity to replicate. In most actual scientific situations, one does not have unlimited resources to devote to replication; therefore, I thought this five-triple limit was more realistic than unlimited triples.
In fact, there was no actual error. The rule was that numbers had to alternate odd and even across as well as within each triple. This made this task more like the earlier one I had used with cards: to replicate, participants had to repeat not just one triple, but a sequence of triples, and at the same time test their hypotheses. In a possible error condition, participants solved the rule only 15% of the time. In contrast, 50% of participants who were not told about any possibility of error solved the rule.
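The ten triples shown earlier can be checked against this rule with a short sketch. The text does not spell out how 'across' is evaluated after an incorrect triple, so the code makes a labeled assumption: a triple's first number is compared with the last number of the most recent conforming triple, in the manner of a card game in which rejected plays are set aside. The function names are likewise illustrative.

```python
def alternates(numbers):
    """True if every adjacent pair consists of one odd and one even number."""
    return all((x + y) % 2 == 1 for x, y in zip(numbers, numbers[1:]))

def classify(sequence):
    """Label each triple Y or N under the assumed reading of the rule."""
    labels = []
    last_ok = None  # last number of the most recent conforming triple (assumption)
    for triple in sequence:
        ok = alternates(triple)                   # alternation within the triple
        if ok and last_ok is not None:
            ok = alternates((last_ok, triple[0])) # alternation across triples
        labels.append("Y" if ok else "N")
        if ok:
            last_ok = triple[-1]
    return labels

shown = [(1, 2, 3), (4, 5, 6), (4, 5, 6), (5, 10, 15), (10, 20, 30),
         (10, 33, 12), (13, 20, 5), (14, 9, 14), (12, 35, 14), (15, 15, 6)]
reported = ["Y", "Y", "N", "Y", "N", "Y", "Y", "Y", "N", "N"]

print("rule reproduces all ten results:", classify(shown) == reported)

# Triples that a 'within each triple only' hypothesis would have to call errors.
within_only_misfits = [t for t, r in zip(shown, reported)
                       if ("Y" if alternates(t) else "N") != r]
print("misfits for the within-only hypothesis:", within_only_misfits)
```

On this reading the full rule accounts for every result without invoking error, whereas the within-only hypothesis must write off exactly the two triples discussed above.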
I tried a couple of in vitro simulations using actual error and the 2-4-6 task. Changing the amount of error from zero to 20% greatly interfered with participants' ability to discover Wason's original rule (Gorman, 1989(c)). Some of these 20%-error participants made repeated attempts to replicate and located many of the errors, but because of this, they were not able to adequately test the generality of their hypotheses and ended up with rules like 'numbers must go up by twos'. Others simply used errors to immunize their hypotheses from disconfirmation. As one participant said, "I assigned errors to the triples I did because they did not fit my hypothesis" (Gorman, 1989(c), p. 409).
These findings illustrate that, even on very simple artificial tasks, replication alone is not sufficient to isolate and eliminate errors. Collins (Collins, 1985) has discussed how difficult it is to replicate a result. Obviously, scientists rely on other kinds of checks in addition to replication, e.g., refinement of procedures. But these simple experiments demonstrate the way in which hypotheses are often used to identify errors, and the importance of replication. In contrast, my experience suggested that psychology journals were often unwilling to publish replications (Gorman, 1992).
Hopefully, the description above of my own research using abstract tasks and novice participants will give the reader a sense of the pros and cons of in vitro experiments. The strength of these sorts of experiments is that one can set up a task in a particular way to assess how it will affect performance. For example, one can isolate the effect of the mere possibility of error and study it under carefully controlled conditions. The weakness is that one cannot be certain how these results will generalize to more complex situations involving multiple types of error. However, in vitro results can give us issues to focus on in vivo.
Even on these highly abstract tasks, discovery depends both on problem representation and on the strategy one uses to tackle it. I referred to the representations as mental models because of the breadth of this term--it can be used to describe how we solve syllogisms (Johnson-Laird, 1983), how we imagine the workings of a calculator, computer or VCR (Norman, 1993) and, as we saw in the first chapter, what form we think a rule or law might take. Consider Kepler--his initial mental model of a rule for orbits involved perfect circles; he abandoned this rule only when he was forced by negative evidence. Unlike my 2-4-6 subjects, he did not have to generate this negative evidence himself; instead, it was given to him by Brahe.
I refer to the strategies as heuristics because a heuristic is a kind of 'rule of thumb' that works sometimes and doesn't others. If your goal is to test a hypothesis, you can and should employ a number of strategies, depending on how you represent the problem: you might try a positive or a negative heuristic, or a counterfactual heuristic, or some combination.
Mental models are also used to discriminate erroneous data from valid results. There need to be other checks as well, like replication-plus-extension. But scientists need mental models to target probable sources of error. Millikan's mental model suggested that all results which did not indicate a unitary charge for the electron should be carefully scrutinized and replicated.
The distinction between abstract tasks and tasks that simulate scientific problems is somewhat fuzzy. Basically, the former refer to tasks that have no content which resembles the sorts of problems encountered in science, whereas the latter contain some content. My modifications to the 2,4,6 task to accommodate error fall into a fuzzy area; the task itself is highly abstract, but by the time one adds a review of literature, limits on replication and the possibility of error, one has a task that bears a closer resemblance to at least some scientific problems. In the next section, I will describe several tasks that have more of the 'look and feel' of actual scientific problems.
A group at Bowling Green State University (Mynatt, Doherty, & Tweney, 1977; Mynatt, Doherty, & Tweney, 1978) developed an artificial universe that required participants to discover the rules governing the motion of particles in a universe of shapes. In the most difficult version of this task, participants spent about ten hours firing particles at different arrangements of shapes. None of them discovered the rule. The most successful participant concentrated on developing a hypothesis and trying to confirm it. In contrast, participants who focused on disconfirmation rejected promising ideas too quickly. Mynatt, Doherty and Tweney concluded that confirmation was an effective heuristic early in the inference process; once a subject or scientist had discovered and verified a pattern, then she could switch to the search for disconfirmatory evidence. This heuristic combination of confirmation and disconfirmation also worked on abstract problems like the 2-4-6 task, especially when the possibility of error was added. But the heuristic value of 'confirm early, disconfirm late' became most apparent on a task that simulated the complexity of actual science.
Kevin Dunbar (1989) created a computerized molecular genetics laboratory in which subjects were posed a problem similar to the one for which Monod and Jacob won the Nobel Prize in 1965. Dunbar did not intend to have subjects simulate the actual discovery path followed by Monod and Jacob; instead, he wanted "to use a task that involves some real scientific concepts and experimentation to address the cognitive components of the scientific discovery process" (Dunbar, 1989, p. 427).
Participants were given elementary training in concepts of molecular genetics, using an interactive environment on a Macintosh computer. Then they were allowed to perform experiments with three controller and three enzyme-producing genes; they could vary the amount of nutrient, remove genes, and measure the enzyme output. The mechanism the subjects had to discover was inhibition, whereas the mechanism they had learned in training was activation.
Dunbar used this task to make the argument that, "rather than inventing an arbitrary task that embodies certain aspects of science it is possible to give subjects a real scientific task to work with" (Dunbar, 1989, p. 427). Hence, we use this problem as an example of a task that simulates an actual scientific problem.
But even so, the similarities between Dunbar's molecular genetics problem and the 2-4-6 task outweigh their differences. Participants on both are given instructions which explain their little universe; these instructions, like the starting triple 2-4-6, bias them towards a hypothesis that is different from the one they are trying to find, and they are able to do a wide variety of mini-experiments to discover the rule--which, although it represents an actual scientific relationship, is as arbitrary to them as the numerical formulas discovered by participants in the 2-4-6 task. There are none of the potential sources of error that occur in actual genetics experiments and no new techniques to be mastered.
Dunbar relates his findings to the literature on disconfirmation. In this task, all subjects eventually disconfirmed their initial hypotheses about the role of the activator gene--no matter what genes were present or absent, there was always an output. What is interesting is what they did next: 6 groups re-interpreted activation to mean a search for the gene that facilitated enzyme production, 7 searched diligently for an activator gene and eventually gave up, and 7 set the goal of explaining their surprising results. Five out of the 7 groups in this category actually found the inhibitor gene. Dunbar's results support the thesis that successful disconfirmation depends on how subjects or scientists represent the task.
Mynatt et al.'s artificial universe and Dunbar's molecular genetics simulation are not the only tasks that simulate scientific reasoning, but they are two of the best and most-cited, and give the flavor of the results one obtains. One other oft-used and cited task is the Big Trak problem, developed by Jeff Shrager (Shrager & Klahr, 1986). Because it involves learning how to run a device, it is not deliberately modeled after a scientific problem, but it is a discovery task.
In the typical version of this task, participants are asked to figure out the function of the RPT key on the back of a programmable vehicle. Let us consider a shortened account of the behavior of one participant, by way of example:
ML began with the hypothesis that RPT N would repeat the entire program N times. So he programmed it to go forward two spaces, then repeat that twice. The result was Big Trak went forward 4 spaces, instead of the predicted 6.
ML had now disconfirmed his initial hypothesis, so he revised it--RPT N repeated only the last step N times. So he programmed Big Trak to go forward 2, left 30, then RPT 1. Big Trak went forward 2 and left 60, confirming ML's hypothesis. Then he ran the same forward 2, left 30 sequence with RPT 2; instead of going left 60 as he expected, Big Trak repeated the forward 2, left 30 sequence twice. Note that ML has conducted a positive test and has gotten a disconfirmatory result. He replicated the whole sequence to make certain. Then he revised his hypothesis: RPT N meant repeat the N steps before the RPT instruction. He then tested it with varying lengths of N, making sure he understood how RPT selected the steps.
Like ML, most participants began with the idea that an instruction like RPT 4 meant 'repeat whatever program had been typed in four times' or 'repeat the last step in the program four times'. Typically, they began with positive tests and quickly obtained disconfirmatory information, though most were not as efficient as ML. In order to discover the rule, subjects had to change their representation of the role of the repeat key: it selected the steps to be repeated, so that 'RPT 4' meant 'repeat the last four steps once'. Subjects had to realize that the RPT key might serve as a selector, indicating which lines were to be repeated, instead of a counter, indicating the number of times something was to be repeated. The shift from a counter to a selector mental model directed subjects to a different part of the problem space to search for confirmations and disconfirmations. Similarly, the DAX-MED manipulation transformed participants' mental models of the 2-4-6 task from a search for one rule with exceptions to a search for two mutually-exclusive rules.
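ML's three experiments can be replayed against the counter and selector models with a small sketch. The step encoding, the function names and the model labels are hypothetical conveniences; the selector behavior follows ML's final hypothesis as described above (repeat the N steps before the RPT instruction), and the sketch does not claim to reproduce the device's actual firmware.

```python
def run(steps, n, model):
    """Expand a program followed by RPT n under one mental model of the RPT key."""
    if model == "repeat_program":     # counter: the whole program n more times
        return steps + steps * n
    if model == "repeat_last_step":   # counter: the last step n more times
        return steps + steps[-1:] * n
    if model == "selector":           # selector: the last n steps, once more
        return steps + steps[max(len(steps) - n, 0):]
    raise ValueError(f"unknown model: {model}")

def forward_total(steps):
    """Total distance travelled forward by an expanded program."""
    return sum(amount for command, amount in steps if command == "FD")

# Experiment 1: forward 2, then RPT 2.
p1 = [("FD", 2)]
print("predicted:", forward_total(run(p1, 2, "repeat_program")),
      "observed under selector:", forward_total(run(p1, 2, "selector")))

# Experiments 2 and 3: forward 2, left 30, then RPT 1 and RPT 2.
p2 = [("FD", 2), ("LT", 30)]
for n in (1, 2):
    predicted = run(p2, n, "repeat_last_step")
    observed = run(p2, n, "selector")
    outcome = "confirmed" if predicted == observed else "disconfirmed"
    print("RPT", n, "->", outcome, observed)
```

With RPT 1 the two models make the same prediction, so the positive test confirms the counter hypothesis; only RPT 2 separates them, which matches the order in which ML encountered his disconfirming result.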
Klahr and Dunbar (Klahr, 1988) discussed the way in which participants switched between searching two problem spaces, one of which was a space of possible hypotheses and the other of which was a space of possible experiments. ML first considered a set of hypotheses that depended on the idea that RPT was a counter; he generated a space of possible experiments based on that mental model. When results violated expectations, at one point he switched to searching for a new kind of hypothesis, in which RPT selected the steps to be repeated. Disconfirmation can lead to a change in the type of hypothesis one is pursuing, which in turn directs one to search different parts of the experiment space.
Klahr and Dunbar concluded that their participants showed two different cognitive styles: Theorists and Experimenters. The former, when presented with disconfirmatory results, searched the hypothesis space for alternatives that would fit the evidence and also make interesting new predictions. ML did this when he thought about why Big Trak repeated the forward 2, left 30 sequence twice in response to RPT 2. The latter responded to disconfirmatory evidence by exploring the experiment space--at some point, most of them ran experiments which made the selector role of RPT salient. Theorists conducted about half as many experiments as Experimenters, and almost all of the former's experiments were guided by a hypothesis, whereas the latter's were often simply exploratory. In a second study, Klahr and Dunbar found that participants with prior programming experience could discover the function of the RPT key by searching the hypothesis space, then conducting tests in the experiment space.
In a more recent study using a version of their RPT task, Klahr, Fay and Dunbar (1993) established that third and to a lesser extent sixth graders had trouble with evidence that disconfirmed counter hypotheses, in part because they could not switch to a selector hypothesis: "inconsistencies were interpreted not as disconfirmations, but rather as either errors or temporary failures to demonstrate the desired effect." (p. 140). Klahr, Fay and Dunbar interpreted this as a failure to coordinate searches in hypothesis and experiment spaces, a view we will explore in greater depth when we consider the performance of children on actual scientific problems.
Despite Dunbar's arguments about the importance of modeling tasks after real scientific problems, the conclusions from tasks that have the look and feel of scientific problems look little different from those derived from abstract tasks. What one learns is more about the relationship between mental models, hypotheses and experiments in a variety of domains that resemble aspects of science.
| Participants | Type of task: Abstract | Type of task: Simulated Scientific Problem | Type of task: Scientific Problem |
| Novices | Effectiveness of heuristics like (1) positive test, (2) counterfactual, (3) replication-plus-extension depends on relationship of mental model to target rule. | Demonstrate importance of additional heuristics: confirm early, disconfirm late; coordinate search in two spaces. | |
| Children | Are unable to coordinate search in two spaces. | | |
| Experts | Prefer a positive test heuristic. | | |
Another way to study scientific thinking is to use actual scientific problems. On these problems, it is harder to manipulate features of the task like whether it requires background knowledge or can be done by anyone walking in 'cold', whether the rule is narrower or broader than the participant's most likely initial hypothesis, or indeed whether there is any rule at all, and how the problem space is structured. Such problems do allow us to study differences in the way experts represent tasks in their domain, and what heuristics and algorithms they use.
Researchers like McCloskey (1983), Clement (1982) and Carey (Carey, 1992; Wiser, 1983) have established parallels between the mental models of modern novices and historical figures in the evolution of science. For example, McCloskey (McCloskey, 1983) found that college students held beliefs about physics that resembled those of Philoponus (6th century) and Buridan (14th century), who thought that a force was required to set a body in motion, and that the force gradually dissipated. Clement (Clement, 1983) found that freshman engineering students were a little more advanced: protocols of their attempts to solve motion problems resembled Galileo's reasoning in De Motu.
Brewer and Chinn (1991) studied how such beliefs change. They gave adult novices brief readings on quantum theory or special relativity and asked them a series of follow-up questions. Both quantum theory and relativity make predictions that conflict with common-sense beliefs about space and time and cause and effect. Some subjects simply rejected the new information, resembling those scientists who cling to the old paradigm. Other subjects showed at least partial assimilation of the new material: they were able to give an answer that corresponded to what they had read, but they "sure didn't believe it" (p. 70). Another move was to interpret the answer in terms of existing beliefs, for example, by treating relativistic phenomena as optical illusions.
Jean Piaget argued that the development of scientific thought in the child recapitulated the evolution of science (Bringuier, 1980). Studies that show how the scientific beliefs of children and novices change owe much to Piaget's inspiration. This line of work is also influenced by Thomas Kuhn's (Kuhn, 1962) view that long periods of normal science are followed by crises caused by anomalies in the reigning paradigm. A paradigm corresponds to something like a collective mental model--a good example is the circular orbit model that was almost universally accepted before Kepler. Brahe's anomalous data sparked a crisis, which Kepler resolved by proposing his new model of the solar system.
From a Kuhnian standpoint, the mental models held by practitioners before and after a paradigm shift are incommensurable--those holding the older view cannot even understand the new one. Kuhn's views are by no means accepted by all or even most historians and philosophers, but they are extremely influential. If Piaget and Kuhn are right, children and novices should go through revolutionary shifts in mental models as they learn scientific concepts. For example, Chi (1992) used a Kuhnian framework to review the literature on conceptual changes in children and adults. She argues that radical conceptual change often occurs before anomaly recognition, whereas most of the hypothesis-testing literature tends to take anomaly recognition for granted--except under error conditions, it is clear when a triple is at variance with a hypothesis. Her own analysis suggests that recognition and resolution of anomalies requires a shift to a new system of categories similar to the kind of paradigm shift made famous by Kuhn.
Similarly, Carey (1992) compared the problems children ages 3 to 5 have differentiating weight and density with the problem scientists before Black had differentiating heat and temperature: in both cases, the view before differentiation seems to belong to a different, incommensurable paradigm from the view afterward. Carey is therefore sympathetic to Kuhn's views, but less to those of Piaget, who proposed major changes in the cognitive abilities of children as they passed from one stage of development to another. Carey finds changes in conceptual content in specific domains as the child grows older, not general changes in cognitive ability.
Brewer and Samarapungavan (1991) concluded "that the child can be thought of as a novice scientist, who adopts a rational approach to dealing with the physical world, but lacks the knowledge of the physical world and experimental methodology accumulated by the institution of science" (p. 210). Like Carey, they argue that the apparent differences in thinking between children and adults are due to differences in knowledge, not the ability to employ reasoning strategies. For example, they studied second-graders and showed that those who had a flat-earth mental model could incorporate disconfirmatory information consistent with a Copernican view by transforming their model into a hollow sphere. They used this new mental model to solve a range of problems about the day/night cycle and motion of individuals and objects across the earth's surface (see Vosniadou and Brewer, In Press).
In contrast, Deanna Kuhn (Kuhn, 1989) argued against the 'child as novice scientist' view. "Both child and scientists gain understanding of the world through construction and revision of mental models. Recent research....suggests that the process in terms of which mental models, or theories, are coordinated with new evidence is significantly different in the child, the lay adult, and the scientist....In some very basic respects, children (and many adults) do not behave like scientists" (Kuhn, 1989, p. 687).
D. Kuhn, following Klahr and Dunbar (Klahr, 1988), felt that it was important to distinguish between two problem spaces that have to be coordinated when one is solving scientific problems. One is a space of possible hypotheses, the other is a space of possible experimental or observational results that might bear on the hypotheses. According to D. Kuhn, in the child, experiment and hypothesis spaces are merged into a single mental model, without any clear distinction between the two. In the scientist, theory and evidence are clearly separated. The novice adult falls somewhere between.
In her research, D. Kuhn focused on theory revision in the light of evidence. One of her studies involved hypotheses about the relationship of food and colds. She cites one child who believed that relish caused colds and candy bars did not. This child was presented with instances whose overall pattern showed neither variable made any difference, but instead she picked out individual results that supported her theory, singling out positive tests for relish and negative for candy bars and ignoring the rest.
This process can occur in adults, too, and have enormous significance. The Dow Corning company has been forced into Chapter 11 because of litigation regarding its silicone breast implants. Nir Kossovsky is often called as an expert witness in these trials. He ran a standard, but very difficult, test for antibodies and found that "scores of 9 of his 249 women with implants were significantly higher than the mean score of the 47 healthy women or of the 39 women with autoimmune disorders. But those 9 women represented less than 4 percent of all the women with implants he tested. What if in reality his...test was meaningless? Then he might expect 4 percent of all women to score equally high. Because his two comparison groups had comparatively few women, 4 percent of those would be fewer than two from each group. With numbers this small, it is not particularly surprising that he got zero" instances of higher scores from his comparison groups (Taubes, December, 1995, p. 71).
Like the children in D. Kuhn's study, Kossovsky singled out a few positive results without taking into account the overall pattern. These kinds of biases can have multimillion-dollar consequences.
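The arithmetic behind the Taubes passage can be checked with a short sketch. The counts (9 of 249, and comparison groups of 47 and 39) come from the quotation; the function names and the simple binomial model, which treats each woman's score as independent with the same chance of being high, are assumptions made here for illustration and are not part of Taubes's or Kossovsky's analysis.

```python
# Counts taken from the quoted passage.
implant_n, implant_high = 249, 9
comparison_groups = {"healthy": 47, "autoimmune": 39}

# Assumption: if the test were meaningless, every woman would have the
# same chance of a high score, estimated here from the implant group.
base_rate = implant_high / implant_n


def expected_high(n, rate):
    """Expected number of high scorers in a group of size n."""
    return n * rate


def prob_zero_high(n, rate):
    """Chance that a group of size n contains no high scorers at all,
    assuming independent scores (a simple binomial model)."""
    return (1 - rate) ** n


summary = {
    name: (expected_high(n, base_rate), prob_zero_high(n, base_rate))
    for name, n in comparison_groups.items()
}

for name, (expected, p_zero) in summary.items():
    print(f"{name}: expected high scorers = {expected:.2f}, "
          f"P(no high scorers) = {p_zero:.2f}")
```

Under these assumptions, each comparison group would be expected to contain fewer than two high scorers, and a single group of this size would show none at all on the order of one time in five. A zero count in a small comparison group is therefore weak evidence that the nine high scores mean anything.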
Overall, D. Kuhn found that adults were better than children at conducting coordinated searches of hypothesis and evidence spaces on tasks where this sort of financial incentive was absent. Scientists were even better. The key, according to D. Kuhn, is the development of metacognitive skills that permit delineation of theory and evidence, and a coordinated search in two spaces. Metacognition involves being aware of one's own cognitive processes, and modifying them when necessary. In this case, metacognition involves being aware that a mental model is just that--a working model that may have to be modified in the light of evidence.
Similarly, Klahr, Fay and Dunbar (1993) found that adults performed better than children on tasks that simulate scientific problems because the adults possessed "a set of domain-general skills that go beyond the logic of confirmation and disconfirmation and deal with the coordination of search in two spaces" (p. 141). A coordinated dual space search facilitates shifts in representation that lead to new mental models.
Carey and Brewer feel that development of scientific knowledge has more to do with changes in domain-specific knowledge, whereas D. Kuhn and Klahr place more emphasis on heuristics and metacognitive abilities. This debate has important implications for teaching discovery. Does one promote discovery simply by teaching the content of a domain, or does one encourage the development of metacognition and heuristics like dual-space searches? The obvious answer is to do both.
Interestingly, D. Kuhn's research has focused more on situations where the relationships between variables are less than perfect--where one needs to look at the overall pattern of positive and negative results. Vosniadou and Brewer, in contrast, preferred to help children clarify their mental models by pointing out inconsistencies and places that needed elaboration. For example, they showed children who said the Earth was round a picture of a house and asked them questions like, "This house is on the earth, isn't it? How come here the earth is flat, but before you made it round?" (Vosniadou & Brewer, 1992). Children were able to modify their mental models to accommodate this sort of contradiction. One solution some adopted was to visualize the earth as a kind of flattened sphere, a kind of thick pancake.
In other words, Brewer's children don't have to conduct a search in two spaces--they are given results from the evidence space. They are able to use this evidence to modify their hypotheses. Similarly, in Brewer's adult study mentioned above, results from the evidence space were summarized for participants in a way that highlighted the contradiction between their mental model and the result.
The point here is a reflexive one: how you set up the experimental task determines, in part, your results. It is much the same with computational simulations.
D. Kuhn and Brewer could still be surprised by what they found, just as computational simulations can surprise their creators. But there is a difference between studying how children deal with less than perfect covariation between variables (D. Kuhn) and how they deal with what Thagard calls explanatory coherence (Brewer). Both kinds of study are valuable and well conducted. In the former, children appear to lack abilities characteristic of adult scientists, and in the latter, they appear to possess them. The obvious compromise is to try to determine the kinds of tasks and situations on which the performance of children and novices will resemble that of experts, and the tasks on which it will not. One could, for example, take the same set of children and run them through both covariation and explanatory tasks and compare their performance to that of scientists confronted with the same type of problems. Intriguingly, Faust (Faust, 1984) suggests that even experts often do poorly with covariation problems.
In the previous section, we compared children and adults. In this one, we will talk about how children and adult novices compare with experts. "For a long time the study of exceptional and expert performance has been considered outside the scope of general psychology because such performance has been attributed to innate characteristics possessed by outstanding individuals. A better explanation is that expert performance reflects extreme adaptations, accomplished through life-long effort, to demands in restricted, well-defined domains" (Ericsson & Charness, 1994, p. 744). Expert knowledge needs to be more than a 'pile of facts'--it needs to be structured in ways that facilitate problem-solving (Ericsson & Charness, 1994).
Larkin argued that this knowledge is organized in sets of condition-action pairs known as productions, similar to the production rules used by the various forms of BACON, which were activated by patterns in the data (Larkin, McDermott, Simon, & Simon, 1980). She and her colleagues found that when an expert physicist encountered a familiar problem, the initial information typically triggered a set of productions which rapidly produced the correct equations--the expert had automated much of the problem-solving process, and worked forward from the information given. Novice physics students had to struggle backwards from the unknown solution, trying to find the right equations and quantities; they therefore took much longer even when they were able to find the correct result.
Consider the following example. Suppose we have to find the value of the friction coefficient for a block resting on an inclined plane. The initial problem statement gives the weight of the block, the angle of the plane and the force pushing against the block. The expert will work forward from the givens, generating the necessary equations to solve for the friction coefficient. The novice will typically start from the goal, generating the final equation, and trying to find values for the variables in that final equation by generating other equations that use the data given at the beginning of the problem to solve for each. When all variables have values, the novice stops (Anzai, 1991).
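The contrast between working forward and working backward can be sketched in a few lines of code. This is only an illustration, not a model taken from Larkin or Anzai. The three equations, the variable names, and the numerical values are assumptions: the pushing force is taken to act up the plane and parallel to it, and the block is taken to be on the point of sliding down, so that friction equals the weight component along the plane minus the push.

```python
import math

# Each rule is (quantity produced, quantities required, equation).
# The physics here is a simplifying assumption made for illustration.
RULES = [
    ("normal", ("weight", "angle"),
     lambda w, a: w * math.cos(a)),
    ("friction", ("weight", "angle", "push"),
     lambda w, a, p: w * math.sin(a) - p),
    ("mu", ("friction", "normal"),
     lambda f, n: f / n),
]


def work_forward(givens, goal):
    """Expert-style: fire any rule whose inputs are already known."""
    known = dict(givens)
    steps = []
    while goal not in known:
        fired = False
        for out, ins, equation in RULES:
            if out not in known and all(i in known for i in ins):
                known[out] = equation(*(known[i] for i in ins))
                steps.append(out)
                fired = True
        if not fired:
            raise ValueError(f"cannot derive {goal} from the givens")
    return known[goal], steps


def work_backward(givens, goal, steps=None):
    """Novice-style: write the equation for the goal first, then
    set up a subgoal for each quantity that equation still needs."""
    steps = [] if steps is None else steps
    if goal in givens:
        return givens[goal], steps
    for out, ins, equation in RULES:
        if out == goal:
            steps.append(out)
            values = [work_backward(givens, i, steps)[0] for i in ins]
            return equation(*values), steps
    raise ValueError(f"no equation produces {goal}")


# Hypothetical problem values: weight and push in newtons, angle in radians.
GIVENS = {"weight": 100.0, "angle": math.radians(30), "push": 20.0}

mu_forward, forward_steps = work_forward(GIVENS, "mu")
mu_backward, backward_steps = work_backward(GIVENS, "mu")
print("forward order: ", forward_steps)
print("backward order:", backward_steps)
```

Both routines reach the same coefficient, but they write down the equations in opposite orders: the forward routine ends with the equation for the friction coefficient, while the backward routine starts with it and then has to chase down each unknown it contains.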
Working forward and working backward are examples of general, or 'weak', heuristics that can be applied across a wide range of problem-solving situations. Note that either heuristic can work, but working forward is typically faster and more efficient--the steps in the problem can be laid out systematically. Novices tend to try to apply equations early, whereas experts reason qualitatively until they arrive at a representation that suggests what set of equations to use (Larkin, 1983).
This finding suggests that expert/novice differences in heuristics are related to differences in mental models. Chi et al. (Chi, Feltovich, and Glaser, 1981) asked experts and novices to group physics problems based on their similarity, where the definition of similarity was determined by the participant. They found "that experts tended to categorize problems into types that are defined by the major physics principles that will be used in solution, whereas novices tend to categorize them into types as defined by the entities contained in the problem statement" (p. 150). In other words, for experts, categorization is a first step towards solution.
Experts tend to classify problems as having to do with principles like 'conservation of momentum', whereas novices tend to do a more common-sense reading of the words and diagrams in a problem. Expert physicists also generate diagrams that are "principle-oriented abstractions of physical objects" (Anzai, 1991, p. 88), whereas novices tend to rely more on diagrams that look like concrete objects.
In a discussion of the way Galileo transformed the motion of a pendulum into an abstract representation, Michael R. Matthews gives us a good description of these expert representations:
Planets and falling apples have color, texture, irregular surfaces, heat, solidity and any number of other properties and relations. But when they become the subject matter of mechanics they are merely point masses with specified accelerations; when thus conceptualized and delimited, they are no longer natural objects, but theoretical objects. In a similar way, when apples are considered by economists they become theoretical objects of a different sort--commodities with specific exchange values. When botanists consider apples they create yet other theoretical objects. For Galileo a sphere of lead on the end of a length of rope swinging in air, when it is considered by his mechanical theory, becomes a pendulum conceived as a point mass at the end of a weightless chord suspended from a frictionless fulcrum moving in a void (Matthews, 1994, p. 125).
Galileo solved the pendulum problem by abstracting it in the way suggested by the last line of the quotation, much to the frustration of his former mentor and leading critic, del Monte, who protested that actual pendulums did not behave in the way predicted by Galileo. Galileo countered by pointing out the way in which the actual pendulums failed to attain the ideal, frictionless state he was describing. Like modern novices struggling to attain the predicted result in a science lab, del Monte found that it is hard to make reality conform to the abstract representation.
Bucciarelli (Bucciarelli, 1994) includes a detailed analysis of the transformations a student has to be able to make in order to solve a textbook design problem. The student sees a picture of a hydraulic cylinder moving up and down through a slot and is asked to determine the numerical value of several variables at a particular instant in its motion. Like Galileo, the student has to turn a concrete picture into an abstract one, although in the student's case, even the concrete picture is covered with mathematical terms and values (see Figure 7).

Figure 7: Diagram accompanying a problem concerning a hydraulic cylinder (Bucciarelli, 1994, Fig 6, p. 99)
The student must transform this picture into an even more abstract representation:

Figure 8: A more abstract representation of the problem in Figure 7 (Bucciarelli, 1994, Fig 8, p. 106).
The transformation reveals the underlying form of the exercise. It is a 'vector differential calculus' problem--abstract, universal and unencumbered. There is nothing left of the mechanism save its essence...no longer any pretense of machinery, hydraulic cylinders, piston rods, slotted arms, or frictionless pins. All of that is irrelevant. The student must learn to perceive the world of mechanisms and machinery as embodying mathematical and physical principle alone, must in effect learn to not see what is there but irrelevant. (Bucciarelli, 1994, p. 107).
Bucciarelli shows the kind of transformations novices must learn to make before they can solve familiar textbook problems. Subjects in these expert-novice comparisons typically work on such textbook-style word problems, not hands-on laboratory tasks. Therefore, findings from the expert-novice literature are especially relevant to educational situations (Reif and Larkin, 1991) but may have less relevance to scientific practice. Green (Green and Gilhooly, 1992) argued that "the standard expert-novice contrastive paradigm by requiring use of problems accessible to novices has led to a relative neglect of how experts tackle difficult problems and how experts detect and recover from errors in the face of task difficulty" (p. 67).
Similarly, Anzai pointed out that, "most of the recent cognitive research on physics has been limited to 'routine' problem-solving by experts and novices. That is, the primary concern has been with the simple problems often seen in high school or college textbooks. Although such routine problems are the real problems with which engineers deal, experts at the frontiers of physics are trying to discover the unknown principles of the physical world and to construct new types of representations that will help explain it in scientific terms. Only those who succeed in generating novel representations will be long remembered in the history of physics..." (Anzai, 1991). In other words, textbook problems correspond to what Thomas Kuhn called normal science; indeed, he argued that students learned the dominant mental models in their fields from textbooks. Revolutionary science lies beyond the textbooks, and it can take some time before new discoveries are integrated into textbook knowledge for a new generation of students.
Klahr, Fay and Dunbar (1993) point out that the expert-novice studies cited above do not give children or adults the opportunity to design new experiments and formulate and evaluate hypotheses, whereas experiments with simulations like the 2-4-6 task do. In a study using a task that permitted children and adults to generate experiments and hypotheses, Klahr, Fay and Dunbar (1993) found that superior adult performance "appears to come from a set of domain-general skills that go beyond the logic of confirmation and disconfirmation and deal with the coordination of search in two spaces" (p. 141). Klahr and D. Kuhn therefore agree on the importance of metacognition. It is not enough to know a lot of information, not even enough to be able to form abstract representations--to discover, one must be able to mount a coordinated search for new hypotheses and evidence that bears on them.
Type of task:
| Participants: | Abstract | Simulated Scientific Problem | Scientific Problem |
| Novices | Effectiveness of heuristics like positive test, counterfactual, and replication-plus-extension depends on relationship of mental model to target rule. | Demonstrate importance of additional heuristics: confirm early, disconfirm late; coordinate search in two spaces | Use common-sense representations and weak heuristics. |
| Children | Unable to coordinate search in two spaces | Can modify mental models to achieve explanatory coherence | |