2.3 Cognitive Psychology of Science

In this section, we will look at what psychologists and cognitive scientists have to contribute to the study of scientific thinking. From the standpoint of the traditional philosopher of science, the underlying logic of science is what is really important, not the mental processes of individual scientists. There may be interesting psychological stories to tell about how discoveries are made, but these are not relevant to how they are justified by the scientific community (Siegel, 1980). There are notable exceptions to this view. For example, philosophers like Steve Fuller and Ron Giere put psychology at the center of science studies in very different ways, the former linked more to Skinner's behaviorism and the latter to cognitive psychology (Fuller, 1989; Giere, 1988).

From the standpoint of at least some sociologists, "thinking is not something that happens inside heads or brains" (Restivo, October, 1995). The way to study scientists is to look at their interactions with each other and the inscriptions that they produce (Latour, 1987).

If science is either largely the product of an underlying logic or of social negotiations, then psychology is marginal--a way, perhaps, of accounting for aspects of discovery and knowledge transmission having to do with perceptual and physiological processes. Psychologists contributed to this marginalization, I think, by their reluctance to study the methods they were using to justify their existence. Most psychologists are very concerned with making their field more scientific. Kuhn thought textbooks revealed the prevailing image of science, an image he considered misleading: "the aim of such books is persuasive and pedagogic; a concept of science drawn from them is no more likely to fit the enterprise that produced them than an image of a national culture drawn from a tourist brochure or a language text" (Kuhn, 1962, p. 1).

Almost every introductory psychology textbook I have seen opens with a statement about how psychology is a science. For example, the textbook I used when I taught introductory psychology states that "once psychologists have developed a theory, they proceed in much the same general ways, regardless of the exact content of the theory. They subject the theory to empirical tests. They predict in advance what sorts of observable actions should occur when certain variables are changed. By testing to see whether their predictions describe the actual research outcomes, they find out whether their laboriously constructed theories are correct" (Darley, Glucksberg, & Kinchla, 1988, p. 8). To a philosopher or sociologist, the statements about science would sound naive. Perhaps psychologists do not want to question the methods they are trying so hard to adopt. If so, they are following the pattern of textbooks in the 'harder' sciences.

2.3.1 Can Science Be Used to Study Science?

Cognitive psychologists want to use the methods of science to study science. Therefore, cognitive scientists are faced with a paradox--can science be used to study science? This harks back to the point we made earlier about not relying solely on using Azande methods to study Azande beliefs. If by science we mean a method guaranteed to find orderly relationships in the natural world, and if the object of studying science were to verify that this Method did achieve its goal, then we would be on shaky ground.

But perhaps we are really talking about methods, where the small 'm' denotes the fact that different scientific disciplines have different practices. Making generalizations about discovery by looking closely at what scientists actually do potentially gets us out of the problem of using a Method to verify itself (Barker, 1989). The methods of psychology are not the same as those of physics, or sociology--and within psychology, there are differences in methods among cognitive, social and personality psychologists. For example, all may use experiments from time to time, but different specialties design different sorts of experiments. (We will encounter examples of these different types of experiments later in this chapter.)

To make this question more specific, can one use the experimental method to study the experimental method? Yes, if the kinds of experiments reflect the unique practices of a specialty different from the one being studied, and also if the experiments are triangulated with other methods like fine-grained case-studies--particularly if these methods are borrowed from other disciplines like anthropology and sociology. (Again, we will encounter examples of this triangulation later in this chapter.)

2.3.2 Does Cognitive Psychology Presume Rationalism and Realism?

Most of the participants in the realist/relativist debate seem to assume cognitive studies will reproduce a kind of Feynman view of the scientist as rational seeker after objective truth. But in fact, one can study the cognitive processes of the Azande as easily as those of scientists.

For example, Edwin Hutchins has done a superb study of the cognitive processes involved in traditional Micronesian navigation (Hutchins, 1983). A small group of expert navigators from the Central Caroline Islands routinely embark on ocean voyages of several days out of sight of land; they belong to a pre-literate culture and use none of the Western technology of navigation, not even a compass. Their navigation begins from the assumption that the boat is stationary and islands gradually move by it; the passage of reference islands is marked by the position of the stars. In most cases, these reference islands are imaginary constructs. The Micronesian navigator has a very different mental model from his modern Western counterparts, but one that is no less amenable to cognitive analysis.

Cognition is usually seen as an individual activity, too, but as Hutchins has shown, modern Western navigation is a team activity that is still amenable to cognitive analysis (Hutchins, 1995). There is no clear demarcation between the cognitive and the social: one merges into the other. But in this book, we will begin from the cognitive end.

From a sociological standpoint, 'beginning from the cognitive end' is equivalent to saying we will apply the tools of one discipline or community (cognitive psychology) to studying and understanding the practices of others (sciences). The tools, of course, carry a framework with them, and therefore it is important for those applying the tools to be aware of the assumptions that 'come along for the ride'. Indeed, the tools, methods and assumptions ought to be applied to the studiers as well as those studied.

This is the 'reflexive turn' in the sociology of scientific knowledge--an effort to say the equivalent of "sociologist (or cognitive scientist) study thyself." A further implication is that there is no privileged viewpoint, no solid epistemological ground, from which one can study--the philosopher, the psychologist, the sociologist and the scientists themselves simply speak with different voices. As Latour has cried in mock-horror,

But where can we find the concepts, the words, the tools that will make our explanation independent of the science under study? I must admit that there is no established stock of such concepts, especially not in the so-called human sciences, particularly sociology. Invented at the same period and by the same people as scientism, sociology is powerless to understand the skills from which it has so long been separated. Of the sociology of the sciences I can therefore say, "Protect me from my friends, I shall deal with my enemies," for if we set out to explain the sciences, it may well be that the social sciences will suffer first (quoted in Lynch, 1992, p. 230).

One must be reflexively aware of the black boxes created by those studying science. In an earlier book, I tried to open the black boxes created by cognitive scientists doing experimental studies of scientific reasoning (Gorman, 1992a). This chapter will review some of that work, but will go substantially beyond it, to consider recent developments that move cognitive science in the direction of focusing more on practice and worrying less about Method. Where possible, I will adopt the 'meta-alternation' strategy advocated by Collins and Yearley, whose answer to the problem of reflexivity is to alternate between perspectives (Collins & Yearley, 1992). Therefore, I will use aspects of the sociology of scientific knowledge to provide an occasional alternative to the cognitive account.

What we will find is that there is not one cognitive psychology community which applies its tools to science--there are disparate groups, with different definitions of what constitutes the appropriate focus of study. There is an even wider group of practitioners who call themselves cognitive scientists. Cognitive psychology is one of the areas that is usually subsumed under this interdisciplinary label; other practitioners include computer scientists interested in artificial intelligence and machine learning, philosophers of mind sympathetic to computational approaches, neuroscientists interested in functions like memory, and cognitive anthropologists like Edwin Hutchins (Gardner, 1985). Cognitive scientists are often reluctant to study science for the very reason we cited at the beginning of this chapter--many cognitive scientists want to be viewed as following the Scientific Method, and any attempt to apply their tools to studying science makes it sound like they are questioning the basis of their own beliefs.

2.3.3 Computational Simulations of Scientific Discovery

One of the few orthodoxies adhered to by most cognitive scientists is the belief that a hypothesis ought to be expressible in computational form (Baars, 1986; Johnson-Laird, 1988). But this again raises the problem of using a method to validate itself--if all cognitive hypotheses have to be in computational form, then any aspects of scientific practice that could not be described by current computational techniques would be ignored. Similarly, the Azande might insist that all hypotheses about the weather be put in their cultural framework. What would have happened if Faraday had been told all his hypotheses about electricity and magnetism had to be put into equations? Faraday worked in a rigorous, geometric fashion, but he did not reduce his lines of force to equations--that was left to Maxwell.

There is a further complication to the computational work, one that illustrates the way in which cognitive science and cognitive psychology can diverge. Most of the computational simulations have dual goals--to model human problem-solving (cognitive psychology) and also to explore how and whether machines can discover (cognitive science). The traditional goal of Artificial Intelligence (AI) has been to understand human intelligence by building machines that mimic human mental processes. In contrast, the goal of expert system design is to create systems that will be as good as, or better than, human experts.

At first blush, it might seem that the two goals are the same. But in many domains, computers assist experts by doing what human beings cannot--calculating at speeds far in excess of the human nervous system, storing far more information in a literal form than any human brain could, providing sophisticated real-time visualizations, etc. The word 'computer' originally referred to a human being especially trained to do high-speed calculations. So computers are already taking over areas of human expertise, but in most of these, they are not functioning like humans. Consider, for example, the best chess programs, which work by brute force--literally calculating most of the possible combinations, whereas the human opponent relies on heuristics to narrow the search space.

This book is about understanding human discovery; therefore, the AI literature is more relevant than the literature on machine learning, though the boundary between the two is fuzzy. In particular, many expert systems using brute-force heuristics can become important aids in discovery. Already, most chess players own a computer which they practice with and which they may even be allowed to bring to tournaments. For those who take the view that cognition is often shared across a network of tools and actors, the computer is part of the process we refer to as mind (Gorman, 1997).

The examples in the last chapter illustrate the diversity of computational approaches to discovery: BACON and KEKADA emulated the sorts of heuristics used in scientific discovery, CLARITY allowed the user to explore different discovery paths and Thagard's connectionist system tried to show how scientific controversies could be resolved by explanatory coherence. There are quite a few other approaches as well (Cheng, 1992; Shrager, 1990).

This type of simulation fits the goal of applying the tools of cognitive science to the practice of science. The various computational techniques are tools used by individuals or groups that label themselves cognitive scientists, and in the last chapter, we saw examples of their application to actual cases of discovery.

We also noted that each technique carried an assumptive framework with it. KEKADA makes Krebs look like an exemplar of Herbert Simon's views regarding cognition, CLARITY supports Gooding's framework and Thagard's ECHO supports his philosophical notions. This does not mean these simulations are not capable of surprising their creators, just that the range of possible surprises is limited by the structure of the simulations.

We are back at the issue of using a method to validate itself. We cannot use KEKADA to validate a heuristic-based approach to an understanding of discovery because the heuristic approach is embedded in KEKADA. For example, KEKADA cannot decisively refute those who argue that heuristics are just post-hoc rationalizations that are used to explain what experts appear to do, but do not reflect the way they actually solve problems (Suchman, 1987). But we could use other methods to complement simulations like KEKADA, like detailed case-studies (Kulkarni, 1988) and experimental simulations (Qin, 1990). We could also create a counterfactual computational simulation of Krebs' process, perhaps based on a connectionist algorithm or on the kind of visual programming used in CLARITY, one that was not heuristic-based. If such a simulation compared well with an actual case-study, it would show that alternatives to heuristic-based models can give at least as good an account of scientific discovery. Unfortunately, multiple computational approaches have not, as far as I know, been applied to a single case--instead, each approach tackles its own problem.

2.3.4 If Machines Can Discover, Do We Need a Sociology of Scientific Discovery?

A decisive and sufficient refutation of the 'strong programme' in the sociology of scientific knowledge (SSK) would be the demonstration of a case in which scientific discovery is totally isolated from all social or cultural factors whatever. I want to discuss examples where precisely this circumstance prevails concerning the discovery of fundamental laws of the first importance in science. The work I will describe involves computer programs being developed in the burgeoning interdisciplinary field of cognitive science, and specifically within 'artificial intelligence' (AI). The claim I wish to advance is that these programs constitute a 'pure' or socially uncontaminated instance of inductive inference, and are capable of autonomously deriving classical scientific laws from the raw observational data (Slezak, 1989).

Simply put, Slezak argued that, if programs like BACON can discover, then there is no need to invoke all the interests and negotiations that sociologists use to explain discovery. His claims sparked a vigorous debate in the journal Social Studies of Science (see the November, 1989 issue).

In contrast, Shrager and Langley argue that, "two important aspects of intellectual activity--embedding and embodiment--that have significant on science...have not been addressed by existing computational models. Briefly, science takes place in a world that is occupied by the scientist, by the physical system under study, and by other agents, and this world has indefinite richness of physical structure and constraint. Thus the scientist is an embodied agent embedded in a physical and social world" (Shrager, 1990, p. 224).

The embodiment issue is perhaps most clearly illustrated by the powerful KEKADA simulation, which still could not emulate the tissue-slicing procedures that were critical to Krebs' discovery. Robotic systems and neural nets that do pattern recognition may someday be able to simulate more of the embodied character of scientific procedures.

One important aspect of embodiment is visualization, and here computational simulations are making interesting progress. Cheng and Simon showed that it might have been easier for Huygens and Wren to have discovered the law of conservation of momentum using diagrams rather than deriving it from theory or by data-driven processes similar to those used by BACON (Cheng & Simon, 1992). Cheng then created HUYGENS, a more general computational simulation of discovery by one-dimensional diagrams. He noted that:

HUYGENS provides further computational evidence for the view that switching back and forth between representations is an effective way to enhance creativity. From given numerical data, HUYGENS switches to a space of diagrams in its search for regularities by looking for patterns in the diagrams. When patterns have been found, the regularities are simply transformed back into equations. The change to diagrammatic representation permits different operators, regularity spotters and heuristics to be employed that are more effective than those used in the direct search of a space of algebraic terms (Cheng & Simon, 1995, p. 224).

Cheng admits that we cannot be sure the real Huygens used this method--but it is plausible, historically, and HUYGENS demonstrates that it would have been more efficient than alternatives. Rather than claiming he had developed a program that discovers, Cheng argued that he had provided computational evidence for the importance of using diagrams in scientific discovery, evidence that could be combined with material from other sources, e.g., fine-grained case-studies of the way diagrams are used in actual discoveries. Cheng's work is still a long way from being embodied, but it is a step in the right direction.

Another computational approach that has potential for addressing the problem of embodiment is the use of neural networks designed to simulate aspects of the human nervous system. While these networks are particularly valuable for modeling the sorts of sensory processes involved in recognizing and manipulating objects, they may also be able to provide insights into the kinds of connections among neurons that would promote creativity (Martindale, 1995).

But neither Cheng's diagrams nor neural networks can yet simulate the interplay between instrument, hand and eye--and the way in which the scientist is also an inventor. Indeed, one could argue that the computer is itself part of the process of embodiment--it is one of the tools modified by the scientist that facilitates discovery.

Shrager and Langley also make an important point about the fact that computational simulations fail to capture the way scientists are embedded in social networks. Ironically, all of these computer simulations are themselves embedded in a rich network of human negotiations. It is the humans who seek funding for them, supply them with their data and make the claim that they discover. That is why Slezak is wrong about discovery programs refuting the sociology of scientific knowledge: the programs are themselves embedded in the processes they are supposed to refute! Brannigan argued that Azande computer scientists could "write a program which, given the selective identification of the data observed by Azande experts, would rediscover witchcraft as a cause of illness" (Brannigan, 1989, p. 610). Such a program would teach us a great deal about the heuristics used by the Azande equivalent of witches, but would not establish that witchcraft constituted a 'socially uncontaminated' method of inference.

Computer simulations do not replace a sociology of discovery. They complement fine-grained studies of the discovery process, allowing us to model individual cognitive processes but also potentially aspects of the social negotiations involved. This is reflected in the movement towards case-based simulations of reasoning that reflect the kinds of embedded learning that occur through apprenticeship, and could allow better computational models of human creativity (Schank & Cleary, 1995). We will have more to say about case-based reasoning when we discuss situated cognition at the end of the chapter.

If intelligent machines do emerge in the future, they will form part of the scientists' network, assisting in those areas where humans are weak--high-speed computation, statistical corrections to diagnostic reasoning (Faust, 1984), three-dimensional simulation of processes that are difficult to visualize, etc. Future sociologists and psychologists will need to study how these intelligent programs fit into the discovery process. A good example is Feigenbaum's Dendral expert system, designed to infer candidate molecular structures from spectral data. Although Dendral was successful at its task, it was not adopted by working scientists; instead, its algorithms were transferred to databases that are currently used widely. As Dendral's creators commented, "As AI researchers we seriously underestimated the problems of technology transfer and the nature of the barriers to diffusion. 'Underestimate' is charitable: we really didn't have the foggiest idea" (Feigenbaum & Buchanan, 1993, p. 238). Further research is needed on how human and computer experts work together to make discoveries.

2.3.5 In Vitro and In Vivo Studies of Scientific Thinking

Kevin Dunbar (Dunbar, 1995) used a biological analogy to classify studies of scientific thinking. In vitro studies are experiments on scientific thinking, analogous to biology laboratory experiments. In vivo studies are case-studies of scientists and science students in their working environments, analogous to studies of biological organisms in their natural environments. Dunbar does not provide us a biological analogy for computational simulations, because such simulations can be used by biologists to model what goes on in both in vivo and in vitro studies: one can use laboratory or field data to test or create computational models of biological processes.

There are essentially three types of tasks used by cognitive psychologists in their in vitro studies of scientific thinking:

1) Abstract problems which model aspects of scientific reasoning.

2) Tasks that are designed to simulate actual scientific problems.

3) Actual scientific problems.

Examples of these problems will be found below.

Another distinction used by cognitive psychologists has to do with whether a study uses expert or novice participants, or both. An expert, in this case, would be an actual scientist. The novice category is mostly made up of college students of a variety of backgrounds, some of whom may have taken a few science courses but none of whom are practitioners. Most psychology experiments use undergraduates--often ones taking a psychology course. But note that there may be important differences in expertise within this ambiguous 'novice' category: occasional studies have referred to graduate students in a scientific field as novices when compared to expert scientists, so I lump all students in the novice category.

One special category of 'novice' is worth mentioning. Children have been used as participants in a number of studies of scientific reasoning, on the grounds that the kinds of conceptual changes they go through replicate the sorts of changes that occur in scientific revolutions. We will include a few of these studies in the novice category in the table.

This way of organizing the cognitive psychology of science is summarized in the following table:

Type of task:

Participants | Abstract | Simulated Scientific Problem | Scientific Problem
Novices      |          |                              |
Children     |          |                              |
Experts      |          |                              |

Note that an expert scientist could be used as a participant in an in vitro study, and a novice could be used in an in vivo study--say, a field study of how children or college students learn scientific concepts. We will employ this table iteratively, using it to summarize findings in each of the cells in which there has been significant research.

2.3.5.1 Abstract Tasks

Imagine you are a participant in a psychology experiment. You are told that the three numbers '2,4,6' are an instance of a rule the psychologist has in mind. You are to discover the rule by proposing additional number triples; the psychologist will tell you whether each corresponds to the rule or not.

If you are like most participants, you will begin with numbers like '6,8,10'. When the psychologist says, "That's correct," you will continue the pattern, perhaps proposing '10,12,14'. You might at that point stop and ask if the rule were 'even numbers ascending by twos'.

This particular task, created by Peter Wason (Wason, 1960), has often been used to study scientific reasoning (Gorman, 1992a). At first blush, this seems absurd--what can a three-number problem have in common with the kinds of cases of discovery described in the last chapter? One could, however, view each of the number triples proposed by the participant as a kind of experiment, directed at finding an underlying law. 'Even numbers ascending by twos' is a hypothesis discovered by a participant, based both on previous evidence--the triple '2,4,6'--and on experimental triples proposed by the participant.

Note the resemblance between this task and the kind of data-driven discovery performed by BACON. Both participants in the 2,4,6 task and BACON are trying to find the rule that governs a set of numbers--except that BACON is given data to look at and the participant in the experiment has to generate it. The typical participant is a college student--who may or may not have a scientific or technical background.
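The data-driven side of this comparison can be illustrated with a toy search. The sketch below is not BACON's actual algorithm; it is a minimal illustration, assuming a program that is simply handed paired measurements and looks for a product of small integer powers that stays nearly constant. The planetary figures are approximate textbook values, and the names PLANETS, spread and find_invariant are hypothetical, introduced only for this example.

```python
from itertools import product
from statistics import mean, pstdev

# Approximate orbital data, supplied to the program rather than generated by it:
# (distance from the sun in astronomical units, orbital period in years).
PLANETS = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
}


def spread(values):
    """Coefficient of variation: close to 0 when the values are nearly constant."""
    return pstdev(values) / abs(mean(values))


def find_invariant(data, max_power=3):
    """Try small integer exponents (p, q) and return the pair for which
    distance**p * period**q varies least across the data."""
    candidates = []
    for p, q in product(range(1, max_power + 1), range(-max_power, max_power + 1)):
        if q == 0:
            continue  # a term that ignores the period relates nothing
        terms = [d**p * t**q for d, t in data.values()]
        candidates.append((spread(terms), p, q))
    cv, p, q = min(candidates)
    return p, q, cv


p, q, cv = find_invariant(PLANETS)
print(f"distance^{p} * period^{q} is the most nearly constant term (spread {cv:.4f})")
```

On this data the search settles on the cube of the distance divided by the square of the period, the regularity known as Kepler's third law. The participant in the 2-4-6 task faces the harder version of the problem: no such table is supplied, and every data point has to be proposed before it can be examined.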

Initially, experiments with the 2-4-6 task, as this number triple problem is called, were intended to investigate whether people could use a particular hypothesis-testing strategy favored by the philosopher-of-science Karl Popper (Popper, 1959). Popper emphasized that science progresses best if scientists propose bold, risky hypotheses that can potentially be falsified (Popper, 1963). Popper was not particularly interested in how scientists came up with the hypotheses; he focused more on the way in which the hypotheses were tested.

In other words, Popper supported the classic distinction philosophers make between discovery and justification (Reichenbach, 1938), which simply says that the way in which a hypothesis is discovered should have no effect on how it is evaluated. A scientist could have a dream, as Kekule is supposed to have done, and discover the structure of the benzene molecule. At the twenty-fifth anniversary of the publication of his discovery, Kekule gave the following account of it:

One fine summer evening, I was returning by the last omnibus. I fell into a reverie and lo, the atoms were gambolling before my eyes! Whenever hitherto these diminutive beings had appeared to me, they had always been in motion; but up to that time I had never been able to discern the nature of their motion. Now, however, I saw how, frequently, two smaller atoms united to form a pair; how a larger one embraced two smaller ones; how still larger ones kept hold of three or even four of the smaller; whilst the whole kept swirling in a giddy dance. I saw how the larger ones formed a chain, dragging the smaller ones after them, but only at the ends of the chain...The cry of the conductor: "Clapham Road," awakened me from my dreaming; but I spent part of the night in putting on paper at least sketches of these dream forms. This was the origin of structure theory (Schaffer, 1994, p. 23).

Here the voice of the muse comes to the hero who is prepared to listen, and he carries its words back to the world. This famous account is retrospective, and therefore may not be entirely accurate (Ericsson & Simon, 1984). Kekule also provided other accounts of his discovery, including one involving circling snakes that suggested the way the atoms might be linked. Kekule no doubt saw part of the solution in dreams and reveries, but the working out of the rest was time-consuming, difficult and involved frequent negotiations with others; these stories helped establish Kekule's priority and originality. Indeed, Kekule's address resulted, in part, from a deliberate effort by the organizers of the conference to establish him as a scientific hero (Schaffer, 1994). All of this is not to say that Kekule was lying. Human memory for complex events is reconstructive, and tends to reflect what we think ought to have happened, not what actually happened (Neisser, 1982).

Popper would not have cared about this story and the negotiations in which it was embedded. What mattered was whether Kekule formulated a falsifiable hypothesis. One could argue with Popper that stories of this sort play a role in determining who gets credit for a discovery, but he still would not be interested. His concern was how theories ought to be justified.

Popper's favorite example of a falsifiable hypothesis was Einstein's General Theory of Relativity, which included a specific prediction about the curvature of light in a gravitational field. Eddington set out to test this prediction by measuring to what extent light from selected stars was attracted by the sun's gravitational field during an eclipse. When Einstein was asked what he would do if Eddington's results did not agree with his predictions, he said, "Then I would feel sorry for the dear Lord (Eddington). The theory is right" (Holton, 1973, pp. 234-5). Similarly, when initial experimental results appeared to contradict Special Relativity, Einstein was not alarmed--he pointed out that the rivals to Special Relativity were ad-hoc theories and called for more replication. So, Einstein himself was not a Popperian. Eddington characterized his eclipse results as providing support for Einstein's General Relativity, but there were contradictions and ambiguities in the data (Collins & Pinch, 1993).

How could one determine if falsification would lead to more scientific progress? One could adopt an in vivo approach, looking at instances where falsification was deliberately applied to scientific problems. The problem with this kind of study is that any number of other factors could have affected progress, or the lack thereof, in an actual case.

An alternative is to look at whether falsification was an effective strategy on tasks that simulate scientific reasoning. If it were not an effective strategy in vitro, under ideal conditions, that would cast doubt on the usefulness of the strategy in general (Gorman, 1992b). Why? Consider this simple 2-4-6 task. It is exactly the sort of problem on which falsification should be effective--it eliminates all the confounding factors like error in the data and pressures to publish that may interfere with falsification in scientific practice (see Gorman, 1992). Abstract tasks create ideal, simple situations for exploring the heuristic value of the sorts of norms recommended by philosophers. Of course, just because a norm like falsification works in an ideal, abstract situation, there is no guarantee it will work in science. Conversely, if a norm fails to work even under the most ideal conditions, there are good reasons for doubting its effectiveness in a real-world situation.

Wason (1960) initially found that participants did not falsify hypotheses like 'even numbers ascending by twos'--they proposed instances that agreed with that hypothesis, and no triples that should have been wrong if the hypothesis were right. A participant who proposed '1,2,3' would have been told it was correct. If '1,2,3' is an instance of the rule, the hypothesis 'even numbers ascending by twos' is false. (In fact, Wason's rule was 'ascending numbers'.) Wason viewed this as evidence of a 'verification bias' on the part of his participants.
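The logic of Wason's finding can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names and the range of triples are hypothetical and are not taken from Wason's procedure. It shows that a participant who proposes only triples fitting 'even numbers ascending by twos' will always be told they are correct, and so never meets the one kind of result that would expose the hypothesis as too narrow.

```python
def wason_rule(triple):
    """Wason's actual rule: the three numbers ascend."""
    a, b, c = triple
    return a < b < c


def even_by_twos(triple):
    """The hypothesis suggested by 2,4,6: even numbers ascending by twos."""
    a, b, c = triple
    return a % 2 == 0 and b - a == 2 and c - b == 2


# A participant who proposes only triples that fit the hypothesis
# (the range 0..98 is an arbitrary choice for illustration).
confirming_triples = [(n, n + 2, n + 4) for n in range(0, 100, 2)]
all_called_correct = all(wason_rule(t) for t in confirming_triples)
print("Every confirming triple is called correct:", all_called_correct)

# A triple that should be wrong if the hypothesis were right.
surprise = (1, 2, 3)
print(surprise,
      "fits hypothesis:", even_by_twos(surprise),
      "| fits rule:", wason_rule(surprise))
```

Only the last triple, which the hypothesis says should be wrong, produces feedback that conflicts with the hypothesis.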

One advantage of in vitro studies is control. In Wason's case, he was able to create a task that controlled for participants' previous experiences--none had ever worked on this problem before--and set up the problem in a way that required participants to falsify the obvious, initial pattern if they were to discover the actual rule. This set-up factor illustrates the other powerful advantage of in vitro studies: they allow one to manipulate the conditions under which participants try to solve problems.

A group of psychologists at Bowling Green State University took this idea of manipulation a step further. They gave some participants instructions to falsify on the 2,4,6 task, and others instructions to try to verify or confirm their hypotheses (Tweney, 1980). Participants were asked to indicate which of their triples were intended as confirmations or disconfirmations; sure enough, participants given the disconfirmatory instructions did make more attempts to falsify. But their efforts to falsify did not make their performance better than those participants who tried to confirm. Worse, this lack of effect for falsification was a replication of an earlier study at Bowling Green, in which participants shot particles at objects on a computer screen in order to determine what rules govern particle deflection (Mynatt, Doherty, & Tweney, 1977; Mynatt, Doherty, & Tweney, 1978).

This seemed like a surprising result to me. If falsification played an important role in scientific progress, it seemed to me that it ought to improve performance on an in vitro simulation of scientific reasoning. I decided to follow up on this puzzling finding and tried my own version of such instructions on the 2-4-6 task and a related problem. I unwittingly made an important change in the original design. Whenever a participant made a guess about the rule, experimenters in previous studies had told them whether that guess was right or wrong. That amounted to a scientist's being able to ask God whether her rule was right. No need to falsify if you can find out in some other way.

So, participants in my experiment had to test their own hypotheses, and whenever they asked if they could guess the rule, I told them it was up to them to decide whether and when they knew they had solved the problem. Under these circumstances, instructions to falsify greatly improved subjects' ability to solve the 'ascending numbers' rule.

I appeared to be onto a minor discovery myself--that falsification was effective on at least two problems that simulated scientific reasoning (Gorman, 1992a). My disconfirmatory instructions emphasized trying triples that ought to be wrong if one's hypothesis were right. What I was doing was trying to teach participants a heuristic I thought would lead to falsification. A heuristic is a kind of 'rule of thumb' of the sort that experts use. Heuristics, unlike algorithms, do not guarantee results. Experts use them in situations where there is no algorithm.

An analysis by Klayman and Ha (1987) showed why my heuristic should have been successful, and why it wasn't the same as falsification. These two authors referred to strategies like my 'try to get triples wrong' as 'negative test heuristics'. When the participant's hypothesis is contained within the target rule, this sort of heuristic is most likely to lead to success.


Figure 5: H is the participant's hypothesis, T is the target rule. A negative test heuristic will focus the participant on the zone within T but outside of H.

Given that the rule was 'ascending numbers' and the initial triple suggested a hypothesis that was a sub-set of the actual rule, my 'try to get triples wrong' heuristic would be helpful in finding the actual rule--it would point participants to the outer 'T' ring in the above diagram.

However, if the problem space described by the hypothesis were broader than the target rule, a negative test heuristic would not be effective. Suppose one's hypothesis was 'numbers ascend by twos' and the actual rule was 'even numbers ascend by twos'. If one tried negative tests like '1,2,3' and '7,6,13', they would all be wrong, thereby confirming one's hypothesis.


Figure 6: In this case, participants need to propose positive instances of H like '3,5,7' in order to find T.

The only way to disconfirm it would be to try a positive test like '3,5,7', which would put one in the part of H that is outside T. My confirmatory instructions urged participants to propose triples they thought would be correct. But in some situations, these instructions would be more likely to lead to falsification, because when the rule is narrower than the hypothesis, some of the triples one thinks should be correct will be incorrect. My confirmatory instructions were really encouraging what Klayman and Ha called a positive test heuristic, which they regard as a good, all-purpose strategy for achieving either confirmation or disconfirmation.
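Klayman and Ha's distinction can be made concrete with a short Python sketch. The function names are my own illustrative assumptions. A test is 'positive' if the hypothesis says the triple should be correct and 'negative' if it says the triple should be wrong; the hypothesis is falsified whenever the feedback from the target rule disagrees with that expectation.

```python
def ascending(t):
    """Target rule in the Figure 5 case: ascending numbers."""
    return t[0] < t[1] < t[2]


def by_twos(t):
    """Numbers ascend by twos (odd or even)."""
    return t[1] - t[0] == 2 and t[2] - t[1] == 2


def even_by_twos(t):
    """Even numbers ascend by twos."""
    return by_twos(t) and t[0] % 2 == 0


def run_test(hypothesis, target, triple):
    """Return (kind of test, whether the feedback falsifies the hypothesis)."""
    expected = hypothesis(triple)
    kind = "positive" if expected else "negative"
    falsified = expected != target(triple)
    return kind, falsified


# Figure 5 case: hypothesis H (even_by_twos) lies inside target T (ascending).
print(run_test(even_by_twos, ascending, (8, 10, 12)))
print(run_test(even_by_twos, ascending, (1, 2, 3)))

# Figure 6 case: target T (even_by_twos) lies inside hypothesis H (by_twos).
print(run_test(by_twos, even_by_twos, (1, 2, 3)))
print(run_test(by_twos, even_by_twos, (7, 6, 13)))
print(run_test(by_twos, even_by_twos, (3, 5, 7)))
```

In the first case only the negative test falsifies; in the second case only the positive test '3,5,7' does. Which kind of test can falsify depends on how H sits relative to T.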

In summary, my experiments did not show that falsification works as a general strategy across a wide range of problems. I had only found evidence to support the idea that a negative test heuristic will falsify a hypothesis that is narrower than a target rule.

I falsified even this analysis in my next experiment. On an even more general rule, 'the three numbers must be different', I found that negative test instructions did not improve performance--participants were clearly trying to obtain negative evidence, but they did not know where to find it. Following Tweney et al. (1980), I changed the task from a search for a single rule which would determine which triples were right and wrong to a search for two rules arbitrarily labeled DAX and MED: the DAX rule was "the three numbers must be different" and the MED rule was "two or more numbers must be the same". As in Tweney et al.'s earlier study, this manipulation greatly improved performance, whereas simply giving subjects instructions to falsify did not.

I concluded that falsification depended at least in part on what Johnson-Laird (1983) has called a 'mental model' of the task. Subjects whose mental model was that they were trying to find a single rule with exceptions found little or no negative evidence. For example, participants who proposed the triple '0,0,0' and were told it was incorrect guessed rules like 'any number except zeroes'. In contrast, the DAX-MED instructions suggested a mental model involving a search for two complementary rules. Subjects who proposed the triple '0,0,0' in this situation realized this MED result was a clue to another rule, and pursued it by proposing other combinations in which two or more numbers were the same (for a recent series of experiments that supports this analysis, see Wharton, Cheng, & Wickens, 1993). These results suggested that the critical relationship in Klayman and Ha's analysis was between the subject's hypothesis and her representation of the target rule.

Farris and Revlin (1989; 1989a) argued that many subjects who appear to be trying to falsify are actually searching for positive instances of a counterfactual hypothesis. For example, a subject who thinks the rule is 'even numbers' may propose 'odd numbers' as a counterfactual hypothesis, then test that with a triple like '3,5,7' which is a negative test with respect to 'even numbers' but confirmatory with respect to the counterfactual hypothesis 'odd numbers'. A counterfactual heuristic may be a successful way of converting the standard version of the 2-4-6 task to a DAX-MED problem, because a counterfactual hypothesis is roughly equivalent to a hypothesis about the MED rule, and successful DAX-MED subjects pursue positive instances of the MED rule.

This kind of fine-grained analysis of hypothesis-testing highlights the strengths and weaknesses of in vitro studies. The in vitro work allows us to look at heuristics under highly controlled, artificial conditions, manipulating variables like the relationship between a participant's most likely representation of the task and the actual rule. This kind of manipulation and control is impossible in vivo. However, in vivo studies are needed to see if the in vitro results are ecologically valid, i.e., applicable to real-world situations.

2.3.1.2 Experts working on abstract tasks

There are almost no studies involving scientists trying to solve these abstract problems. Mahoney (1977) compared a small sample of scientists working on the traditional version of the 2-4-6 task to a sample of Protestant ministers and found that the former were less willing to abandon their hypotheses than the latter. Mahoney initially saw this as evidence of a confirmation bias on the part of scientists, but one could also argue that they were following a positive-test heuristic. If one is to draw any conclusions about the abilities of working scientists to solve abstract problems, more research with different rules and procedures is needed. For example, scientists should be run in a condition where they know they cannot ask the experimenter at any time whether their hypotheses are right.

2.3.1.3 Adding the Possibility of Error to Abstract Tasks

One can gradually add realistic features to in vitro studies. One of the features that makes the 2,4,6 and related tasks so unrealistic is that every trial or mini-experiment produces results that are 100% reliable. In contrast, scientists are acutely aware of the possibility of error when they design and evaluate experiments. For example, Einstein's theory of special relativity was apparently falsified by the eminent physicist Kaufmann; Einstein himself remained undisturbed, however, and called for replication. Kaufmann's result was later found to be an error (see Gorman, 1992).

In May of 1795, Joseph J. F. Lalande recorded a new star in two different positions over a three-day period, and decided that at least one, if not both, of the observations were due to errors (Hoyt, 1980). This star was identified as the planet Neptune in 1846, and Lalande's original observations were used in computing its orbit. One scientist's error is another scientist's discovery.

To understand how error can be added to one of the problems that simulate scientific reasoning, let us once again use the 2-4-6 task as an example. In the usual version, every result is 100% reliable and unambiguous. I added the possibility of error by telling participants that anywhere from 0 to 20% of their results might be erroneous, i.e., a triple that was classified as incorrect might be correct and vice-versa. Error would occur at random, as determined by a random number generator on a calculator.
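The error manipulation can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: Python's `random` module stands in for the calculator's random number generator, the names `noisy_feedback` and `error_rate` are hypothetical, and the seed and number of trials are arbitrary.

```python
import random


def wason_rule(triple):
    """Wason's rule: the three numbers ascend."""
    a, b, c = triple
    return a < b < c


def noisy_feedback(triple, error_rate, rng):
    """Return the rule's verdict, reversed with probability error_rate."""
    truth = wason_rule(triple)
    if rng.random() < error_rate:
        return not truth  # an erroneous result: Y reported as N or vice versa
    return truth


rng = random.Random(0)  # arbitrary seed, so the run can be repeated
trials = [(2, 4, 6)] * 1000
wrong = sum(noisy_feedback(t, 0.20, rng) != wason_rule(t) for t in trials)
print(f"{wrong} of {len(trials)} results were erroneous")
```

Setting `error_rate` to 0 reproduces the usual, fully reliable version of the task; it also reproduces a condition in which participants are warned about error but never actually encounter any.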

I thought that this possibility of error might make it easier for individuals to engage in confirmation bias. A recent example is the cult Heaven's Gate, which was certain that an alien spaceship accompanied comet Hale-Bopp.

In January of 1997 several cultists, including their leader Applewhite, bought a computerized telescope with a 10-inch mirror. They used it to look at Comet Hale-Bopp, and search for the "companion object." They were following a scientific impulse--seeking direct observation of the vehicle that would rescue them from our doomed planet.

They saw the comet perfectly. They saw no spaceship.

And then they returned the telescope to the store and asked for their money back (Achenbach, 1997, F4).

This is a classic use of what Doherty & Tweney (Doherty, 1988) called System-Failure (SF) Error to immunize a hypothesis against falsification. If you don't like the evidence, blame the instrument. There was a spaceship--there had to be. The telescope wasn't working.

The saddest part of the story is that the group killed themselves in the belief that they had to leave their 'vehicles' (bodies) before they could be taken away by the spaceship accompanying the comet. By all accounts, they died with smiles on their faces, certain of the resurrection.

One scientific analogue of this kind of error is an experimental result which appears to confirm a hypothesis but actually disconfirms it and vice-versa. The controversy between Millikan and Ehrenhaft over the charge on the electron serves to illustrate. R.A. Millikan presupposed a unitary charge; in his famous oil-drop experiment, he discarded results that appeared to suggest a fractional charge. But these results, if true, would have supported the theory of his competitor, Felix Ehrenhaft. "If Ehrenhaft had had access to Millikan's notebook, he would have found precisely those runs most valuable for his purposes, which, for Millikan, were failed" (Holton, 1986, p. 12). Having a mental model of the kind of rule one is looking for helps one identify and discard errors.

But suppose one has a Heaven's Gate mental model, totally out of whack with reality? One way to check whether an apparent disconfirmation is an error is to tighten procedures. Another is replication. In the Heaven's Gate case, the group might have tried a larger telescope, and observed over a long period of time. In Millikan's case, he and his technician refined their technique until they could produce the desired effect more reliably; his notebooks record 'beautiful' results more frequently later in his series of experiments, though there are still errors (Holton, 1978, p. 71).

In Millikan's case, replication led to confirmation. It can also lead to falsification. Walter Alvarez recounted the day when he and his father thought they had discovered evidence that a supernova caused the extinction of the dinosaurs. The key empirical support for this hypothesis came from the presence of plutonium-244 in the KT boundary which marks the end of the Cretaceous and of the dinosaurs--a period of mass extinction. After an exhausting night of taking samples, two geochemists concluded that there was plutonium-244 in a sample of soil from the KT boundary--an apparent confirmation of the hypothesis. Luis Alvarez was ready to announce the discovery, but Walter tried the result and procedures on the Deputy Director of the Lawrence Berkeley Laboratory, who advised them to "Do it all over again. Repeat every single step from the very beginning, on a fresh sample, to be absolutely sure there really is plutonium-244 in that clay" (Alvarez, 1997, p. 74). They ran the whole set of procedures on a second sample and found no trace of plutonium-244. The heuristic in this domain, where the procedures are so difficult, is to trust the negative result. Replication had turned into falsification.

I wanted to simulate the effect of error on scientific reasoning in vitro, in order to find out how specific variables affected it. In my first series of experiments, I focused on the mere possibility of error by setting the actual error rate at 0. In other words, participants were told that as many as 20% of their results might be errors, but encountered no actual errors. Participants had to figure this out. Most used a heuristic I called 'replication plus extension', proposing triples that were similar to, but not exactly the same as, previous triples in an effort to replicate the current pattern and extend it slightly, e.g., following '2,4,6' by '4,6,8'. This looked much like the positive test heuristic recommended by Klayman and Ha; the difference is the goal--in addition to trying to confirm a hypothesis with positive tests, participants were trying to check for errors. Participants given possible-error instructions had to propose twice as many triples, but managed to discover Wason's rule as often as participants in a control condition.

But in an earlier study using the card game Eleusis, I had discovered that the possibility of error greatly interfered with subjects' abilities to solve a simple rule. One difference between the two tasks is that the cost of replication in Eleusis was much higher. One had to replicate not only a single card, but a sequence of cards. I experimented with giving subjects on the 2-4-6 task a similar rule, presenting it in a format that gave them results of previous trials. To get a feel for the task, try to do what participants did, and write down any guesses that you might have about a rule that could govern all five triples. Would your rule be any different if I told you it was possible one of these five results was an error, i.e., if it is a Y, it should be an N and vice-versa?

Triple      Conforms to Rule
1,2,3       Y
4,5,6       Y
4,5,6       N
5,10,15     Y
10,20,30    N

The key problem is what to do with the fact that 4,5,6 is right once, then wrong. Depending on one's hypothesis, one can label either of them an error. Then I gave participants five more triples. Consider whether these change your first hypothesis.

Triple      Conforms to Rule
10,33,12    Y
13,20,5     Y
14,9,14     Y
12,35,14    N
15,15,6     N

Let's consider an example. Suppose you hypothesized that the rule was that odds and evens alternate within each triple. If you covered the '4,5,6 N' in the first set of triples and the '12,35,14 N' in the second, the triples would fit this hypothesis. This is akin to looking carefully at results of previous experiments in a scientific domain, and using the current hypothesis or paradigm to decide which were likely to be errors.

I allowed participants to propose as many as five additional triples of their own, which meant they had the opportunity to replicate. In most actual scientific situations, one does not have unlimited resources to devote to replication; therefore, I thought this five-triple limit was more realistic than unlimited triples.

In fact, there was no actual error. The rule was that numbers had to alternate odd and even across as well as within each triple. This made this task more like the earlier one I had used with cards: to replicate, participants had to repeat not just one triple, but a sequence of triples, and at the same time test their hypotheses. In a possible error condition, participants solved the rule only 15% of the time. In contrast, 50% of participants who were not told about any possibility of error solved the rule.
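The two candidate rules can be checked against the ten results shown earlier with a short Python sketch. It rests on assumptions of my own: 'alternating across triples' is read as 'the first number of a triple differs in parity from the last number of the immediately preceding triple', each set of five is treated as its own sequence, and all function names are hypothetical.

```python
# The two sets of results given to participants (True = Y, False = N).
SET_1 = [((1, 2, 3), True), ((4, 5, 6), True), ((4, 5, 6), False),
         ((5, 10, 15), True), ((10, 20, 30), False)]
SET_2 = [((10, 33, 12), True), ((13, 20, 5), True), ((14, 9, 14), True),
         ((12, 35, 14), False), ((15, 15, 6), False)]


def alternates_within(triple):
    """Odd and even numbers alternate inside the triple."""
    a, b, c = triple
    return a % 2 != b % 2 and b % 2 != c % 2


def alternates_across(previous, triple):
    """Assumed reading: first number differs in parity from the
    last number of the preceding triple (no constraint on the first triple)."""
    if previous is None:
        return True
    return previous[2] % 2 != triple[0] % 2


def mismatches(results, use_across):
    """List the results that a candidate rule would have to call errors."""
    bad = []
    previous = None
    for triple, observed in results:
        predicted = alternates_within(triple)
        if use_across:
            predicted = predicted and alternates_across(previous, triple)
        if predicted != observed:
            bad.append((triple, observed))
        previous = triple
    return bad


for name, results in [("first set", SET_1), ("second set", SET_2)]:
    print(name, "| within only:", mismatches(results, use_across=False))
    print(name, "| within and across:", mismatches(results, use_across=True))
```

On this reading, the within-triple hypothesis has to write off one result in each set as an error, whereas the full rule accounts for all ten results without invoking error at all.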

I tried a couple of in vitro simulations using actual error and the 2,4,6 task. Changing the amount of error from zero to 20% greatly interfered with participants' ability to discover Wason's original rule (Gorman, 1989(c)). Some of these 20%-error participants made repeated attempts to replicate and located many of the errors, but because of this, they were not able to adequately test the generality of their hypotheses and ended up with rules like 'numbers must go up by twos'. Others simply used errors to immunize their hypotheses from disconfirmation. As one participant said, "I assigned errors to the triples I did because they did not fit my hypothesis" (Gorman, 1989(c), p. 409).

These findings illustrate that, even on very simple artificial tasks, replication alone is not sufficient to isolate and eliminate errors. Collins (1985) has discussed how difficult it is to replicate a result. Obviously, scientists rely on other kinds of checks in addition to replication, e.g., refinement of procedures. But these simple experiments demonstrate the way in which hypotheses are often used to identify errors, and the importance of replication. In contrast, my experience suggested that psychology journals were often unwilling to publish replications (Gorman, 1992).

2.3.1.4 Lessons Learned from Abstract Tasks

Hopefully, the description above of my own research using abstract tasks and novice participants will give the reader a sense of the pros and cons of in vitro experiments. The strength of these sorts of experiments is that one can set up a task in a particular way to assess how it will affect performance. For example, one can isolate the effect of the mere possibility of error and study it under carefully controlled conditions. The weakness is that one cannot be certain how these results will generalize to more complex situations involving multiple types of error. However, in vitro results can give us issues to focus on in vivo.

Even on these highly abstract tasks, discovery depends both on problem representation and on the strategy one uses to tackle it. I referred to the representations as mental models because of the breadth of this term--it can be used to describe how we solve syllogisms (Johnson-Laird, 1983), how we imagine the workings of a calculator, computer or VCR (Norman, 1993) and, as we saw in the first chapter, what form we think a rule or law might take. Consider Kepler--his initial mental model of a rule for orbits involved perfect circles; he abandoned this rule only when he was forced by negative evidence. Unlike my 2-4-6 subjects, he did not have to generate this negative evidence himself; instead, it was given to him by Brahe.

I refer to the strategies as heuristics because a heuristic is a kind of 'rule of thumb' that works in some situations and not in others. If your goal is to test a hypothesis, you can and should employ a number of strategies, depending on how you represent the problem: you might try a positive or a negative heuristic, or a counterfactual heuristic, or some combination.

Mental models are also used to discriminate erroneous data from valid results. There need to be other checks as well, like replication-plus-extension. But scientists need mental models to target probable sources of error. Millikan's mental model suggested that all results which did not indicate a unitary charge for the electron should be carefully scrutinized and replicated.

2.3.2 Tasks That Have the Look and Feel of Scientific Problems

The distinction between abstract tasks and tasks that simulate scientific problems is somewhat fuzzy. Basically, the former refer to tasks that have no content which resembles the sorts of problems encountered in science, whereas the latter contain some content. My modifications to the 2,4,6 task to accommodate error fall into a fuzzy area; the task itself is highly abstract, but by the time one adds a review of literature, limits on replication and the possibility of error, one has a task that bears a closer resemblance to at least some scientific problems. In the next section, I will describe several tasks that have more of the 'look and feel' of actual scientific problems.

2.3.2.1 Novices

A group at Bowling Green State University (Mynatt, Doherty, & Tweney, 1977; Mynatt, Doherty, & Tweney, 1978) developed an artificial universe that required participants to discover the rules governing the motion of particles in a universe of shapes. In the most difficult version of this task, participants spent about ten hours firing particles at different arrangements of shapes. None of them discovered the rule. The participant who came closest concentrated on developing a hypothesis and trying to confirm it. In contrast, participants who focused on disconfirmation rejected promising ideas too quickly. Mynatt, Doherty and Tweney concluded that confirmation was an effective heuristic early in the inference process; once a subject or scientist had discovered and verified a pattern, then she could switch to the search for disconfirmatory evidence. This heuristic combination of confirmation and disconfirmation also worked on abstract problems like the 2-4-6 task, especially when the possibility of error was added. But the heuristic value of 'confirm early, disconfirm late' became most apparent on a task that simulated the complexity of actual science.

Kevin Dunbar (1989) created a computerized molecular genetics laboratory in which subjects were posed a problem similar to the one for which Monod and Jacob won the Nobel Prize in 1961. Dunbar did not intend to have subjects simulate the actual discovery path followed by Monod & Jacob; instead, he wanted "to use a task that involves some real scientific concepts and experimentation to address the cognitive components of the scientific discovery process" (Dunbar, 1989, p. 427).

Participants were given elementary training in concepts of molecular genetics, using an interactive environment on a Macintosh computer. Then they were allowed to perform experiments with three controller and three enzyme-producing genes; they could vary the amount of nutrient, remove genes, and measure the enzyme output. The mechanism the subjects had to discover was inhibition, whereas the mechanism they had learned in training was activation.

Dunbar used this task to make the argument that, "rather than inventing an arbitrary task that embodies certain aspects of science it is possible to give subjects a real scientific task to work with" (Dunbar, 1989, p. 427). Hence, we use this problem as an example of a task that simulates an actual scientific problem.

But even so, the similarities between Dunbar's molecular genetics problem and the 2-4-6 task outweigh their differences. Participants on both are given instructions which explain their little universe; these instructions, like the starting triple 2-4-6, bias them towards a hypothesis that is different from the one they are trying to find, and they are able to do a wide variety of mini-experiments to discover the rule--which, although it represents an actual scientific relationship, is as arbitrary to them as the numerical formulas discovered by participants in the 2-4-6 task. There are none of the potential sources of error that occur in actual genetics experiments and no new techniques to be mastered.

Dunbar relates his findings to the literature on disconfirmation. In this task, all subjects eventually disconfirmed their initial hypotheses about the role of the activator gene--no matter what genes were present or absent, there was always an output. What is interesting is what they did next: 6 groups re-interpreted activation to mean a search for the gene that facilitated enzyme production, 7 searched diligently for an activator gene and eventually gave up, and 7 set the goal of explaining their surprising results. Five out of the 7 groups in this category actually found the inhibitor gene. Dunbar's results support the thesis that successful disconfirmation depends on how subjects or scientists represent the task.

Mynatt et al.'s artificial universe and Dunbar's molecular genetics simulation are not the only tasks that simulate scientific reasoning, but they are two of the best and most-cited, and give the flavor of the results one obtains. One other oft-used and cited task is the Big Trak problem, developed by Jeff Shrager (Shrager & Klahr, 1986). Because it involves learning how to run a device, it is not deliberately modeled after a scientific problem, but it is a discovery task.

In the typical version of this task, participants are asked to figure out the function of the RPT key on the back of a programmable vehicle. Let us consider a shortened account of the behavior of one participant, by way of example:

ML began with the hypothesis that RPT N would repeat the entire program N times. So he programmed it to go forward two spaces, then repeat that twice. The result was Big Trak went forward 4 spaces, instead of the predicted 6.

ML had now disconfirmed his initial hypothesis, so he revised it--RPT N repeated only the last step N times. So he programmed Big Trak to go forward 2, left 30, then RPT 1. Big Trak went forward 2 and left 60, confirming ML's hypothesis. Then he ran the same forward 2, left 30 sequence with RPT 2; instead of going left 60 as he expected, Big Trak repeated the forward 2, left 30 sequence twice. Note that ML has conducted a positive test and has gotten a disconfirmatory result. He replicated the whole sequence to make certain. Then he revised his hypothesis: RPT N meant repeat the N steps before the RPT instruction. He then tested it with varying lengths of N, making sure he understood how RPT selected the steps.

Like ML, most participants began with the idea that an instruction like RPT 4 meant 'repeat whatever program had been typed in four times' or 'repeat the last step in the program four times'. Typically, they began with positive tests and quickly obtained disconfirmatory information, though most were not as efficient as ML. In order to discover the rule, subjects had to change their representation of the role of the repeat key: it selected the steps to be repeated, so that 'RPT 4' meant 'repeat the last four steps'. Subjects had to realize that the RPT key might serve as a selector, indicating which lines were to be repeated, instead of a counter, indicating the number of times something was to be repeated. The shift from a counter to a selector mental model directed subjects to a different part of the problem space to search for confirmations and disconfirmations. Similarly, the DAX-MED manipulation transformed participants' mental models of the 2-4-6 task from a search for one rule with exceptions to a search for two mutually-exclusive rules.
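The difference between the counter and selector models can be sketched in Python. This is an illustration under my own assumptions, not a model of the actual Big Trak device: a program is represented as a list of (command, amount) steps, the function names are hypothetical, and N is assumed to be at least 1. Each function returns the full sequence of steps the vehicle would execute under one reading of RPT N, following the hypotheses in ML's account.

```python
def expand_counter_program(steps, n):
    """Counter model 1: RPT N runs the whole program N more times."""
    return steps + steps * n


def expand_counter_last_step(steps, n):
    """Counter model 2: RPT N runs only the last step N more times."""
    return steps + steps[-1:] * n


def expand_selector(steps, n):
    """Selector model: RPT N runs the last N steps once more (assumes n >= 1)."""
    return steps + steps[-n:]


def total_forward(executed):
    """Total distance moved forward over the executed steps."""
    return sum(amount for command, amount in executed if command == "FWD")


first_program = [("FWD", 2)]
print("Whole-program counter predicts forward",
      total_forward(expand_counter_program(first_program, 2)))
print("Selector model gives forward",
      total_forward(expand_selector(first_program, 2)))

second_program = [("FWD", 2), ("LEFT", 30)]
print("RPT 1, last-step counter:", expand_counter_last_step(second_program, 1))
print("RPT 1, selector:         ", expand_selector(second_program, 1))
print("RPT 2, selector:         ", expand_selector(second_program, 2))
```

With RPT 1 the last-step counter and the selector make the same prediction, which is why ML's second experiment appeared to confirm his revised hypothesis; only RPT 2 separates the two models.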

Klahr and Dunbar (Klahr, 1988) discussed the way in which participants switched between searching two problem spaces, one of which was a space of possible hypotheses and the other of which was a space of possible experiments. ML first considered a set of hypotheses that depended on the idea that RPT was a counter; he generated a space of possible experiments based on that mental model. When results violated expectations, at one point he switched to searching for a new kind of hypothesis, in which RPT selected the steps to be repeated. Disconfirmation can lead to a change in the type of hypothesis one is pursuing, which in turn directs one to search different parts of the experiment space.

Klahr and Dunbar concluded that their participants showed two different cognitive styles: Theorists and Experimenters. The former, when presented with disconfirmatory results, searched the hypothesis space for alternatives that would fit the evidence and also make interesting new predictions. ML did this when he thought about why Big Trak repeated the forward 2 left 30 sequence twice in response to RPT 2. The latter responded to disconfirmatory evidence by exploring the experiment space--at some point, most of them ran experiments which made the selector role of RPT salient. Theorists conducted about half as many experiments as Experimenters, and almost all of the former's experiments were guided by a hypothesis, whereas the latter's were often simply exploratory. In a second study, Klahr and Dunbar found that participants with prior programming experience could discover the function of the RPT key by searching the hypothesis space, then conducting tests in the experiment space.

In a more recent study using a version of their RPT task, Klahr, Fay and Dunbar (1993) established that third and to a lesser extent sixth graders had trouble with evidence that disconfirmed counter hypotheses, in part because they could not switch to a selector hypothesis: "inconsistencies were interpreted not as disconfirmations, but rather as either errors or temporary failures to demonstrate the desired effect" (p. 140). Klahr, Fay and Dunbar interpreted this as a failure to coordinate searches in hypothesis and experiment spaces, a view we will explore in greater depth when we consider the performance of children on actual scientific problems.

2.3.2.2 Conclusions from Tasks That Simulate Scientific Problems

Despite Dunbar's arguments about the importance of modeling tasks after real scientific problems, the conclusions from tasks that have the look and feel of scientific problems look little different from those derived from abstract tasks. What one learns is more about the relationship between mental models, hypotheses and experiments in a variety of domains that resemble aspects of science.

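Type of Task (columns): Abstract; Simulated Scientific Problem; Scientific Problem

Participants (rows): Novices; Children; Experts

Novices

  Abstract: Effectiveness of heuristics like (1) positive test, (2) counterfactual and (3) replication-plus-extension depends on relationship of mental model to target rule.

  Simulated Scientific Problem: Demonstrate importance of additional heuristics: confirm early, disconfirm late; coordinate search in two spaces.

  Scientific Problem: (no entry)

Children

  Abstract: (no entry)

  Simulated Scientific Problem: Are unable to coordinate search in two spaces.

  Scientific Problem: (no entry)

Experts

  Abstract: Prefer a positive test heuristic.

  Simulated Scientific Problem: (no entry)

  Scientific Problem: (no entry)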

2.3.3 Actual Scientific Problems

Another way to study scientific thinking is to use actual scientific problems. On these problems, it is harder to manipulate features of the task like whether it requires background knowledge or can be done by anyone walking in 'cold', whether the rule is narrower or broader than the participant's most likely initial hypothesis, or indeed whether there is any rule at all, and how the problem space is structured. Such problems do allow us to study differences in the way experts represent tasks in their domain, and what heuristics and algorithms they use.

2.3.3.1 Novices

Researchers like McCloskey (1983), Clement (1982) and Carey (Carey, 1992; Wiser, 1983) have established parallels between the mental models of modern novices and historical figures in the evolution of science. For example, McCloskey (McCloskey, 1983) found that college students held beliefs about physics that resembled those of Philoponus (6th century) and Buridan (14th century), who thought that a force was required to set a body in motion, and that the force gradually dissipated. Clement (Clement, 1983) found that freshman engineering students were a little more advanced: protocols of their attempts to solve motion problems resembled Galileo's reasoning in De Motu.

Brewer and Chinn (1991) studied how such beliefs change. They gave adult novices brief readings on quantum theory or special relativity and asked them a series of follow-up questions. Both quantum theory and relativity make predictions that conflict with common-sense beliefs about space and time and cause and effect. Some subjects simply rejected the new information, resembling those scientists who cling to the old paradigm. Other subjects showed at least partial assimilation of the new material: they were able to give an answer that corresponded to what they had read, but they "sure didn't believe it" (p. 70). Another move was to interpret the answer in terms of existing beliefs, for example, by treating relativistic phenomena as optical illusions.

2.3.3.2 Children as Novices

Jean Piaget argued that the development of scientific thought in the child recapitulated the evolution of science (Bringuier, 1980). Studies that show how the scientific beliefs of children and novices change owe much to Piaget's inspiration. This line of work is also influenced by Thomas Kuhn's (Kuhn, 1962) view that long periods of normal science are followed by crises caused by anomalies in the reigning paradigm. A paradigm corresponds to something like a collective mental model--a good example is the circular orbit model that was almost universally accepted before Kepler. Brahe's anomalous data sparked a crisis, which Kepler resolved by proposing his new model of the solar system.

From a Kuhnian standpoint, the mental models held by practitioners before and after a paradigm shift are incommensurable--those holding the older view cannot even understand the new one. Kuhn's views are by no means accepted by all or even most historians and philosophers, but they are extremely influential. If Piaget and Kuhn are right, children and novices should go through revolutionary shifts in mental models as they learn scientific concepts. For example, Chi (1992) used a Kuhnian framework to review the literature on conceptual changes in children and adults. She argues that radical conceptual change often occurs before anomaly recognition, whereas most of the hypothesis-testing literature tends to take anomaly recognition for granted--except under error conditions, it is clear when a triple is at variance with a hypothesis. Her own analysis suggests that recognition and resolution of anomalies requires a shift to a new system of categories similar to the kind of paradigm shift made famous by Kuhn.

Similarly, Carey (1992) compared the problems children ages 3 to 5 have differentiating weight and density with the problem scientists before Black had differentiating heat and temperature: in both cases, the view before differentiation seems to belong to a different, incommensurable paradigm from the view afterward. Carey is therefore sympathetic to Kuhn's views, but less to those of Piaget, who proposed major changes in the cognitive abilities of children as they passed from one stage of development to another. Carey finds changes in conceptual content in specific domains as the child grows older, not general changes in cognitive ability.

Brewer and Samarapungavan (1991) concluded "that the child can be thought of as a novice scientist, who adopts a rational approach to dealing with the physical world, but lacks the knowledge of the physical world and experimental methodology accumulated by the institution of science" (p. 210). Like Carey, they argue that the apparent differences in thinking between children and adults are due to differences in knowledge, not the ability to employ reasoning strategies. For example, they studied second-graders and showed that those who had a flat-earth mental model could incorporate disconfirmatory information consistent with a Copernican view by transforming their model into a hollow sphere. They used this new mental model to solve a range of problems about the day/night cycle and motion of individuals and objects across the earth's surface (see Vosniadou and Brewer, In Press).

In contrast, Deanna Kuhn (Kuhn, 1989) argued against the 'child as novice scientist' view. "Both child and scientists gain understanding of the world through construction and revision of mental models. Recent research....suggests that the process in terms of which mental models, or theories, are coordinated with new evidence is significantly different in the child, the lay adult, and the scientist....In some very basic respects, children (and many adults) do not behave like scientists" (Kuhn, 1989, p. 687).

D. Kuhn, following Klahr and Dunbar (Klahr, 1988), felt that it was important to distinguish between two problem spaces that have to be coordinated when one is solving scientific problems. One is a space of possible hypotheses, the other is a space of possible experimental or observational results that might bear on the hypotheses. According to D. Kuhn, in the child, experiment and hypothesis spaces are merged into a single mental model, without any clear distinction between the two. In the scientist, theory and evidence are clearly separated. The novice adult falls somewhere between.

In her research, D. Kuhn focused on theory revision in the light of evidence. One of her studies involved hypotheses about the relationship of food and colds. She cites one child who believed that relish caused colds and candy bars did not. This child was presented with instances whose overall pattern showed that neither variable made any difference, but she instead picked out individual results that supported her theory, singling out positive tests for relish and negative tests for candy bars and ignoring the rest.

This process can occur in adults, too, and have enormous significance. The Dow Corning company has been forced into Chapter 11 because of litigation regarding its silicone breast implants. Nir Kossovsky is often called as an expert witness in these trials. He ran a standard, but very difficult, test for antibodies and found that "scores of 9 of his 249 women with implants were significantly higher than the mean score of the 47 healthy women or of the 39 women with autoimmune disorders. But those 9 women represented less than 4 percent of all the women with implants he tested. What if in reality his...test was meaningless? Then he might expect 4 percent of all women to score equally high. Because his two comparison groups had comparatively few women, 4 percent of those would be fewer than two from each group. With numbers this small, it is not particularly surprising that he got zero" instances of higher scores from his comparison groups (Taubes, December, 1995, p. 71).

Like the children in D. Kuhn's study, Kossovsky singled out a few positive results without taking into account the overall pattern. These kinds of biases can have multi-million dollar consequences.
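The arithmetic behind the quoted argument can be checked with a short sketch. It takes the 4 percent rate and the group sizes (47 and 39) directly from the Taubes passage; the function names are hypothetical, and the assumption that each woman's score is an independent draw at that rate is mine, added only to turn "fewer than two from each group" into a rough probability of seeing no high scorers at all.

```python
# Rough check of the reasoning quoted from Taubes (1995).
# Assumption (not in the source): each woman independently has the same
# 4 percent chance of a "high" score if the test is meaningless.

BASE_RATE = 0.04  # "4 percent of all women", as quoted

COMPARISON_GROUPS = {
    "healthy women": 47,
    "women with autoimmune disorders": 39,
}


def expected_high_scorers(group_size, base_rate):
    """Expected number of high scorers in a group of the given size."""
    return group_size * base_rate


def prob_no_high_scorers(group_size, base_rate):
    """Chance that nobody in the group scores high (simple binomial model)."""
    return (1.0 - base_rate) ** group_size


for label, size in COMPARISON_GROUPS.items():
    expected = expected_high_scorers(size, BASE_RATE)
    p_zero = prob_no_high_scorers(size, BASE_RATE)
    print(f"{label}: expected {expected:.2f} high scorers, "
          f"P(zero) = {p_zero:.2f}")
```

Under these assumptions the expected count is below two in each comparison group, and the chance of observing no high scorers in a given group is roughly 15 percent for the 47 healthy women and roughly 20 percent for the 39 women with autoimmune disorders, which is in line with the claim that a zero in either group is not particularly surprising.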

Overall, D. Kuhn found that adults were better than children at conducting coordinated searches of hypothesis and evidence spaces on tasks where this sort of financial incentive was absent. Scientists were even better. The key, according to D. Kuhn, is the development of metacognitive skills that permit delineation of theory and evidence, and a coordinated search in two spaces. Metacognition involves being aware of one's own cognitive processes, and modifying them when necessary. In this case, metacognition involves being aware that a mental model is just that--a working model that may have to be modified in the light of evidence.

Similarly, Klahr, Fay and Dunbar (1993) found that adults performed better than children on tasks that simulate scientific problems because the adults possessed "a set of domain-general skills that go beyond the logic of confirmation and disconfirmation and deal with the coordination of search in two spaces" (p. 141). A coordinated dual space search facilitates shifts in representation that lead to new mental models.

Carey and Brewer feel that development of scientific knowledge has more to do with changes in domain-specific knowledge, whereas D. Kuhn and Klahr place more emphasis on heuristics and metacognitive abilities. This debate has important implications for teaching discovery. Does one promote discovery simply by teaching the content of a domain, or does one encourage the development of metacognition and heuristics like dual-space searches? The obvious answer is to do both.

Interestingly, D. Kuhn's research has focused more on situations where the relationships between variables are less than perfect--where one needs to look at the overall pattern of positive and negative results. Vosniadou and Brewer, in contrast, preferred to help children clarify their mental models by pointing out inconsistencies and places that needed elaboration. For example, they showed children who said the Earth was round a picture of a house and asked them questions like, "This house is on the earth, isn't it? How come here the earth is flat, but before you made it round?" (Vosniadou & Brewer, 1992). Children were able to modify their mental models to accommodate this sort of contradiction. One solution some adopted was to visualize the earth as a kind of flattened sphere, a kind of thick pancake.

In other words, Brewer's children don't have to conduct a search in two spaces--they are given results from the evidence space. They are able to use this evidence to modify their hypotheses. Similarly, in Brewer's adult study mentioned above, results from the evidence space were summarized for participants in a way that highlighted the contradiction between their mental model and the result.

The point here is a reflexive one: how you set up the experimental task determines, in part, your results. It is much the same with computational simulations.

D. Kuhn and Brewer could still be surprised by what they found, just as computational simulations can surprise their creators. But there is a difference between studying how children deal with less than perfect covariation between variables (D. Kuhn) and how they deal with what Thagard calls explanatory coherence (Brewer). Both kinds of study are valuable, and well-conducted. In the former, children appear to lack abilities characteristic of adult scientists, and in the latter, they appear to possess them. The obvious compromise is to try to determine the kinds of tasks and situations on which the performance of children and novices will resemble that of experts, and the tasks on which it will not. One could, for example, take the same set of children and run them through both co-variation and explanatory tasks and compare their performance to that of scientists confronted with the same type of problems. Intriguingly, Faust (Faust, 1984) suggests that even experts often do poorly with co-variation problems.

2.3.3.3 Experts and Novices Compared

In the previous section, we compared children and adults. In this one, we will talk about how children and adult novices compare with experts. "For a long time the study of exceptional and expert performance has been considered outside the scope of general psychology because such performance has been attributed to innate characteristics possessed by outstanding individuals. A better explanation is that expert performance reflects extreme adaptations, accomplished through life-long effort, to demands in restricted, well-defined domains" (Ericsson & Charness, 1994, p. 744). Expert knowledge needs to be more than a 'pile of facts'--it needs to be structured in ways that facilitate problem-solving (Ericsson & Charness, 1994).

Larkin argued that this knowledge is organized in sets of condition-action pairs known as productions, similar to the production rules used by the various forms of BACON, which were activated by patterns in the data (Larkin, McDermott, Simon, & Simon, 1980). She and her colleagues found that when an expert physicist encountered a familiar problem, the initial information typically triggered a set of productions which rapidly produced the correct equations--the expert had automated much of the problem-solving process, and worked forward from the information given. Novice physics students had to struggle backwards from the unknown solution, trying to find the right equations and quantities; they therefore took much longer even when they were able to find the correct result.

Consider the following example. Suppose we have to find the value of the friction coefficient for a block resting on an inclined plane. The initial problem statement gives the weight of the block, the angle of the plane and the force pushing against the block. The expert will work forward from the givens, generating the necessary equations to solve for the friction coefficient. The novice will typically start from the goal, generating the final equation, and trying to find values for the variables in that final equation by generating other equations that use the data given at the beginning of the problem to solve for each. When all variables have values, the novice stops (Anzai, 1991).

Working forward and working backward are examples of general, or 'weak' heuristics that can be applied across a wide range of problem-solving situations. Note that either heuristic can work, but working forward is typically faster and more efficient--the steps in the problem can be laid out systematically. Novices tended to try to apply equations early, whereas experts reason qualitatively until they arrive at a representation that suggests what set of equations to use (Larkin, 1983).
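The contrast between working forward and working backward can be made concrete with a toy sketch. It is not Larkin's production system or Anzai's analysis: the rule set, the variable names (N, W_par, f, mu), the functions work_forward and work_backward, and the physics itself (the applied force F acts up the plane and the block is on the verge of sliding down) are all illustrative assumptions. Each rule is a condition-action pair: when its input quantities are known, it can compute its output.

```python
import math

# Toy rule set: (output, required inputs, equation).
# Assumed physics, for illustration only: force F pushes up the plane and
# the block is on the verge of sliding down, so friction f = W*sin(theta) - F.
RULES = [
    ("N", ("W", "theta"), lambda W, theta: W * math.cos(theta)),      # normal force
    ("W_par", ("W", "theta"), lambda W, theta: W * math.sin(theta)),  # weight along plane
    ("f", ("W_par", "F"), lambda W_par, F: W_par - F),                # friction force
    ("mu", ("f", "N"), lambda f, N: f / N),                           # friction coefficient
]


def work_forward(givens, goal):
    """Expert-style: fire any rule whose inputs are already known."""
    known = dict(givens)
    order = []  # order in which equations are used
    progress = True
    while goal not in known and progress:
        progress = False
        for out, ins, equation in RULES:
            if out not in known and all(name in known for name in ins):
                known[out] = equation(*(known[name] for name in ins))
                order.append(out)
                progress = True
    return known.get(goal), order


def work_backward(givens, goal):
    """Novice-style: start from the goal equation and chase its unknowns."""
    known = dict(givens)
    order = []  # order in which equations are selected

    def solve(quantity):
        if quantity in known:
            return known[quantity]
        for out, ins, equation in RULES:
            if out == quantity:
                order.append(out)
                known[quantity] = equation(*(solve(name) for name in ins))
                return known[quantity]
        raise ValueError(f"no rule yields {quantity}")

    return solve(goal), order


# Hypothetical problem values: weight 10, plane at 30 degrees, push of 2.
givens = {"W": 10.0, "theta": math.radians(30), "F": 2.0}
mu_forward, forward_order = work_forward(givens, "mu")
mu_backward, backward_order = work_backward(givens, "mu")
print(forward_order)   # equations in the order the givens make them usable
print(backward_order)  # equations in the order the goal demands them
```

Both strategies reach the same value for the coefficient; what differs is the order in which equations are brought in. The forward solver uses each equation as soon as the givens allow it, ending with the goal, while the backward solver begins with the final equation and has to hold it open until every quantity it needs has been found. The sketch says nothing about speed, and it leaves out the qualitative reasoning that precedes equation selection in the expert protocols.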

This finding suggests that expert/novice differences in heuristics are related to differences in mental models. Chi et al. (Chi, Feltovich, and Glaser, 1981) asked experts and novices to group physics problems based on their similarity, where the definition of similarity was determined by the participant. They found "that experts tended to categorize problems into types that are defined by the major physics principles that will be used in solution, whereas novices tend to categorize them into types as defined by the entities contained in the problem statement" (p. 150). In other words, for experts, categorization is a first step towards solution.

Experts tend to classify problems as having to do with principles like 'conservation of momentum', whereas novices tend to do a more common-sense reading of the words and diagrams in a problem. Expert physicists also generate diagrams that are "principle-oriented abstractions of physical objects" (Anzai, 1991, p. 88), whereas novices tend to rely more on diagrams that look like concrete objects.

In a discussion of the way Galileo transformed the motion of a pendulum into an abstract representation, Michael R. Matthews gives us a good description of these expert representations:

Planets and falling apples have color, texture, irregular surfaces, heat, solidity and any number of other properties and relations. But when they become the subject matter of mechanics they are merely point masses with specified accelerations; when thus conceptualized and delimited, they are no longer natural objects, but theoretical objects. In a similar way, when apples are considered by economists they become theoretical objects of a different sort--commodities with specific exchange values. When botanists consider apples they create yet other theoretical objects. For Galileo a sphere of lead on the end of a length of rope swinging in air, when it is considered by his mechanical theory, becomes a pendulum conceived as a point mass at the end of a weightless chord suspended from a frictionless fulcrum moving in a void (Matthews, 1994, p. 125).

Galileo solved the pendulum problem by abstracting it in the way suggested by the last line of the quotation, much to the frustration of his former mentor and leading critic, del Monte, who protested that actual pendulums did not behave in the way predicted by Galileo. Galileo countered by pointing out the way in which the actual pendulums failed to attain the ideal, frictionless state he was describing. Like modern novices struggling to attain the predicted result in a science lab, del Monte found that it is hard to make reality conform to the abstract representation.

Bucciarelli (Bucciarelli, 1994) includes a detailed analysis of the transformations a student has to be able to make in order to solve a textbook design problem. The student sees a picture of a hydraulic cylinder moving up and down through a slot and is asked to determine the numerical value of several variables at a particular instant in its motion. Like Galileo, the student has to turn a concrete picture into an abstract one, although in the student's case, even the concrete picture is covered with mathematical terms and values (see Figure 7).

Figure 7: Diagram accompanying a problem concerning a hydraulic cylinder (Bucciarelli, 1994, Fig 6, p. 99)

The student must transform this picture into an even more abstract representation:

Figure 8: A more abstract representation of the problem in Figure 7 (Bucciarelli, 1994, Fig 8, p. 106).

The transformation reveals the underlying form of the exercise. It is a 'vector differential calculus' problem--abstract, universal and unencumbered. There is nothing left of the mechanism save its essence...no longer any pretense of machinery, hydraulic cylinders, piston rods, slotted arms, or frictionless pins. All of that is irrelevant. The student must learn to perceive the world of mechanisms and machinery as embodying mathematical and physical principle alone, must in effect learn to not see what is there but irrelevant. (Bucciarelli, 1994, p. 107).

Bucciarelli shows the kind of transformations novices must learn to make before they can solve familiar textbook problems. Subjects in these expert-novice comparisons typically work on such textbook-style word problems, not hands-on laboratory tasks. Therefore, findings from the expert-novice literature are especially relevant to educational situations (Reif and Larkin, 1991) but may have less relevance to scientific practice. Green (Green and Gilhooly, 1992) argued that "the standard expert-novice contrastive paradigm by requiring use of problems accessible to novices has led to a relative neglect of how experts tackle difficult problems and how experts detect and recover from errors in the face of task difficulty" (p. 67).

Similarly, Anzai pointed out that, "most of the recent cognitive research on physics has been limited to 'routine' problem-solving by experts and novices. That is, the primary concern has been with the simple problems often seen in high school or college textbooks. Although such routine problems are the real problems with which engineers deal, experts at the frontiers of physics are trying to discover the unknown principles of the physical world and to construct new types of representations that will help explain it in scientific terms. Only those who succeed in generating novel representations will be long remembered in the history of physics..." (Anzai, 1991). In other words, textbook problems correspond to what Thomas Kuhn called normal science; indeed, he argued that students learned the dominant mental models in their fields from textbooks. Revolutionary science lies beyond the textbooks, and it can take some time before new discoveries are integrated into textbook knowledge for a new generation of students.

Klahr, Fay and Dunbar (1993) point out that the expert-novice studies cited above do not give children or adults the opportunity to design new experiments and formulate and evaluate hypotheses, whereas experiments with simulations like the 2-4-6 task do. In a study using a task that permitted children and adults to generate experiments and hypotheses, Klahr, Fay and Dunbar (1993) found that superior adult performance "appears to come from a set of domain-general skills that go beyond the logic of confirmation and disconfirmation and deal with the coordination of search in two spaces" (p. 141). Klahr and D. Kuhn therefore agree on the importance of metacognition. It is not enough to know a lot of information, not even enough to be able to form abstract representations--to discover, one must be able to mount a coordinated search for new hypotheses and evidence that bears on them.

Type of Task (columns): Abstract; Simulated Scientific Problem; Scientific Problem

Participants (rows): Novices; Children; Experts

Novices

  Abstract: Effectiveness of heuristics like positive test, counterfactual and replication-plus-extension depends on relationship of mental model to target rule.

  Simulated Scientific Problem: Demonstrate importance of additional heuristics: confirm early, disconfirm late; coordinate search in two spaces.

  Scientific Problem: Use common-sense representations and weak heuristics.

Children

  Abstract: (no entry)

  Simulated Scientific Problem: Unable to coordinate search in two spaces.

  Scientific Problem: Can modify mental models to achieve explanatory coherence.

Experts

  Abstract: Prefer a positive test heuristic.

  Simulated Scientific Problem: (no entry)

  Scientific Problem: Are capable of abstract representations, domain-specific heuristics and metacognitive coordination of dual-space search.

Both Klahr and D. Kuhn relied on abstract tasks and tasks that simulated scientific problems. It is hard to know how one could design an experiment in which experts and novices were put in a real discovery situation and their performance compared. There are two alternatives:

1) It is possible to design tasks based on historical discoveries, and see how novices fare when faced with the problems confronted by a Kepler, Faraday or Darwin. We will explore this possibility in Chapter 5, when we talk about active learning modules based on discoveries and inventions.

2) Participant observation, in which a novice enrolls as a member of a laboratory team and studies its processes as she or he learns them. For example, Bruno Latour enrolled as a technician in a laboratory at the Salk Institute and studied it from that perspective (Latour, 1986). Unfortunately, Latour and his colleague, Steve Woolgar, were concerned about adopting the belief system of the 'natives' in this case, and so they deliberately avoided a deep study of the knowledge being transmitted, and instead focused on activities. Latour and Woolgar correctly assumed that knowledge was in part the product of social negotiations and activities, but they were not comfortable studying anything but the inscriptions and dialogue that resulted from such negotiations--they did not want to infer representations like mental models. So it is hard to compare their valuable and interesting work, and the work of most other participant-observers, with the cognitive work on expert/novice differences.

2.3.3.4 Different Levels of Expertise in Teams

In Chapter One, we described several cognitive case studies of expert scientists who made significant discoveries. There is one contemporary study from a cognitive perspective of scientific experts working in teams on a scientific problem. Because not all members of the teams studied are at the same level of expertise, it gives us a chance to compare different gradations within the expert classification.

Dunbar (1995; 1997) has conducted a major in vivo study of four molecular biology laboratories, recording laboratory meetings and conducting follow-up interviews. He emphasized the extent to which cognition was shared in these laboratories. He witnessed an actual case of scientific discovery that occurred in a laboratory meeting, where a surprising finding triggered a new model of a disease process. Dunbar was not able to single out an individual discoverer; instead, the new model emerged from a group process (Dunbar, 1997).

This discovery illustrates the way in which scientists tend to follow up on surprising results, even ones that appear to disconfirm the current hypothesis. When confronted with a disconfirmatory result, the scientists typically did one of four things:

a) Ignored it, but only if it came early in the research project and had implications only for a corollary hypothesis, not the core hypothesis in the area (Dunbar, 1997).

b) Changed a corollary assumption of the core hypothesis: "For example, a scientist changed his hypothesis from 'this particular sequence is necessary to initiate binding of the protein' to 'any sequence in this region that has a base-pair mismatch will be bound to by this protein'" (Dunbar, 1995, p. 379). This kind of change preserves the core hypothesis (see Gorman, 1992, for a discussion).

c) Attributed an anomalous result to error: In some cases, the evidence disconfirmed any hypothesis of the type currently held by the scientist (Dunbar did not provide an example). Individual scientists in this situation made another classic hypothesis-preservation move (again, see Gorman, 1992): they attributed the result to an error.

d) Used the surprise as the basis for coming up with a new hypothesis. The above discovery is an example: a post doc came in excited about a surprising result that did not fit with existing theory, and members of the lab worked together to come up with a new model. On the role of serendipity, Dunbar concluded,

In the data we have collected, the scientist usually is looking for the desired results in the experimental conditions, and to do this the scientist has formulated a rich set of hypotheses and mechanisms that could account for a wide variety of possible findings. When the control conditions produce unusual results, the scientist is already considering a host of potential mechanisms, and thus a surprising finding allows the scientist to focus on the aspects of his or her current conceptual structure that need to be changed or rejected...The manner in which experiments are constructed minimizes the role of serendipity to the extent that when surprising results do occur, the scientist already has a constrained set of active hypotheses and mechanisms that can be used to interpret the findings (Dunbar, 1995, p. 390).

e) Falsification bias: The most experienced scientists were the ones least likely to display confirmation bias. Indeed, Dunbar claims they displayed a falsification bias, discarding results that appeared to confirm a hypothesis. Dunbar speculated that this falsification bias was a protection against airing hypotheses that might later be proved wrong, a frequent experience for the senior scientists (see the case of cold fusion in 3.1). He also pointed out that each laboratory tended to pursue some low-risk and some high-risk projects. It would be interesting to know whether falsification bias was more likely to occur with hypotheses from high-risk projects.

In general, the more senior or expert a scientist, the more willing she was to modify or discard a hypothesis--indeed, sometimes too willing. Part of this willingness may come from a deliberate effort to make certain that the group or team considers alternatives. In scientific practice, much of the coordination between hypothesis and evidence goes on in groups, and the most senior members are likely to have the widest experience with divergent views. Dunbar would be well-advised to search for consistent patterns of this sort.

Dunbar also focused on the use of analogies, noting that scientists tended to prefer analogies within the same organism to the kinds of remote analogies to other systems often mentioned in cases of scientific paradigm change. We will come back to this point in the next section (see section 2.4, below), but for now, let us compare Dunbar's studies of shared cognition with Rudwick's.

2.3.5 20th Century Biologists and 19th Century Geologists Compared

But first, let us consider how Dunbar's conclusions compare to the group cognition case we cited at the end of the last chapter, concerning the great Devonian controversy. Both are studies of scientific research teams, but the Devonian case emphasized the interaction between teams, and Dunbar's study focused on what happened within teams. Furthermore, the Devonian 'teams' were really shifting allegiances among individual actors, whereas Dunbar's teams were organized laboratories with senior researchers, post-docs and graduate students.

How do Dunbar's conclusions square with Rudwick's? Murchison showed little evidence of falsification bias; he initially used the possibility of error to dismiss anomalous results produced by others; if that didn't work, he modified corollary assumptions of his overall system to accommodate the data. The closest he came to falsification bias was when he nearly abandoned the Devonian system after encountering a series of negative surprises in the Rhineland, but Lonsdale's fossil work helped rescue the system, and he was eventually able to restore it.

In other words, Murchison appeared to operate more like the junior than the senior scientists in Dunbar's study. But remember that Dunbar's work was done inside the teams, where it made sense to be critical. Probably Dunbar's scientists supported their hypotheses strongly once they were put into the public domain. Mitroff studied lunar scientists at the time of the Apollo mission and noted that those deemed most outstanding by their colleagues were "the most creative" for their continual creation of "bold, provocative, stimulating, suggestive, speculative hypotheses" and "the most resistant to change" for "their pronounced ability to hang onto their ideas and defend them with all their might to theirs and everyone else's death" (Mitroff, 1981, p. 171). Dunbar's work inside the research teams needs to be complemented by an analysis of how the teams' ideas are disseminated and defended.

Like Dunbar's experts, Murchison and other participants in the Devonian controversy relied heavily on local analogies--literally local, in the sense that they tried to draw analogies between their own region and other locations. Indeed, one of the central debates in the controversy was whether such analogies were appropriate. Murchison, in creating the Silurian, Devonian and Permian systems, established the importance of using local analogies to form global representations of strata. The topic of analogy deserves at least brief consideration in a section of its own.

This page was last edited: Wednesday, July 14, 1999