Essay: Jen Henderson

Final Revised Essay

Do You See What I See? The Problem of Observation in Weather Forecasting

On June 13, 2012, a line of storms called a derecho swept across the Mid-Atlantic, causing widespread damage and power outages. Not as devastating as the derecho from the year before, it nonetheless cost states millions of dollars and tested the skill of weather forecasters downstream of its initiation. Since these types of multi-storm events have proved difficult to predict and sometimes complicated to track, they require a significant amount of coordination between the National Weather Service (NWS), which issues watches and warnings, and various other weather-related groups.

I had been conducting ethnographic work at a local branch office of the NWS the day of the derecho and witnessed first-hand how the forecasting of this event unfolded. Most pertinent to this essay, I spent several hours that evening volunteering with verification efforts, that is, looking for evidence that supported warnings. To this end, I spent my time telephoning people in the community who, based on radar images superimposed on maps from Google Earth, appeared to be in the path of a storm. I asked residents whether they had observed specific elements of weather, keyed to the criteria that underpin each warning type, that would help forecasters justify their thunderstorm warnings. In light of this event, I began to wonder about the role of observation—expert and otherwise—in the science of weather forecasting. This essay is the result of those musings.

The first half of this essay situates the concept of observation and its role in science in discussions by philosophers of science. Much of the literature we read for class doesn’t address observation directly; instead it examines observation through the related concepts of empiricism and induction. Thus I’ll summarize the discussion of these two terms to get at the role of observation, and its limitations, in science. For the second part of my essay, I’ll draw from these discussions to explore how operational meteorologists at the NWS use observation in severe weather warnings and forecasting. My goal is to better understand the role of observation in meteorology, as well as how particular forms of observation may variously support and undermine forecasting as a science.

Empiricism and Expertise

At its heart, empiricism entails making knowledge claims about the world based on sensory experience. A scientist visually observes elements of the natural world about which she then develops hypotheses. Later she designs an experiment, witnesses its success or failure, repeats the procedure, and concludes by making claims about the truth value of the hypothesis being tested. At each stage of the scientific method, empiricism gets expressed through observation of objects and processes, and it relies on realist assumptions about the nature of reality. Specifically, science proceeds on the premise that the natural world is not socially constructed but instead can be known and tested, holding within its physical structures several potential truths about the nature of reality. In Laudan’s parlance (though counter to his own argument), observation allows scientists to demonstrate through experimentation that a hypothesis does indeed “refer” to a tangible object in the world (Laudan, 1981).

Essential to the concept of empiricism in science is objectivity. Since humans are subjective in the ways they experience the world and how they report what they experience, any observation from an individual point of view is likely to carry with it the biases of that person. Observation in this sense seems more or less a direct form of assessment between the scientist and the object of study. Yet, while a scientist may observe directly through her own senses, particularly in developing questions about the world, she is likely skeptical of her individualized sensory experiences, instead relying on standardized instruments developed to mediate subjectivity in drawing conclusions. So scientists use tools to calibrate measurements and remove subjectivity from their experience of the object of study, resulting (theoretically) in more objective observations. In the case of the microscope, for example, the object on the slide is brought more fully into view through the power of the lens, allowing the scientist to better detect the processes, shapes, and interactions at play in front of her. The lens makes up for any individual biases or deficiencies in the eyes.

As we can see in this example, one must be trained to use the instrument and to understand what one sees. Not just anyone can look into a microscope at a cell, for example, and confirm that it is a cell. An amateur’s use of an instrument does not necessarily result in a false observation in the broad sense; he may say he sees a squiggle that matches the squiggle the expert reports. Much depends on the instrument and its complexity. The amateur may be able to accurately describe an amoeba on a slide under a microscope but may not know how to set the slide or work the knobs that bring the object into clear view. Nor may he understand how to operate more complicated machines like centrifuges. Additionally, an amateur likely doesn’t know how to translate that observation—what he sees—into descriptive and explanatory language that supports the expert’s scientific endeavor. This is certainly true in meteorology, where the equipment is so complex that it takes forecasters years of hands-on training to become skilled at understanding what they see in the data and in the sky. Expertise, then, is intimately bound up with scientific observation.

But expertise itself is questionable. When is someone expert enough to make observations that confirm or refute scientific theories? Clearly novice scientists, such as graduate students or assistants in labs, are capable of making empirical observations and reliably using instrumentation; however, their contributions are still limited by the socializing processes of science, which dictate that scientists must have a certain level of interaction with the scientific community before publications announcing scientific knowledge can be accepted. And the rules of expertise seem different for each discipline, with some, such as biology, requiring scientists to have credentials (e.g., Ph.D.s or post-docs) before they are accepted in the community as full-fledged experts. In meteorology, however, many practicing forecasters who are considered expert hold only bachelor’s degrees or have compensated for educational requirements with time spent as forecasters in the military. In this case, experience and reputation count more toward expertise. While our readings in the philosophy of science don’t explicitly address the issue of who counts as expert, some of them have hinted at potential issues.

While not about expertise per se, scholars like Longino have opened a door to accepting more socially distributed and subjective elements in the scientific process, including verification. She argues that subjective viewpoints brought together can form a consensus, or intersubjective criticism, which makes objectivity a “characteristic of a community’s practice of science rather than an individual’s…” (p. 179) and thus “social knowledge” (p. 180). It may also be reasonable, then, to extend the idea of a larger social group to other candidates for deciding which empirical evidence should count as proof, or who ought to judge science to be sound. Policy makers might count in this context. And certain members of the public who hold what Wynne (1996) calls lay expertise might be good candidates. In fact, consensus conferences held across the globe demonstrate that citizens can participate (and perhaps ought to participate) in the construction of scientific knowledge. For example, citizen-conducted science is a key aspect of environmental justice disputes about the validity of certain scientific claims. In the case of weather observations, the question would be whether citizens who offer empirical evidence at the behest of forecasters contribute in ways that verify the science.

But are there more fundamental issues with empiricism? Popper certainly believed so. When used as a deciding criterion between science and non-science, empirical observation fails. Other disciplines, such as astrology, rely on expert observation and empirical knowledge to make claims about the truth of their theories. Yet we know that astrology is not a science—it lacks predictive power, and for Popper, it isn’t falsifiable. That is, astrological predictions can always be explained as successful by their proponents. So one must tread lightly when depending on empirical observations as a sole means of arriving at scientific principles or theories. As Popper argued:

The principle of empiricism…can be fully preserved, since the fate of a theory, its acceptance or rejection, is decided by observation and experiment—by the result of tests. So long as a theory stands up to the severest tests we can design, it is accepted; if it does not, it is rejected. But it is never inferred, in any sense, from the empirical evidence.…Only the falsity of the theory can be inferred from empirical evidence, and this inference is a purely deductive one. (Popper, p. 33)

Thus, when used by experts in science to falsify a theory, empirical evidence is useful. Of course, again, who counts as “expert enough” to make claims about empirical evidence raises interesting questions about the type of evidence one gathers. What counts as “the severest tests we can design” is also unclear. This becomes especially problematic for fields such as weather forecasting, where meteorologists must expand their observational network to include untrained citizens in their verification processes. My work in the forecast office the day of the derecho, for example, is not unusual in an NWS environment. Frequently, forecasters solicit volunteers from the public to come to the office to help with verification, though volunteers do require a small amount of training through spotter programs such as SkyWarn or the ham radio operator network.

While citizen participation in the verification of a phenomenon isn’t the same as testing a scientific theory, severe weather events do offer inductive cases (discussed at more length below) that help forecasters decide both whether their warning was successful in a particular instance and, to some degree, whether the science on which the warnings are predicated is likewise sound. Imagine that a windstorm strikes a neighborhood and gets reported as a tornado. It must then be verified by the NWS based on principles of how a tornado works—wind direction, debris fields, etc. If no damage indicators meet the criteria for a tornado and yet several people saw a swirling mass of wind touch down, then the science of tornadogenesis might be cast into doubt, or at least questioned. So just what is the value of individual observations?

Induction

My crude understanding of induction is that it involves looking at specific instances of a phenomenon and noting any patterns or commonalities across them that might lead to inferences, or broader generalizations, about the natural world. These inferences are not certainties but probabilities based on moving from “premisses about objects we have examined to conclusions about objects we haven’t” (Okasha, p. 19). Thus continual observations of isolated instances that confirm the conclusions can only be spoken of as evidence, not proof, and there is always the chance that an instance may be found that falsifies the conclusions—as with the example of the witnessed tornado above.

While induction is less certain than deduction, whose premises entail its conclusions, scholars such as Hume argue that induction is the foundation of science, in spite of its reliance on “blind faith” (qtd. in Okasha, p. 27). We may need to be satisfied with evidence that does not “guarantee the truth of the conclusion” but makes it “quite probable” (ibid.). Or we need only use deduction (Popper), or argue for induction as foundational to who we are as humans (Strawson). In any case, induction seems to be one of the more promising engines for the scientific method, even if it doesn’t give us absolute certainty. Of course, some have suggested ways to improve our practices of science, pointing out how we can be more careful about how we construct the predicates of our hypotheses (Goodman) or more cautious about the claims we make about how science works (Laudan).

What troubles me about induction is identifying which specific practices in the science of meteorology can be called induction. Let me illustrate. After a severe weather warning, meteorologists rely both on instrumentation and visual observations to verify the accuracy of their warnings. That is, a weather warning is only deemed successful by the NWS when evidence can be observed that meets the warning’s criteria for success. The absence of such verification constitutes a miss, or false alarm. This is true for all weather warnings: observations that offer “ground truth” evidence for an event’s criteria help forecasters measure their success, which gets reported as a statistic under the Government Performance and Results Act. I suspect that these hits and misses offer some kind of collective evidence that the theories about atmospheric dynamics and mesoscale meteorology are correct, but it’s not clear how. Or perhaps it’s that these observations have the potential to act as inductive support for meteorology but are used in other ways instead.

To understand just how observations are used, I’ll next examine two ways they are conducted at the NWS to support forecasting success: first through data-collection instrumentation, specifically weather balloons and rawinsondes, and then through severe weather warning verification, which includes the solicitation of evidence and local storm reports. The former represents instruments controlled and designed by experts, which count as legitimate inductive methods; the latter represents lay observation, which is problematic in terms of expertise, subjectivity, and credibility.

Expert Instruments

Operational meteorologists rely heavily on computer models that present alternative futures of atmospheric conditions. The algorithms underpinning these models account for several variables in the atmosphere but are fundamentally based on real-time observational data. The network of instruments that collect such information—temperature, dewpoint, wind speed and direction, and precipitation amounts—includes technology standardized by government bodies, such as the Automated Weather Observing System (AWOS) stations sited at local airports and controlled by the Federal Aviation Administration. These stations regularly report data to NWS computer systems, which then feed it into the different atmospheric models, as well as into forecasting tools used by operational meteorologists to predict daily weather.

Perhaps the most popular and important observational technology is the twice-daily weather balloon launch. Meteorologists send a helium-inflated latex balloon and its accompanying sonde unit into the atmosphere to the edge of the tropopause, where it collects and transmits a vertical profile of the atmosphere at that particular location. These launches occur at 00Z and 12Z Greenwich Mean Time from about 800 locations across the globe, some 100 of which originate in forecast offices in the United States. This type of observation, standardized in design and application, closely matches the kind intended to represent scientific inquiry.

Weather balloon launches depend on the accurate operation of the technology itself, forecaster observation of actual conditions during the launch, and careful monitoring or “quality control” of data received from the rawinsonde transponder. Because of its importance in the predictive process, the launch is governed by a dedicated position—the NWS intern—at each office. Interestingly, it is the most novice members of the staff who oversee what becomes a mechanized procedure, one carefully calibrated through training and the monitoring of results submitted each month. Administrators at the National Climatic Data Center in Asheville, NC, scrutinize data from each office, looking for anomalous information and assigning a point-based evaluation of each launch, deducting points if the balloon fails to reach a certain height or is delayed beyond the appointed time. Forecasters have explained this carefully monitored process to me this way: “Garbage in, garbage out.” If bad data gets imported into the climate and weather models, then performance suffers and meteorological predictions fail. This process, then, entails several layers and types of observation: the actual observation of forecasters launching the balloon, the rawinsonde unit’s detection of atmospheric conditions, the forecaster’s monitoring of incoming data, and the administrative oversight of the entire process. One might say, then, as with many scientific labs, that most observations are themselves heavily observed.

Few doubt the credibility and reliability of observations taken through NWS-approved instrumentation. One might argue that this is where operational meteorology is at its most scientific: in its collection of data. Observations can be correlated with forecasts and the difference between them studied as a way of improving meteorological science and the technology of prediction. On the front end of the predictive process, then, all is well in the forecaster’s world in terms of his scientific enterprise. On the back end, after prediction, however, the process of verification is where the science potentially goes awry.

Citizen Instruments

Meteorology hangs its scientific hat on the philosophical argument that good science makes good predictions. Predicting future events becomes a basis for demarcation between science and nonscience (Goodman). In operational meteorology, the criterion shifts slightly to include both the successful prediction of future events and the successful verification of warnings. That is, verified warnings amount to a good prediction. For instance, a forecaster predicts that there will be severe thunderstorms in a large area of Virginia. This prediction is not specific enough to say exactly where or at what time these storms will occur. However, once the storms initiate, NWS forecasters then issue warnings, which appear as colored polygons overlaid on a topographical map and the radar signature of the storm. These warnings represent the prediction: they include the severity of the elements in the storm (hail, wind, etc.), and they are issued for a particular path and length of time. Demonstrating a good prediction, in this case, entails finding evidence that supports the warning criteria, which are determined by each office based on local climatology and geography. In Blacksburg, one must find evidence of two trees toppled by winds to verify a severe warning for wind; for hail, one must find evidence of quarter-size hail—and one hailstone verifies the warning for the whole polygon, an area that often measures 20 miles square or more!
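The Blacksburg thresholds described above can be sketched as a simple check. This is only an illustrative reconstruction of the criteria as this essay reports them; the function and field names are hypothetical, not any actual NWS data format.

```python
# Illustrative sketch of per-office severe-warning verification criteria,
# as described in this essay for the Blacksburg office. Field names are
# hypothetical, not an NWS format.

QUARTER_SIZE_IN = 1.0  # a U.S. quarter is roughly 1 inch in diameter


def verifies_severe_warning(report: dict) -> bool:
    """Return True if a single report meets either local criterion.

    One qualifying report verifies the warning for the entire polygon.
    """
    if report.get("trees_downed", 0) >= 2:  # wind criterion: two toppled trees
        return True
    if report.get("hail_diameter_in", 0.0) >= QUARTER_SIZE_IN:  # hail criterion
        return True
    return False
```

Note how asymmetric the standard is: a single report anywhere in the polygon verifies the whole warning, while the absence of any qualifying report leaves it a false alarm.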

After a severe weather event has been forecast, meteorologists rely both on instrumentation and visual confirmation, or “ground truth,” to verify the accuracy of their warnings. Some of the same instruments that collect observational data also play a key role in demonstrating that a severe warning was warranted. For example, a severe weather warning issued for high wind speed can be verified by instruments at a local airport that measure the wind speed above the criterion for a successful warning, say 55 miles per hour. In another example, the case of a tornado warning, a vortex of wind must be visually confirmed by a witness as having touched the ground in order for the warning to be recorded as a “hit.” The absence of such verification constitutes a miss, or false alarm. Whether it is through outdoor collection instruments or the solicitation of “ground truth” from members of the community, operational meteorology as conducted by the NWS cannot succeed without some form of verification.

The nature of this latter form of verification is, perhaps, most problematic for the discussion of induction. When the forecaster calls a member of the public looking for details of the latest warning, he identifies himself as a member of the NWS and then offers information about the time and direction of the storm, followed by a prompt like this: “Did you experience wind strong enough to blow down any trees?” If the person says yes, or starts to describe the nature of the wind, the forecaster follows up by asking if the person noticed at least two trees down. The conversation can take a variety of directions: a person may simply talk about how scared they were, how they’ve lost power, or how they thought the wind was strong enough to be a tornado. But if the person says that he did have two trees go down on his property or within sight, the forecaster takes down the address on a piece of paper, records the description, and files it as a local storm report, or LSR. These become official pieces of evidence that the forecaster can use to demonstrate that the warning was warranted.

In conversations I’ve had with meteorologists at the Blacksburg office, they acknowledge that some LSRs are problematic. People reporting the downed trees don’t know the condition of the trees, for example. Nor do they know whether the trees were truly blown down by the wind or whether some other mitigating factor caused them to fall. In essence, forecasters have to accept the account and try to find additional witnesses to the same criteria. Still, one local storm report verifies a warning, giving that office an improved statistical score for what they call Probability of Detection (POD) and Critical Success Index (CSI). And I’ve watched several members of an office spend hours—sometimes eight or nine on a busy day—calling members of the public looking for verification. They “drill down” as far as they can along the storm’s path within the polygon, and only after they’ve exhausted all phone numbers, including fire departments, 911 operators, and other emergency managers, do they accept the designation of false alarm for that warning.
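POD and CSI are standard contingency-table scores in forecast verification. The sketch below uses their textbook definitions (POD = hits / (hits + misses); CSI = hits / (hits + misses + false alarms)), which the essay does not spell out; the function name and example counts are illustrative.

```python
def verification_scores(hits: int, misses: int, false_alarms: int) -> dict:
    """Standard contingency-table forecast-verification scores.

    POD = hits / (hits + misses)
    FAR = false_alarms / (hits + false_alarms)
    CSI = hits / (hits + misses + false_alarms)
    """
    pod = hits / (hits + misses) if (hits + misses) else 0.0
    far = false_alarms / (hits + false_alarms) if (hits + false_alarms) else 0.0
    csi = hits / (hits + misses + false_alarms) if (hits + misses + false_alarms) else 0.0
    return {"POD": pod, "FAR": far, "CSI": csi}


# A single verifying LSR flips one warning from a false alarm to a hit,
# improving both POD and CSI for the office (counts are hypothetical):
before = verification_scores(hits=8, misses=2, false_alarms=5)
after = verification_scores(hits=9, misses=2, false_alarms=4)
```

This arithmetic makes plain why an office will spend hours chasing a single report: every false alarm converted to a hit moves both scores at once.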

Because these subjective phone calls get transformed in the storm report to a statistical outcome, it’s difficult to see the issues at stake. It’s clear to me, however, and to many forecasters, that they’re satisfying a bureaucratic requirement rather than finding scientific evidence. These verifications count primarily as performance measures rather than as instances of induction. Unlike instrument-based observations, which more closely uphold the norms of scientific practice, citizen observations are talked about with more skepticism and, at times, incredulity. “It’s a dirty little secret,” one forecaster said to me about the tenuous nature of verification.

Conclusion

While my discussion is by no means exhaustive, nor representative of all weather forecasters in all sectors of the workforce—the government is just one of several employers of forecasters—it does suggest that there are issues with how criteria for evidence in induction get defined and applied. Would forecasters still attempt to verify their warnings the same way if they no longer had to justify each one? Surely they would still need some way to collect information about a particular storm’s behavior relative to their forecast of it. But is there a way to enlist the broader citizenry to become better gatherers of evidence? Should there be? Or should forecasters dedicate more resources to doing this sort of information gathering themselves? Or is it that all attempts at induction necessitate some degree of subjectivity, and so we should lessen the rigor of what counts as evidence? This strikes me as a slippery slope toward potentially troubled outcomes in science, but it equally strikes me as unrealistic to hold all evidence gathered for inductive purposes to idealized specifications. Forecasters can’t do their job without the public. Nor, I believe, should they. But they should discuss more explicitly the complications entailed in the process and the consequences for the credibility of their scientific practice that may result from this interdependence.

Works Cited

Curd, M. & Cover, J.A. (1998). “Introduction to Chapter 4: Induction, Prediction and Evidence.” In Curd, M. & Cover J.A. (Eds.), Philosophy of Science: The Central Issues. (409-412). New York: W. W. Norton & Company.
Goodman, N. (1983). “The New Riddle of Induction.” In Fact, Fiction, and Forecast. Cambridge: Harvard University Press.
Laudan, L. (1981). “A Confutation of Convergent Realism.” Philosophy of Science (48)1: 19-49.
Longino, H. E. (1998). “Values and Objectivity.” In Curd, M. & Cover J.A. (Eds.), Philosophy of Science: The Central Issues. (170-192). New York: W. W. Norton & Company.
Okasha, S. (2002). Philosophy of Science: A Very Short Introduction. Oxford: Oxford University Press.
Popper, K. (1998). “Science: Conjectures and Refutations.” In Curd, M. & Cover J.A. (Eds.), Philosophy of Science: The Central Issues. (3-11). New York: W. W. Norton & Company.
Quine, W.V. (1998). “Two Dogmas of Empiricism.” In Curd, M. & Cover J.A. (Eds.), Philosophy of Science: The Central Issues. (280-302). New York: W. W. Norton & Company.
Shapin, S. and Schaffer, S. (1989). Leviathan and the Air-Pump: Hobbes, Boyle, and the Experimental Life. Princeton: Princeton University Press.
Wynne, B. (1996) “May the Sheep Safely Graze? A Reflexive View of the Expert-Lay Knowledge Divide.” In Lash, S., B. Szerszynski, and B. Wynne (Eds.), Risk, Environment, and Modernity. (44-83). London: Sage.
