Public policy debates often involve appeals to results of work in social sciences like economics and sociology. For example, in his State of the Union address this year, President Obama cited a recent high-profile study to support his emphasis on evaluating teachers by their students’ test scores. The study purportedly shows that students with teachers who raise their standardized test scores are “more likely to attend college, earn higher salaries, live in better neighborhoods and save more for retirement.”
How much authority should we give to such work in our policy decisions? The question is important because media reports often seem to assume that any result presented as “scientific” has a claim to our serious attention. But this is hardly a reasonable view. There is considerable distance between, say, the confidence we should place in astronomers’ calculations of eclipses and a small marketing study suggesting that consumers prefer laundry soap in blue boxes.
A rational assessment of a scientific result must first take account of the broader context of the particular science involved. Where does the result lie on the continuum from preliminary studies, designed to suggest further directions of research, to maximally supported conclusions of the science? In physics, for example, there is the difference between early calculations positing the Higgs boson and what we hope will soon be the final experimental proof that it actually exists. Scientists working in a discipline generally have a good sense of where a given piece of work stands in their discipline. But, as I have pointed out for the case of biomedical research, popular reports often do not make clear the limited value of a journalistically exciting result. Good headlines can make for bad reporting.
Second, and even more important, there is our overall assessment of work in a given science in comparison with other sciences. The core natural sciences (e.g., physics, chemistry, biology) are so well established that we readily accept their best-supported conclusions as definitive. (No one, for example, was concerned about the validity of the fundamental physics on which our space program was based.) Even the best-developed social sciences like economics have nothing like this status.
Consider, for example, the report President Obama referred to. By all accounts it is a significant contribution to its field. As reported in The Times, the study, by two economists from Harvard and one from Columbia, “examined a larger number of students over a longer period of time with more in-depth data than many earlier studies, allowing for a deeper look at how much the quality of individual teachers matters over the long term.” As such, “It is likely to influence the roiling national debates about the importance of quality teachers and how best to measure that quality.”
But how reliable is even the best work on the effects of teaching? How, for example, does it compare with the best work by biochemists on the effects of light on plant growth? Since humans are much more complex than plants and biochemists have far more refined techniques for studying plants, we may well expect the biochemical work to be far more reliable. For making informed decisions about public policy, though, we need to have a more precise sense of how large the difference in reliability is. Is there any work on the effectiveness of teaching that is solidly enough established to support major policy decisions?
The case for a negative answer lies in the predictive power of the core natural sciences compared with even the most highly developed social sciences. Social sciences may be surrounded by the “paraphernalia” of the natural sciences, such as technical terminology, mathematical equations, empirical data and even carefully designed experiments. But when it comes to generating reliable scientific knowledge, there is nothing more important than frequent and detailed predictions of future events. We may have a theory that explains all the known data, but that may be just the result of our having fitted the theory to that data. The strongest support for a theory comes from its ability to correctly predict data that it was not designed to explain.
While the physical sciences produce many detailed and precise predictions, the social sciences do not. The reason is that such predictions almost always require randomized controlled experiments, which are seldom possible when people are involved. For one thing, we are too complex: our behavior depends on an enormous number of tightly interconnected variables that are extraordinarily difficult to distinguish and study separately. Also, moral considerations forbid manipulating humans the way we do inanimate objects. As a result, most social science research falls far short of the natural sciences’ standard of controlled experiments.
Without a strong track record of experiments leading to successful predictions, there is seldom a basis for taking social scientific results as definitive. Jim Manzi, in his recent book, “Uncontrolled,” offers a careful and informed survey of the problems of research in the social sciences and concludes that “nonexperimental social science is not capable of making useful, reliable and nonobvious predictions for the effects of most proposed policy interventions.”
Even if social science were able to greatly increase its use of randomized controlled experiments, Manzi’s judgment is that “it will not be able to adjudicate most policy debates.” Because of the many interrelated causes at work in social systems, many questions are simply “impervious to experimentation.” But even when we can get reliable experimental results, the causal complexity restricts us to “extremely conditional, statistical statements,” which severely limit the range of cases to which the results apply.
My conclusion is not that our policy discussions should simply ignore social scientific research. We should, as Manzi himself proposes, find ways of injecting more experimental data into government decisions. But above all, we need to develop a much better sense of the severely limited reliability of social scientific results. Media reports of research should pay far more attention to these limitations, and scientists reporting the results need to emphasize what they don’t show as much as what they do.
Given the limited predictive success and the lack of consensus in the social sciences, their conclusions can seldom be primary guides to setting policy. At best, they can supplement the general knowledge, practical experience, good sense and critical intelligence that we can only hope our political leaders will have.
Gary Gutting is a professor of philosophy at the University of Notre Dame, and an editor of Notre Dame Philosophical Reviews. He is the author of, most recently, “Thinking the Impossible: French Philosophy since 1960,” and writes regularly for The Stone.