The Nature of Scientific Evidence: Statistical, Philosophical, and Empirical Considerations
edited by Mark L. Taper and Subhash R. Lele
University of Chicago Press, 2004
Cloth: 978-0-226-78955-2 | Paper: 978-0-226-78957-6 | Electronic: 978-0-226-78958-3


An exploration of the statistical foundations of scientific inference, The Nature of Scientific Evidence asks what constitutes scientific evidence and whether scientific evidence can be quantified statistically. Mark Taper, Subhash Lele, and an esteemed group of contributors explore the relationships among hypotheses, models, data, and inference on which scientific progress rests in an attempt to develop a new quantitative framework for evidence. Informed by interdisciplinary discussions among scientists, philosophers, and statisticians, they propose a new "evidential" approach, which may be more in keeping with the scientific method. The Nature of Scientific Evidence persuasively argues that all scientists should care more about the fine points of statistical philosophy because therein lies the connection between theory and data.

Though the book uses ecology as an exemplary science, the interdisciplinary evaluation of the use of statistics in empirical research will be of interest to any reader engaged in the quantification and evaluation of data.


Mark L. Taper is an associate professor in the Department of Ecology at Montana State University. Subhash R. Lele is a professor in the Department of Mathematics and Statistical Sciences at the University of Alberta.


"The book is a rare find: a source that could be used in graduate seminars in statistics, philosophy, or biology....It is brimming with ideas....It deserves a read by everyone."
— Marc Mangel, Science

"This is a challenging, stimulating, and important book Although some of the chapters are not for the statistically naive, all are thorough and provocative....The Nature of Scientific Evidence should be read by all ecologists who interpret data as evidence for or against specific hypotheses."— Gerry Quinn, Trends in Ecology and Evolution

"The Nature of Scientific Evidence may well be viewed as a landmark publication in years to come, one that was the precursor to a new set of statistical methodologies based on evidence and likelihood. . . . We unreservedly recommend it to every ecologist wanting to understand more about the relationship between logic, evidence, analysis and inference – which, after all, constitutes the essence of the scientific method."
— Graeme Hastwell and S. Raghu, Austral Ecology

"The book is important not because of its specific content, but because of what it represents: a cross-disciplinary dialogue that addresses key issues in data analysis."
— Nicholas J. Gotelli, Ecoscience

"It is precisely because this nicely and carefully edited book will provide more questions than answers that it deserves to be read and discussed by the wide audience of statisticians, philosophers of science, and scientists to whom it addresses the important problem of the evaluation of scientific evidence."
— Pablo Inchausti, Quarterly Review of Biology

"This volume is a wonderful guide helping ecologists to understand many of the statistical nuances as well as an introduction to some deep-rooted methodological and philosophical issues in data analysis. . . . An important and necessary discussion that ecologists need to have."
— Marc W. Cadotte, Biodiversity Conservation




Part 1: Scientific process

- Nicholas Lewin-Koh, Mark L. Taper, Subhash R. Lele
DOI: 10.7208/chicago/9780226789583.003.0001
[scientific method, statistics, statistical inference, statistical models, statistical hypotheses, scientific evidence, sex ratios, Fisherian P-value tests, Neyman-Pearson tests, Bayesian tests]
In the seventeenth century, Francis Bacon proposed what is still regarded as the cornerstone of science, the scientific method. Bacon saw science as an inductive process, meaning that explanation moves from the particular to the more general. In 1959, however, Karl Popper argued that science progresses through deduction, meaning that we proceed from the general to the specific. This chapter introduces key concepts of statistical inference. It first differentiates between a scientific hypothesis and a statistical hypothesis and explores the relationship of both to statistical models. It also provides an overview of the sample space, random variables, and the parameter space, as well as the mechanics of testing statistical hypotheses. It then describes the language and most basic procedures of Fisherian P-value tests, Neyman-Pearson tests, and Bayesian tests, and of the likelihood ratio as a measure of the strength of scientific evidence. It demonstrates each method on a simple but important scientific question, Fisher's thesis of equal sex ratios, which serves as a running example for these key statistical concepts. (pages 3 - 16)
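As a rough illustration of two of the measures named above, the Python sketch below applies an exact two-sided binomial P-value and a simple likelihood ratio to sex-ratio data. The counts (60 males among 100 births) and the alternative proportion 0.6 are hypothetical values chosen for illustration, not data from the chapter, and the function names are our own.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability of k males among n births when P(male) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_p_value(k: int, n: int, p0: float = 0.5) -> float:
    """Exact two-sided P-value for H0: P(male) = p0.

    Sums the probabilities of every outcome that is no more likely under H0
    than the observed count (a common convention for exact binomial tests).
    """
    observed = binom_pmf(k, n, p0)
    return sum(
        binom_pmf(i, n, p0)
        for i in range(n + 1)
        if binom_pmf(i, n, p0) <= observed * (1 + 1e-9)  # tolerance for float ties
    )


def likelihood_ratio(k: int, n: int, p1: float, p0: float = 0.5) -> float:
    """Likelihood of P(male) = p1 relative to P(male) = p0 for the same data."""
    return binom_pmf(k, n, p1) / binom_pmf(k, n, p0)


# Hypothetical data: 60 males among 100 births.
males, births = 60, 100
print(f"P-value (H0: p = 0.5): {two_sided_p_value(males, births):.4f}")
print(f"LR (p = 0.6 vs 0.5):   {likelihood_ratio(males, births, 0.6):.2f}")
```

With these made-up counts the P-value is about 0.057, while the likelihood ratio favors a male proportion of 0.6 over 0.5 by a factor of about 7.5; the two numbers answer different questions about the same data.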
This chapter is available at:
    University Press Scholarship Online

- Brian A. Maurer
DOI: 10.7208/chicago/9780226789583.003.0002
[science, scientific knowledge, community ecology, inductive science, deductive science, statistics, scientific evidence, scientific inquiry, experimental designs, scientific process]
Practitioners of science often go about their craft in a manner that is unique to each discipline. Statistics must serve different purposes defined by the nature of the subject matter and maturity of a given discipline. Ecology operates under a mixture of techniques, philosophies, and goals. This chapter examines two complementary models of scientific inquiry within ecology, each of which serves unique functions in the discovery of scientific knowledge. The first is inductive science, which begins with the accumulation of observations with the intent of discovering patterns. For this kind of science, parameter estimation is more useful than formal hypothesis testing. The second is deductive science, which begins with proposed explanations deduced from formal theories. Hypothetico-deductive experimental designs are used to maximize the chance of detecting theoretical flaws by falsification of predictions. The statistical evaluation of scientific evidence plays different roles in each of these two models of the scientific process. As an example of the roles played by inductive and deductive science in the development of a field of inquiry, this chapter considers the development of modern community ecology. (pages 17 - 50)
This chapter is available at:
    University Press Scholarship Online

- Samuel M. Scheiner
DOI: 10.7208/chicago/9780226789583.003.0003
[science, experiments, observations, scientific evidence, data, scientific theories, diversity, productivity, causation, ecology]
How do we come to conclusions in science? What sorts of evidence do we use, and how do we use them? This chapter explores the spectrum of data and evidence used in science, focusing on experiments vs. observations. It argues that all types of evidence play a role and that scientific theories are based on the consilience of the evidence, the bringing together of different, even disparate, kinds of evidence. It considers a particular ecological issue: the relationship between diversity and productivity. After laying out the scientific issues, it discusses the sources of empirical observations and how to weigh data that come from manipulated experiments vs. observational experiments, an especially important issue in ecology. It also considers how observational studies deal with the issue of direction of causation. Finally, the chapter looks at the relationship between scientific evidence and theory in the context of a priori vs. post hoc explanations. (pages 51 - 72)
This chapter is available at:
    University Press Scholarship Online

Part 2: Logics of evidence

- Deborah G. Mayo
DOI: 10.7208/chicago/9780226789583.003.0004
[error statistics, Neyman-Pearson tests, statistical models, statistical inference, scientific evidence, logics of evidential relationship, measures of fit, error probabilities, philosophy, behavioral-decision model]
Error-statistical methods in science have been the subject of enormous criticism, giving rise to the popular statistical “reform” movement and bolstering subjective Bayesian philosophy of science. Is it possible to have a general account of scientific evidence and inference that shows how we learn from experiment despite uncertainty and error? One way that philosophers have attempted to affirmatively answer this question is to erect accounts of scientific inference or testing where appealing to probabilistic or statistical ideas would accommodate the uncertainties and error. Leading attempts take the form of rules or logics relating evidence (or evidence statements) and hypotheses by measures of confirmation, support, or probability. We can call such accounts logics of evidential relationship (or E-R logics). This chapter reflects on these logics of evidence and compares them with error statistics. It then considers measures of fit vs. fit combined with error probabilities, what we really need in a philosophy of evidence, criticisms of Neyman-Pearson statistics and their sources, the behavioral-decision model of Neyman-Pearson tests, and the roles of statistical models and methods in statistical inference. (pages 79 - 118)
This chapter is available at:
    University Press Scholarship Online

- Richard Royall
DOI: 10.7208/chicago/9780226789583.003.0005
[statistical methods, statistical evidence, observations, statistics, statistical data, science, law of likelihood]
Statistical methods aim to answer a variety of questions about observations. A simple example occurs when a fairly reliable test for a condition or substance, C, has given a positive result. Three important types of questions are: Should this observation lead me to believe that C is present? Does this observation justify my acting as if C were present? Is this observation evidence that C is present? This chapter distinguishes among these three questions in terms of the variables and principles that determine their answers. It then uses this framework to understand the scope and limitations of current methods for interpreting statistical data as evidence. By “statistical evidence,” we mean observations that are interpreted under a probability model. Questions of the third type, concerning the evidential interpretation of statistical data, are central to many applications of statistics in science. The chapter shows that for answering them, current statistical methods are seriously flawed. It looks for the source of the problems and proposes a solution based on the law of likelihood. (pages 119 - 152)
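A minimal numerical sketch of why the answers to these questions can differ: under the law of likelihood, a positive result supports C over not-C by the ratio P(positive | C) / P(positive | not C), whereas the degree of belief in C also depends on a prior probability. The test characteristics (0.95 and 0.02) and the priors below are assumed values for illustration only.

```python
def likelihood_ratio(p_pos_given_c: float, p_pos_given_not_c: float) -> float:
    """Strength of evidence a positive result gives for C over not-C (law of likelihood)."""
    return p_pos_given_c / p_pos_given_not_c


def posterior_probability(prior: float, p_pos_given_c: float, p_pos_given_not_c: float) -> float:
    """Bayes' theorem: P(C | positive), which also depends on the prior P(C)."""
    numerator = prior * p_pos_given_c
    return numerator / (numerator + (1 - prior) * p_pos_given_not_c)


# Assumed (hypothetical) test characteristics.
sensitivity = 0.95          # P(positive | C)
false_positive_rate = 0.02  # P(positive | not C)

print("LR:", likelihood_ratio(sensitivity, false_positive_rate))
for prior in (0.001, 0.5):
    post = posterior_probability(prior, sensitivity, false_positive_rate)
    print(f"prior {prior}: posterior {post:.3f}")
```

Here the likelihood ratio is 47.5, fairly strong evidence for C, yet with a prior of 0.001 the posterior probability of C is only about 0.045: the observation is strong evidence even though belief in C remains low.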
This chapter is available at:
    University Press Scholarship Online

- Malcolm Forster, Elliott Sober
DOI: 10.7208/chicago/9780226789583.003.0006
[likelihood principle, intuitive judgments, hypotheses, observations, Richard Royall, probabilities, likelihood ratio, posteriors, evidence, Akaike information criterion]
The likelihood principle has been defended on Bayesian grounds, with proponents insisting that it coincides with and systematizes intuitive judgments about example problems, and that it generalizes what is true when hypotheses have deductive consequences about observations. Richard Royall offers three kinds of justification. He points out, first, that the likelihood principle makes intuitive sense when probabilities are all 1s and 0s. His second argument is that the likelihood ratio is precisely the factor that transforms a ratio of prior probabilities into a ratio of posteriors. His third line of defense of the likelihood principle is to show that it coincides with intuitive judgments about evidence when the principle is applied to specific cases. This chapter divides the principle into two parts—one qualitative, the other quantitative—and evaluates each in the light of the Akaike information criterion (AIC). Both turn out to be correct in a special case (when the competing hypotheses have the same number of adjustable parameters), but not otherwise. (pages 153 - 190)
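To see why equal numbers of adjustable parameters matter, recall the standard definition AIC = 2k - 2 ln L, where k is the number of adjustable parameters and L the maximized likelihood. When two models share the same k, their AIC difference is just -2 times the log of their likelihood ratio, so AIC and the likelihood ratio rank them identically; when k differs, the penalty can reverse the ranking. The sketch below is illustrative, and the log-likelihood values are hypothetical.

```python
def aic(log_lik: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (smaller is better)."""
    return 2 * k - 2 * log_lik


def prefer_by_likelihood(ll_a: float, ll_b: float) -> str:
    """Model with the higher maximized likelihood."""
    return "A" if ll_a > ll_b else "B"


def prefer_by_aic(ll_a: float, k_a: int, ll_b: float, k_b: int) -> str:
    """Model with the lower AIC."""
    return "A" if aic(ll_a, k_a) < aic(ll_b, k_b) else "B"


# Hypothetical maximized log-likelihoods: model B fits slightly better.
ll_a, ll_b = -100.0, -99.0
print(prefer_by_likelihood(ll_a, ll_b))  # B
print(prefer_by_aic(ll_a, 1, ll_b, 1))   # B: same k, AIC agrees with likelihood
print(prefer_by_aic(ll_a, 1, ll_b, 3))   # A: two extra parameters outweigh B's better fit
```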
This chapter is available at:
    University Press Scholarship Online

- Subhash R. Lele
DOI: 10.7208/chicago/9780226789583.003.0007
[evidence functions, evidence, hypotheses, likelihood ratio, Kullback-Leibler discrepancies, Hellinger distance, Jeffreys distance, optimality, outliers, nuisance parameters]
This chapter formulates a class of functions, called evidence functions, which may be used to characterize the strength of evidence for one hypothesis over a competing hypothesis. It shows that the strength of evidence is intrinsically a comparative concept, comparing the discrepancies between the data and each of the two hypotheses under consideration. The likelihood ratio, which is commonly suggested as a good measure for the strength of evidence, belongs to this class and corresponds to comparing the Kullback-Leibler discrepancies. The likelihood ratio as a measure of strength of evidence has some important practical limitations: sensitivity to outliers, necessity to specify the complete statistical model, and difficulties in handling nuisance parameters. By using evidence functions based on discrepancy measures such as Hellinger distance or Jeffreys distance, these limitations can be overcome. Provided the model is correctly specified, for a single-parameter case, the likelihood ratio is an optimal measure of the strength of evidence within the class of evidence functions. This result also establishes the connection between the optimality of the estimating functions and the optimality of the evidence functions. (pages 191 - 216)
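A toy sketch in the spirit of this idea, for categorical data: the log-likelihood ratio compares the two hypotheses through Kullback-Leibler-type discrepancies, while a Hellinger-based alternative compares the Hellinger distance from the empirical distribution to each hypothesis. The specific form of `hellinger_evidence` below (distance to H2 minus distance to H1, so positive values favor H1) is our illustrative assumption, not necessarily the chapter's exact definition, and the counts and probabilities are hypothetical.

```python
from math import log, sqrt


def log_likelihood_ratio(counts, p1, p2):
    """log L(H1) - log L(H2) for multinomial counts (coefficients cancel); positive favors H1."""
    return sum(n * log(a / b) for n, a, b in zip(counts, p1, p2) if n > 0)


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    bhattacharyya = sum(sqrt(a * b) for a, b in zip(p, q))
    return sqrt(max(0.0, 1.0 - bhattacharyya))  # clamp tiny negative rounding error


def hellinger_evidence(counts, p1, p2):
    """Hypothetical evidence measure for H1 over H2: how much closer the data are to H1."""
    total = sum(counts)
    empirical = [n / total for n in counts]
    return hellinger(empirical, p2) - hellinger(empirical, p1)


counts = [30, 30, 20, 20]  # hypothetical category counts
h1 = [0.3, 0.3, 0.2, 0.2]
h2 = [0.25, 0.25, 0.25, 0.25]
print(f"log LR (H1 vs H2):       {log_likelihood_ratio(counts, h1, h2):.3f}")
print(f"Hellinger evidence (H1): {hellinger_evidence(counts, h1, h2):.3f}")
```

Both measures favor H1 in this clean example. The chapter argues that discrepancy measures such as the Hellinger distance can reduce the likelihood ratio's sensitivity to outliers; this toy example does not attempt to demonstrate that property.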
This chapter is available at:
    University Press Scholarship Online

Part 3: Realities of nature

- Jean A. Miller, Thomas M. Frost
DOI: 10.7208/chicago/9780226789583.003.0008
[Deborah Mayo, error statistics, replication, whole-ecosystem experiments, BACI design, pseudoreplication, Stuart Hurlbert, stochastic events, statistical reasoning, errors]
Deborah Mayo (1996) has reinterpreted classic frequentist statistics into a much more general framework that she calls error statistics to indicate the continuing centrality and importance of error probabilities and error-probabilistic reasoning in testing hypotheses. Her generalization of statistical reasoning above and beyond any one statistical test provides a consistent and coherent approach to testing and assessing both quantitative and qualitative evidence and hence can be directly applied to whole-ecosystem experiments. This chapter argues that understanding the types of errors that replication controls allows for better design and interpretation of unreplicated and semi-replicated whole-ecosystem experiments. It begins by clarifying the meaning of three common concepts used in debates about what can and cannot be learned from whole-ecosystem manipulations: replication, BACI design, and pseudoreplication. It then rephrases Stuart Hurlbert's first error of concern and discusses replication as a check on stochastic events beyond natural variation. (pages 221 - 274)
This chapter is available at:
    University Press Scholarship Online

- Mark L. Taper, Subhash R. Lele
DOI: 10.7208/chicago/9780226789583.003.0009
[nature, ecology, dynamical models, evidence, causation, statistical models, exploration, explanation, experiments]
Natural scientists like to understand how nature works. Usually this quest begins with exploration of empirical patterns in nature. This search may involve a visual exploration of dependencies of variables using computer programs and high-speed graphics that allow rotation of data in three dimensions and sometimes animation to simulate the fourth dimension. The associations between variables that are seen can be tested for statistical significance using various techniques such as Monte Carlo randomization procedures. These associations, although suggestive, do not necessarily reveal causality. Many statistical techniques and models concentrate on association rather than causation. It is important that we move from exploration and description to explanation. Dynamical models are useful for incorporating explicitly causal pathways in the statistical models. Consequently, dynamical models help in the design of experiments to differentiate among causal pathways. This chapter explores the use of dynamical models, deterministic or stochastic, as paths to evidence in ecology. The main idea is that dynamical models are more likely to lead to the understanding of causation than simple statistical association models. (pages 275 - 297)
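A Monte Carlo randomization procedure of the kind mentioned above can be sketched as a permutation test of a Pearson correlation: shuffling one variable breaks any association, and the observed correlation is compared with the shuffled ones. This is a generic sketch with hypothetical function names and data; as the chapter stresses, a small P-value here speaks only to association, not to causation.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def permutation_test(x, y, n_perm=999, seed=0):
    """Two-sided Monte Carlo randomization test of zero correlation.

    Returns the observed r and a P-value computed as (count + 1) / (n_perm + 1),
    where count is the number of shuffles with |r| at least as large as observed.
    """
    rng = random.Random(seed)
    r_obs = pearson(x, y)
    y_shuffled = list(y)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)
        if abs(pearson(x, y_shuffled)) >= abs(r_obs) - 1e-12:
            count += 1
    return r_obs, (count + 1) / (n_perm + 1)


# Hypothetical paired measurements of two variables.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 2.9, 3.2, 4.8, 5.1, 5.9, 7.4, 8.2]
r, p = permutation_test(x, y)
print(f"r = {r:.3f}, randomization P-value = {p:.3f}")
```

Adding 1 to both numerator and denominator keeps the estimated P-value above zero, since the observed arrangement is itself one possible permutation.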
This chapter is available at:
    University Press Scholarship Online

- James H. Brown, Edward J. Bedrick, S. K. Morgan Ernest, Jean-Luc E. Cartron, Jeffrey F. Kelly
DOI: 10.7208/chicago/9780226789583.003.0010
[PSD condition, constraints, strongly negative correlations, patterns, ecological networks, complex systems, variables]
One problem in inferring process from pattern is that offsetting and indirect effects, nonlinearities, and other difficulties in complex systems may prevent a clear pattern from being generated even though the hypothesized process is operating. One such complication is the constraint on the magnitudes of correlations and covariances when some of the relationships are negative. The positive semidefinite (PSD) condition requires that each of the eigenvalues and each of the principal minors of a correlation matrix be nonnegative, thereby placing limits on the possible values of the correlation coefficient. Using simple ecological examples, this chapter illustrates how processes that can generate strongly negative correlations in a simple two-variable system can be constrained from generating such clear patterns when there are multiple variables. The PSD condition is more than a statistical curiosity; it reflects real constraints on the magnitudes and patterns of interactions in ecological networks and other complex systems. (pages 298 - 324)
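For three variables the PSD condition can be checked by hand: with off-diagonal correlations r12, r13, and r23, the 2 x 2 principal minors are 1 - r^2 and the determinant is 1 + 2 r12 r13 r23 - r12^2 - r13^2 - r23^2. The sketch below (function name hypothetical, example values illustrative rather than taken from the chapter) uses these formulas to show that three variables cannot all be strongly negatively correlated with one another.

```python
def is_valid_correlation_3x3(r12: float, r13: float, r23: float, tol: float = 1e-12) -> bool:
    """Check the PSD condition for a 3 x 3 correlation matrix via its principal minors.

    The 1 x 1 minors are the unit diagonal entries, so only the 2 x 2 minors
    and the full determinant need checking.
    """
    two_by_two = (1 - r12**2, 1 - r13**2, 1 - r23**2)
    det = 1 + 2 * r12 * r13 * r23 - r12**2 - r13**2 - r23**2
    return all(m >= -tol for m in two_by_two) and det >= -tol


print(is_valid_correlation_3x3(0.9, 0.9, 0.9))     # True: strong positive triad is possible
print(is_valid_correlation_3x3(-0.9, -0.9, -0.9))  # False: strong negative triad is impossible
print(is_valid_correlation_3x3(-0.5, -0.5, -0.5))  # True: boundary case, determinant is 0
print(is_valid_correlation_3x3(-0.9, -0.9, 0.5))   # False
print(is_valid_correlation_3x3(-0.9, -0.9, 0.7))   # True
```

For example, if r12 = r13 = -0.9, the determinant condition forces r23 to lie between 0.62 and 1, so the remaining pair must be positively correlated. More generally, equal pairwise correlations r among three variables are possible only when r >= -0.5.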
This chapter is available at:
    University Press Scholarship Online

Part 4: Science, opinion, and evidence

- Brian Dennis
DOI: 10.7208/chicago/9780226789583.003.0011
[Bayesian statistics, ecology, science, scientific method, frequentist statistics, postmodernism, relativism, statistical inference, statistical analysis, Bayesianism]
The questioning of science and the scientific method continues within the science of ecology. The use of Bayesian statistical analysis has recently been advocated in ecology, supposedly to aid decision makers and enhance the pace of progress. Bayesian statistics provides conclusions in the face of incomplete information. However, Bayesian statistics represents a much different approach to science than the frequentist statistics studied by most ecologists. This chapter discusses the influence of postmodernism and relativism on the scientific process and in particular its implications, through the use of the subjective Bayesian approach, for statistical inference. It argues that subjective Bayesianism is “tobacco science” and that its use in ecological analysis and environmental policy making can be dangerous. It claims that science works through replicability and skepticism, with methods considered ineffective until they have proven their worth. It proposes the use of a frequentist approach to statistical analysis because it corresponds to the skeptical worldview of scientists. (pages 327 - 378)
This chapter is available at:
    University Press Scholarship Online

- Daniel Goodman
DOI: 10.7208/chicago/9780226789583.003.0012
[Bayesian approach, subjective probability, objective probability, Bayesianism, statistical inference, decision making, compound sampling, objective priors]
Decision theory requires the assignment of probabilities for the different possible states of nature. Bayesian inference provides such probabilities, but at the cost of requiring prior probabilities for the states of nature. In the twentieth century, the justification for prior probabilities has often rested on subjective theories of probability. Subjective probability can lead to internally consistent systems relating belief and action for a single individual; but severe difficulties emerge in trying to extend this model to justify public decisions. Objective probability represents probability as a literal frequency that can be communicated as a matter of fact and that can be verified by independent observers confronting the same information. This chapter argues that the Bayesian approach is best for making decisions and that one needs to put probabilities on various hypotheses. It proposes an interpretation of statistical inference for decision making, but disapproves of the subjective aspects of Bayesianism and suggests, as an alternative, using related data to create “objective” priors. The chapter also considers a compound sampling perspective and presents a concrete example of compound sampling. (pages 379 - 409)
This chapter is available at:
    University Press Scholarship Online

- Subhash R. Lele
DOI: 10.7208/chicago/9780226789583.003.0013
[ecological studies, expert opinion, statistical inference, spatial data, soft data, Bayesian approach, priors, hierarchical model, presence-absence data, elicited data]
Ecological studies are often hampered by insufficient data on the quantity of interest. Limited data usually lead to a relatively flat likelihood surface that is not very informative. One solution is to augment the available data by incorporating other possible sources of information. A wealth of information in the form of “soft” data, such as expert opinion about whether pollutant concentration exceeds a certain threshold, may be available. This chapter proposes a mechanism to incorporate such soft information and expert opinion in the process of inference. A commonly used approach for incorporating expert opinion in statistical inference is via the Bayesian paradigm. This chapter discusses various difficulties associated with the Bayesian approach. It introduces the idea of eliciting data instead of priors and examines its practicality. It then describes a general hierarchical model setup for combining elicited data and the observed data. It illustrates the effectiveness of this method for presence-absence data using simulations. The chapter demonstrates that incorporating expert opinion via elicited data substantially improves the estimation, prediction, and design aspects of statistical inference for spatial data. (pages 410 - 436)
This chapter is available at:
    University Press Scholarship Online

Part 5: Models, realities, and evidence

- Bruce G. Lindsay
DOI: 10.7208/chicago/9780226789583.003.0014
[model adequacy, statistical inference, errors, statistical analysis, model misspecification, confidence intervals, scientific significance, prediction variability, statistical distances]
This chapter takes on the problem of model adequacy and makes an argument for reformulating the way model-based statistical inference is carried out. In the new formulation, it does not treat the model as “truth.” It is instead an approximation to truth. Rather than testing for model fit, an integral part of the proposed statistical analysis is to assess the degree to which the model provides adequate answers to the statistical questions being posed. One method for doing so is to create a single overall measure of inadequacy that evaluates the degree of departure between the model and truth. The chapter argues that there are two components of errors in any statistical analysis. One component is due to model misspecification; that is, the working model is different from the true data-generating process. The chapter compares confidence intervals on model misspecification error with external knowledge of the scientific relevance of prediction variability to address the issue of scientific significance. The chapter also analyzes several familiar measures of statistical distances in terms of their possible use as inadequacy measures. (pages 439 - 487)
This chapter is available at:
    University Press Scholarship Online

- Mark L. Taper
DOI: 10.7208/chicago/9780226789583.003.0015
[model identification, scientific evidence, information criteria approach, likelihood ratio approach, statistical models, Akaike information criterion, Schwarz's information criterion, penalty, model complexity, sample size]
Model identification is a necessary component of modern science. Model misspecification is a major, if not the dominant, source of error in the quantification of most scientific evidence. Hypothesis tests have become the de facto standard for evidence in the bulk of scientific work. This chapter discusses the information criteria approach to model identification, which can be thought of as an extension of the likelihood ratio approach to the case of multiple alternatives. It shows that the information criteria approach can be extended to large sets of statistical models. There is a tradeoff between the amount of model detail that can be accurately captured and the number of models that can be considered. This tradeoff can be incorporated in modifications of the parameter penalty term. The chapter also examines the Akaike information criterion and its variants, such as Schwarz's information criterion. It demonstrates how a data-based penalty can be developed to take into account the working model complexity, model set complexity, and sample size. (pages 488 - 524)
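Both AIC and Schwarz's criterion can be written as a per-parameter penalty c times the number of parameters k, minus twice the maximized log-likelihood: c = 2 for AIC and c = ln n for SIC, where n is the sample size. The sketch below ranks a hypothetical candidate set under both penalties; the model names, log-likelihoods, and sample size are invented for illustration, and the chapter's data-based penalty is not implemented here.

```python
from math import log


def information_criterion(log_lik: float, k: int, penalty_per_param: float) -> float:
    """Generic penalized criterion: c * k - 2 ln L (smaller is better)."""
    return penalty_per_param * k - 2 * log_lik


def select_model(models, penalty_per_param):
    """Return the name of the candidate with the smallest criterion value."""
    return min(models, key=lambda m: information_criterion(m[1], m[2], penalty_per_param))[0]


# Hypothetical candidates: (name, maximized log-likelihood, number of parameters).
candidates = [("linear", -120.0, 2), ("quadratic", -118.5, 3), ("cubic", -117.0, 4)]
n = 50  # hypothetical sample size

print("AIC picks:", select_model(candidates, penalty_per_param=2.0))     # cubic
print("SIC picks:", select_model(candidates, penalty_per_param=log(n)))  # linear
```

With n = 50, ln n is about 3.9, so SIC charges nearly twice as much per parameter as AIC; in this made-up example the heavier penalty is enough to move the choice from the cubic model to the linear one.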
This chapter is available at:
    University Press Scholarship Online

Part 6: Conclusion

- Mark L. Taper, Subhash R. Lele
DOI: 10.7208/chicago/9780226789583.003.0016
[science, scientific evidence, frequentist statistics, Bayesian statistics, model adequacy, model selection, expert opinion, replication, nuisance parameters, outliers]
A method that has proved extremely successful in the history of science is to take ideas about how nature works, whether obtained deductively or inductively, and translate them into quantitative statements. These statements, then, can be compared with the realizations of the processes under study. The two main schools of statistical thought, frequentist and Bayesian statistics, do not address the question of evidence explicitly. This chapter summarizes various approaches to quantifying scientific evidence and compares them to Bayesian and frequentist statistics. It discusses ideas on model adequacy and model selection in the context of quantifying evidence and explores the role and scope of the use of expert opinion. Replication is usually highly desirable but in many ecological experiments difficult to obtain. How can one quantify evidence obtained from unreplicated data? Nuisance parameters, composite hypotheses, and outliers are realities of nature. Finally, the chapter raises a number of important unresolved issues, such as using evidence to make decisions without resorting to subjective probability. (pages 527 - 552)
This chapter is available at:
    University Press Scholarship Online