I can't promise this will be my last commentary on the importance of evidence in software engineering or, to be more precise, on the role that evidence plays in the adoption of software engineering ideas. The topic just creeps up on me time and time again. It can't be helped.
My essay "Essentials of Software Process" (IEEE Software, July/August 2008, pp. 5–7) made a case for empiricism. I wrote,
Empiricism at its heart is supporting decisions through evidence based on data, both observations and measurements. By observations I mean occurrences that we can simply record. By measurements I mean things we can count, calculate, or quantify. Measurements have values, whereas observations have descriptions, possibly including contextual information. Observations provide deeper insight in areas in which measurements serve only as proxies for other constructs. Each kind of data has a place, and empiricism entails collection and use of both kinds.
A later column, "Must Software Research Stand Divided?" (September/October 2008, pp. 4–6), was an attempt to bust the myths surrounding empirical software engineering and moderate some of the strong claims made by die-hard empiricists. To recall those myths:
• empirical research is boring, too soft, conducted in artificial settings, and dangerously interpreted;
• empirical research takes too long;
• empirical evidence isn't needed;
• empirical evidence can't possibly address all the contextual factors or keep up with a fast-changing industry;
• empirical researchers are biased; and
• empirical research uses evidence models from other disciplines that have nothing to do with software development.
Now I feel obliged to expand upon my latest, and rather anticlimactic, mention of the subject in "Regress or Progress? Seeing Good Software Engineering Ideas Through" (March/April 2010, pp. 4–7). In that column, I stated that the maturation, acceptance, and adoption of good software engineering ideas depend on many factors. I counted the availability of evidence among those factors, further qualifying that the value of evidence itself depends on a variety of underlying subfactors. And I left it there. How convenient that the suspense gives me the opportunity to complete the circle.
Let me rewind again momentarily. In "Must Software Research Stand Divided?" I also implied that empiricists sometimes overemphasize evidence. Let's pick up that thread and weave it together with the factors affecting the usefulness of evidence.
The type of evidence available depends on an idea's maturity and the extent to which the idea lends itself to that type of evidence. In turn, the type of evidence dictates the strength of evidence.
The weakest form of evidence is the feasibility check, which can be established early in the idea's maturation life cycle, even at its conception. The purpose of the feasibility check is to quickly assess major advantages, risks, and limitations, gauge the size of the problem space and the solution's novelty and cost-effectiveness, and determine whether the idea is worth exploring further in controlled but real-world situations. Feasibility can be established through simplified case studies, experimentation, simulation, or SWOT-type (strengths/weaknesses/opportunities/threats) analysis. Therefore, the sandboxing and unbiased reflection I discussed in "Regress or Progress" actually support feasibility checking.
Anecdotal evidence constitutes the middle ground. As collective experience with an idea's application in real-world situations grows, accounts regarding the ensuing successes, obstacles, and workarounds start to tell a coherent story. This type of evidence isn't available until the idea is well into its testing state. Anecdotes are helpful, but they might not be powerful enough to push a decision maker over the adoption barrier. Anecdotes tend to be susceptible to positive reporting bias and the infamous halo effect (the tendency of a few, often positive, attributions to overshadow other, often negative, attributions).
Beyond anecdotes is the most powerful but elusive type of evidence: the systematic kind. Researchers gather systematic evidence in a deliberate and methodical attempt to isolate and reveal common, situational effects on the basis of credible, rich, consolidated data. Alas, this type of evidence might not be available until well into an idea's streamed state. As such, systematic evidence in software engineering is rare. When it exists, it tends to materialize after the fact, too late for risk-loving early adopters and of limited use to risk-averse late adopters. This latter point brings us to the remaining factors that are the attributes not of the evidence but of the idea's receptors and the idea itself.
Risk Preferences and Tolerance of Receptors
Adoption decisions are largely affected by the attitudes of the organizations and people who make the decisions. Those attitudes are often dictated by the business environment in which organizations, and decision makers acting on behalf of organizations, find themselves. A fast-changing environment with narrow windows of opportunity and low entry barriers might warrant a risk-taking, or risk-loving, attitude. A stable and rigid environment dominated by long-term prospects might warrant a risk-avoiding, or risk-averse, attitude.
So, risk-loving and risk-averse decision makers act differently when faced with the same decision. Risk-loving decision makers have convex utility functions: they tend to be content with weaker forms of evidence in return for highly promising but high-risk payoffs. Risk-averse decision makers have concave utility functions: they demand a higher probability of success but will settle for more modest prospects, and they might require stronger forms of evidence for moderately promising outcomes bearing high risk. Risk-neutral decision makers are in between: for them, a positive expected payoff is sufficient to lunge forward, regardless of how unbalanced the underlying payoff and risk structure might be.
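To make the convex/concave distinction concrete, here is a minimal sketch, not a model of any real adoption decision. The two options, their payoffs and probabilities, and the specific utility functions (square, identity, square root) are all hypothetical choices made for illustration. Both options have the same expected payoff, so any difference in preference comes from the shape of the utility function alone.

```python
import math

def expected_utility(outcomes, utility):
    """Probability-weighted utility of a list of (probability, payoff) pairs."""
    return sum(p * utility(x) for p, x in outcomes)

# Hypothetical options with the same expected payoff of 20 (illustrative numbers).
RISKY_IDEA = [(0.2, 100.0), (0.8, 0.0)]  # big payoff, low chance of success
SAFE_IDEA = [(1.0, 20.0)]                # modest payoff, certain

# Assumed utility shapes; any convex, linear, and concave functions would do.
UTILITIES = {
    "risk-loving": lambda x: x ** 2,  # convex
    "risk-neutral": lambda x: x,      # linear
    "risk-averse": math.sqrt,         # concave
}

def compare():
    """Return {attitude: (expected utility of risky idea, of safe idea)}."""
    return {
        name: (expected_utility(RISKY_IDEA, u), expected_utility(SAFE_IDEA, u))
        for name, u in UTILITIES.items()
    }

if __name__ == "__main__":
    for attitude, (risky_eu, safe_eu) in compare().items():
        print(f"{attitude}: risky={risky_eu:.2f} safe={safe_eu:.2f}")
```

Under these assumptions, the convex utility ranks the risky idea higher, the concave utility ranks the safe idea higher, and the linear utility is indifferent between the two.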
Size, the number of concepts and dependencies among those concepts that together make an idea whole, affects the utility of the evidence regarding the idea's effectiveness. Small ideas in small bundles, such as pair programming, require the least evidence, and the weakest form of it. Some small ideas are viral in that they're instantly and obviously recognized as valuable. They can be wrapped in tidy bundles and easily sandboxed with a low cost of learning and application. They pose little adoption risk, and the decision to adopt tends to be reversible.
Medium-sized ideas, such as in-process unit testing, with larger bundles are composed of multiple concepts with interdependencies. They are more difficult to sandbox and incur modest learning and application costs. Part of the adoption costs might be irreversible. Such ideas pose an adoption risk commensurate with the irreversible portion of the underlying adoption costs.
Large ideas, such as model-driven development, with large, messy bundles are most risky. Adoption costs might be substantial and largely irreversible. Therefore they warrant the strongest form of evidence, which unfortunately is also the most difficult to obtain and becomes available late in the game.
The adoption context is a strategic factor that can be partially controlled depending on other factors, which are intrinsic. Adoption context has two dimensions: scale and rate. The scale of adoption is the number of instances in which a new idea will be applied in a specific organizational situation. It answers the question: how widespread will the idea's adoption be?
The rate of adoption is the speed at which the idea will be spread and applied in that situation. Gradual adoption might hedge high adoption risks if the underlying evidence is weak, the recurring application costs are high, and the components of the idea's bundle can be incrementally applied. This would allow the existing solution to remain in effect while more experience is being gained with the replacement solution and until any major uncertainties have been resolved. Gradual adoption provides a chance to preempt an idea's spread if problems materialize early. Rapid adoption might make more sense if evidence is strong, one-time learning costs dominate, and recurrent application costs are relatively low.
Let's put this all together. Table 1 summarizes the roles of different types of evidence under various situations. The risk-taking attitudes of the decision makers shift the rows left (as risky behavior is increasingly prevalent) or right (as risky behavior is increasingly avoided).
Table 1. Types of evidence likely to be necessary to overcome the adoption barrier under various situations
In software engineering, evidence tends to be over- or underemphasized, emphasized too early and too indiscriminately or too late and too sparingly. The value of evidence must be gauged carefully in each situation.
For other and complementary points of view, check the January/February 2005 issue of IEEE Software. In "Evidence-Based Software Engineering for Practitioners," Tore Dybå, Barbara A. Kitchenham, and Magne Jørgensen make a case for systematic evidence and how practitioners can leverage it for higher-quality adoption decisions. In "Soup or Art? The Role of Evidential Force in Empirical Software Engineering," Shari Lawrence Pfleeger lashes out at common wisdom and vendor hype when they substitute for solid evidence, and gives examples of usage from other disciplines regarding different types of evidence. Finally, for the latest on evidence in software engineering, don't forget to check Greg Wilson's forthcoming book What Really Works (O'Reilly, 2010).