An Empirical Analysis of C Preprocessor Use
December 2002 (vol. 28 no. 12)
pp. 1146-1170

Abstract: This is the first empirical study of the use of the C macro preprocessor, Cpp. To determine how the preprocessor is used in practice, this paper analyzes 26 packages comprising 1.4 million lines of publicly available C code. We determine the incidence of C preprocessor usage (whether in macro definitions, macro uses, or dependences upon macros) that is complex, potentially problematic, or inexpressible in terms of other C or C++ language features. We taxonomize these various aspects of preprocessor use and particularly note data that are material to the development of tools for C or C++, including translating from C to C++ to reduce preprocessor usage. Our results show that, while most Cpp usage follows fairly simple patterns, an effective program analysis tool must address the preprocessor. The intimate connection between the C programming language and Cpp, together with Cpp's unstructured transformations of token streams, often hinders both programmer understanding of C programs and tools built to engineer C programs, such as compilers, debuggers, call graph extractors, and translators. Most tools make no attempt to analyze macro usage, but simply preprocess their input, which results in a number of negative consequences; an analysis that takes Cpp into account is preferable, but building such tools requires an understanding of actual usage. Differences between the semantics of Cpp and those of C can lead to subtle bugs stemming from the use of the preprocessor, but there are no previous reports of the prevalence of such errors. Use of C++ can reduce some preprocessor usage, but such usage has not been previously measured. Our data and analyses shed light on these issues and others related to practical understanding or manipulation of real C programs. The results are of interest to language designers, tool writers, programmers, and software engineers.
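
As a rough illustration of the abstract's remark that differences between the semantics of Cpp and those of C can lead to subtle bugs, the sketch below shows one well-known pitfall: a macro body substituted as raw tokens interacts with operator precedence at the use site. The macro and function names are hypothetical, chosen for this example only, and are not taken from the packages the paper analyzes.

```c
/* Hypothetical macros for illustration; not drawn from the studied code. */

/* Body and argument left unparenthesized: expansion is purely textual. */
#define DOUBLE_UNSAFE(x) x + x

/* Fully parenthesized variant that behaves like an expression. */
#define DOUBLE_SAFE(x) ((x) + (x))

/* An ordinary function follows C semantics: it returns a single value. */
int double_fn(int x) { return x + x; }

/* Expands to 3 * 2 + 2, which C parses as (3 * 2) + 2. */
int unsafe_product(void) { return 3 * DOUBLE_UNSAFE(2); }

/* Expands to 3 * ((2) + (2)). */
int safe_product(void) { return 3 * DOUBLE_SAFE(2); }

/* The function call is multiplied as a whole. */
int fn_product(void) { return 3 * double_fn(2); }
```

Under these assumptions, `unsafe_product()` evaluates to 8, while `safe_product()` and `fn_product()` both evaluate to 12. A tool that only sees the preprocessed text has no record that the `3 * 2 + 2` came from a macro use, which is one reason the abstract argues that analyses should take Cpp into account.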

Index Terms:
C preprocessor, Cpp, C, C++, macro, macro substitution, file inclusion, conditional compilation, empirical study, program understanding.
Michael D. Ernst, Greg J. Badros, David Notkin, "An Empirical Analysis of C Preprocessor Use," IEEE Transactions on Software Engineering, vol. 28, no. 12, pp. 1146-1170, Dec. 2002, doi:10.1109/TSE.2002.1158288