A Query Algebra for Program Databases
March 1996 (vol. 22 no. 3)
pp. 202-217

Abstract—Querying source code is an essential aspect of a variety of software engineering tasks such as program understanding, reverse engineering, program structure analysis, and program flow analysis. In this paper, we present and demonstrate the use of an algebraic source code query technique that blends expressive power with query compactness. The query framework of Source Code Algebra, or SCA, permits users to express complex source code queries and views as algebraic expressions. Queries are expressed on an extensible, object-oriented database that stores program source code. The SCA algebraic approach offers multiple benefits such as an applicative query language, high expressive power, seamless handling of structural and flow information, clean formalism, and potential for query optimization. We present a case study where SCA expressions are used to query a program in terms of program organization, resource flow, control flow, metrics, and syntactic structure. Our experience with an SCA-based prototype query processor indicates that an algebraic approach to source code queries combines the benefits of expressive power and compact query formulation.

Index Terms:
Software reverse engineering, program understanding, source code analysis, program query language, query algebra.
Santanu Paul, Atul Prakash, "A Query Algebra for Program Databases," IEEE Transactions on Software Engineering, vol. 22, no. 3, pp. 202-217, March 1996, doi:10.1109/32.489080