A Query Formulation Language for the Data Web
May 2012 (vol. 24 no. 5)
pp. 783-798
Mustafa Jarrar, Birzeit University, Birzeit
Marios D. Dikaiakos, University of Cyprus, Nicosia
We present a query formulation language (called MashQL) for easily querying and fusing structured data on the web. The main novelty of MashQL is that it allows people with limited IT skills to explore and query one (or multiple) data sources without prior knowledge of the schema, structure, vocabulary, or any technical details of these sources. More importantly, to be robust and cover most cases in practice, we do not assume that a data source has a schema, whether offline or inline. This poses several language-design and performance complexities that we fundamentally tackle. To illustrate the query formulation power of MashQL, and without loss of generality, we chose the Data Web scenario. We also chose querying RDF, as it is the most primitive data model; hence, MashQL can be similarly used for querying relational databases and XML. We present two implementations of MashQL: an online mashup editor and a Firefox add-on. The former illustrates how MashQL can be used to query and mash up the Data Web as simply as filtering and piping web feeds; the Firefox add-on illustrates using the browser as a web composer rather than only a navigator. Finally, we evaluate MashQL on querying two data sets, DBLP and DBpedia, and show that our indexing techniques allow instant user interaction.

Index Terms:
Query formulation, semantic/data web, RDF and SPARQL, indexing methods.
Mustafa Jarrar, Marios D. Dikaiakos, "A Query Formulation Language for the Data Web," IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 5, pp. 783-798, May 2012, doi:10.1109/TKDE.2011.41