James Caverlee, Ling Liu, "QA-Pagelet: Data Preparation Techniques for Large-Scale Data Analysis of the Deep Web," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 9, pp. 1247-1262, September 2005.

BibTeX:

@article{10.1109/TKDE.2005.151,
  author = {James Caverlee and Ling Liu},
  title = {QA-Pagelet: Data Preparation Techniques for Large-Scale Data Analysis of the Deep Web},
  journal = {IEEE Transactions on Knowledge and Data Engineering},
  volume = {17},
  number = {9},
  issn = {1041-4347},
  year = {2005},
  pages = {1247-1262},
  doi = {http://doi.ieeecomputersociety.org/10.1109/TKDE.2005.151},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}

RefWorks / ProCite / RefMan / EndNote (RIS):

TY  - JOUR
JO  - IEEE Transactions on Knowledge and Data Engineering
TI  - QA-Pagelet: Data Preparation Techniques for Large-Scale Data Analysis of the Deep Web
IS  - 9
SN  - 1041-4347
SP  - 1247
EP  - 1262
A1  - James Caverlee
A1  - Ling Liu
PY  - 2005
KW  - Deep Web
KW  - data preparation
KW  - data extraction
KW  - pagelets
KW  - clustering
VL  - 17
JA  - IEEE Transactions on Knowledge and Data Engineering
ER  -
[1] B. Adelberg, “NoDoSE: A Tool for Semi-Automatically Extracting Structured and Semistructured Data from Text Documents,” Proc. SIGMOD, 1998.
[2] A. Arasu and H. Garcia-Molina, “Extracting Structured Data from Web Pages,” Proc. SIGMOD, 2003.
[3] R.A. Baeza-Yates and B.A. Ribeiro-Neto, Modern Information Retrieval. ACM Press, 1999.
[4] Z. Bar-Yossef and S. Rajagopalan, “Template Detection via Data Mining and Its Applications,” Proc. World Wide Web Conf., 2002.
[5] D. Beeferman and A. Berger, “Agglomerative Clustering of a Search Engine Query Log,” Knowledge Discovery and Data Mining, pp. 407-416, 2000.
[6] K. Bharat and M.R. Henzinger, “Improved Algorithms for Topic Distillation in a Hyperlinked Environment,” Proc. ACM SIGIR Conf., 1998.
[7] A.Z. Broder, S.C. Glassman, M.S. Manasse, and G. Zweig, “Syntactic Clustering of the Web,” Proc. World Wide Web Conf., 1997.
[8] J. Caverlee, L. Liu, and D. Buttler, “Probe, Cluster, and Discover: Focused Extraction of QA-Pagelets from the Deep Web,” Proc. Int'l Conf. Data Eng., 2004.
[9] W. Cohen, “Recognizing Structure in Web Pages Using Similarity Queries,” Proc. Am. Assoc. for Artificial Intelligence Conf., 1999.
[10] I.S. Dhillon and D.S. Modha, “Concept Decompositions for Large Sparse Text Data Using Clustering,” Machine Learning, vol. 42, nos. 1/2, pp. 143-175, 2001.
[11] L. Gravano, P.G. Ipeirotis, and M. Sahami, “QProber: A System for Automatic Classification of Hidden-Web Databases,” ACM Trans. Information Systems, vol. 21, no. 1, pp. 1-41, 2003.
[12] M. Halkidi, Y. Batistakis, and M. Vazirigiannis, “Clustering Validity Checking Methods: Part II,” SIGMOD Record, vol. 31, no. 3, pp. 19-27, 2002.
[13] J.M. Kleinberg, “Authoritative Sources in a Hyperlinked Environment,” J. ACM, vol. 46, no. 5, pp. 604-632, 1999.
[14] R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins, “Trawling the Web for Emerging Cyber-Communities,” Proc. World Wide Web Conf., 1999.
[15] L. Liu, C. Pu, and W. Han, “XWRAP: An XML-Enabled Wrapper Construction System for Web Information Sources,” Proc. Int'l Conf. Data Eng., 2000.
[16] Z. Liu, C. Luo, J. Cho, and W. Chu, “A Probabilistic Approach to Metasearching with Adaptive Probing,” Proc. Int'l Conf. Data Eng., 2004.
[17] A. Nierman and H.V. Jagadish, “Evaluating Structural Similarity in XML Documents,” Proc. Fifth Int'l Workshop Web and Databases, 2002.
[18] M.F. Porter, “An Algorithm for Suffix Stripping,” Program, vol. 14, no. 3, pp. 130-137, 1980.
[19] S. Raghavan and H. Garcia-Molina, “Crawling the Hidden Web,” Proc. Very Large Databases Conf., 2001.
[20] C.E. Shannon, “A Mathematical Theory of Communication,” The Bell System Technical J., vol. 27, pp. 379-423, 623-656, July, Oct. 1948.
[21] M. Steinbach, G. Karypis, and V. Kumar, “A Comparison of Document Clustering Techniques,” Proc. KDD Workshop Text Mining, 2000.
[22] J. Wang, J.-R. Wen, F. Lochovsky, and W.-Y. Ma, “Instance-Based Schema Matching for Web Databases by Domain-Specific Query Probing,” Proc. Very Large Databases Conf., 2004.
[23] W. Wu, C.T. Yu, A. Doan, and W. Meng, “An Interactive Clustering-Based Approach to Integrating Source Query Interfaces on the Deep Web,” Proc. SIGMOD, 2004.
[24] O. Zamir and O. Etzioni, “Web Document Clustering: A Feasibility Demonstration,” Proc. SIGIR, 1998.
[25] Z. Zhang, B. He, and K.C.-C. Chang, “Understanding Web Query Interfaces: Best-Effort Parsing with Hidden Syntax,” Proc. SIGMOD, 2004.
[26] Y. Zhao and G. Karypis, “Criterion Functions for Document Clustering: Experiments and Analysis,” technical report, Univ. of Minnesota, Dept. of Computer Science, Minneapolis, 2002.

