Yanhong Zhai, Bing Liu, "Structured Data Extraction from the Web Based on Partial Tree Alignment," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 12, pp. 1614-1628, December, 2006.
BibTeX
@article{10.1109/TKDE.2006.197,
  author = {Yanhong Zhai and Bing Liu},
  title = {Structured Data Extraction from the Web Based on Partial Tree Alignment},
  journal = {IEEE Transactions on Knowledge and Data Engineering},
  volume = {18},
  number = {12},
  issn = {1041-4347},
  year = {2006},
  pages = {1614-1628},
  doi = {http://doi.ieeecomputersociety.org/10.1109/TKDE.2006.197},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
RefWorks Procite/RefMan/Endnote
TY - JOUR
JO - IEEE Transactions on Knowledge and Data Engineering
TI - Structured Data Extraction from the Web Based on Partial Tree Alignment
IS - 12
SN - 1041-4347
SP - 1614
EP - 1628
EPD - 1614-1628
A1 - Yanhong Zhai
A1 - Bing Liu
PY - 2006
KW - Web data extraction
KW - wrapper generation
KW - partial tree alignment
KW - Web mining
VL - 18
JA - IEEE Transactions on Knowledge and Data Engineering
ER -
[1] A. Arasu and H. Garcia-Molina, “Extracting Structured Data from Web Pages,” Proc. 2003 ACM SIGMOD Int'l Conf. Management of Data, pp. 337-348, 2003.
[2] G.J. Barton and M.J. Sternberg, “A Strategy for the Rapid Multiple Alignment of Protein Sequences: Confidence Levels from Tertiary Structure Comparisons,” J. Molecular Biology, vol. 198, no. 2, pp. 327-337, 1987.
[3] R. Baumgartner, S. Flesca, and G. Gottlob, “Visual Web Information Extraction with Lixto,” Proc. 27th Int'l Conf. Very Large Data Bases, pp. 119-128, 2001.
[4] D. Buttler, L. Liu, and C. Pu, “A Fully Automated Object Extraction System for the World Wide Web,” Proc. 21st Int'l Conf. Distributed Computing Systems, pp. 361-370, 2001.
[5] H. Carrillo and D. Lipman, “The Multiple Sequence Alignment Problem in Biology,” SIAM J. Applied Math., vol. 48, no. 5, pp. 1073-1082, 1988.
[6] C. Chang and S. Lui, “IEPAD: Information Extraction Based on Pattern Discovery,” Proc. 10th Int'l Conf. World Wide Web, 2001.
[7] W. Chen, “New Algorithm for Ordered Tree-to-Tree Correction Problem,” J. Algorithms, vol. 40, no. 2, pp. 135-158, 2001.
[8] W.W. Cohen, M. Hurst, and L.S. Jensen, “A Flexible Learning System for Wrapping Tables and Lists in HTML Documents,” Proc. 11th Int'l Conf. World Wide Web, pp. 232-241, 2002.
[9] V. Crescenzi, G. Mecca, and P. Merialdo, “Roadrunner: Towards Automatic Data Extraction from Large Web Sites,” Proc. 27th Int'l Conf. Very Large Data Bases, pp. 109-118, 2001.
[10] D.W. Embley, Y. Jiang, and Y.K. Ng, “Record-Boundary Discovery in Web Documents,” Proc. ACM SIGMOD Int'l Conf. Management of Data, pp. 467-478, 1999.
[11] M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman, 1979.
[12] G.H. Gonnet and R. Baeza-Yates, Handbook of Algorithms and Data Structures in Pascal and C. Addison-Wesley, 1991.
[13] C.M. Hoffmann and M.J. O'Donnell, “Pattern Matching in Trees,” J. ACM, pp. 68-95, 1982.
[14] P. Hogeweg and B. Hesper, “The Alignment of Sets of Sequences and the Construction of Phylogenetic Trees: An Integrated Method,” J. Molecular Evolution, vol. 20, pp. 175-186, 1984.
[15] A. Hogue and D. Karger, “Thresher: Automating the Unwrapping of Semantic Content from the World Wide Web,” Proc. 14th Int'l Conf. World Wide Web, 2005.
[16] C.N. Hsu and M.T. Dung, “Generating Finite-State Transducers for Semistructured Data Extraction from the Web,” Information Systems, vol. 23, no. 9, pp. 521-538, 1998.
[17] T. Jiang, L. Wang, and K. Zhang, “Alignment of Trees—An Alternative to Tree Edit,” CPM '94: Proc. Fifth Ann. Symp. Combinatorial Pattern Matching, pp. 75-86, 1994.
[18] N. Kushmerick, “Wrapper Induction: Efficiency and Expressiveness,” Artificial Intelligence, nos. 1-2, pp. 15-68, 2000.
[19] A.H.F. Laender, B. Ribeiro-Neto, and A.S. da Silva, “DEByE - Data Extraction by Example,” Data and Knowledge Eng., vol. 40, no. 2, pp. 121-154, 2002.
[20] L. Arlotta, V. Crescenzi, G. Mecca, and P. Merialdo, “Automatic Annotation of Data Extracted from Large Web Sites,” Proc. Int'l Workshop Web and Databases, pp. 7-12, 2003.
[21] K. Lerman, L. Getoor, S. Minton, and C. Knoblock, “Using the Structure of Web Sites for Automatic Segmentation of Tables,” Proc. ACM SIGMOD Int'l Conf. Management of Data, pp. 119-130, 2004.
[22] B. Liu, R. Grossman, and Y. Zhai, “Mining Data Records in Web Pages,” Proc. Ninth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, pp. 601-606, 2003.
[23] I. Muslea, S. Minton, and C. Knoblock, “A Hierarchical Approach to Wrapper Induction,” Proc. Third Ann. Conf. Autonomous Agents, pp. 190-197, 1999.
[24] C. Notredame, “Recent Progresses in Multiple Sequence Alignment: A Survey,” technical report, Information Génétique et, 2002.
[25] D. Pinto, A. McCallum, X. Wei, and W.-B. Croft, “Table Extraction Using Conditional Random Fields,” Proc. 26th Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 235-242, 2003.
[26] J. Raposo, A. Pan, M. Alvarez, J. Hidalgo, and A. Vina, “The Wargo System: Semi-Automatic Wrapper Generation in Presence of Complex Data Access Modes,” Proc. 13th Int'l Workshop Database and Expert Systems Applications, pp. 313-320, 2002.
[27] D.C. Reis, P.B. Golgher, A.S. Silva, and A.F. Laender, “Automatic Web News Extraction Using Tree Edit Distance,” Proc. 13th Int'l Conf. World Wide Web, pp. 502-511, 2004.
[28] B. Rosenfeld, R. Feldman, and Y. Aumann, “Structural Extraction from Visual Layout of Documents,” Proc. 11th Int'l Conf. Information and Knowledge Management, pp. 203-210, 2002.
[29] M.S. Selkow, “The Tree-to-Tree Editing Problem,” Information Processing Letters, vol. 6, no. 6, pp. 184-186, 1977.
[30] R. Song, H. Liu, J.R. Wen, and W.Y. Ma, “Learning Block Importance Models for Web Pages,” Proc. 13th Int'l Conf. World Wide Web, pp. 203-211, 2004.
[31] K.C. Tai, “The Tree-to-Tree Correction Problem,” J. ACM, vol. 26, no. 3, pp. 422-433, 1979.
[32] E. Tanaka and K. Tanaka, “The Tree-to-Tree Editing Problem,” Int'l J. Pattern Recognition and Artificial Intelligence, pp. 221-240, 1988.
[33] G. Valiente, “An Efficient Bottom-Up Distance between Trees,” Proc. Eighth Int'l Symp. String Processing and Information Retrieval, pp. 212-219, 2001.
[34] J. Wang and F.H. Lochovsky, “Data Extraction and Label Assignment for Web Databases,” Proc. 12th Int'l Conf. World Wide Web, pp. 187-196, 2003.
[35] W. Yang, “Identifying Syntactic Differences between Two Programs,” Software—Practice and Experience, vol. 21, no. 7, pp. 739-755, 1991.
[36] Y. Zhai and B. Liu, “Web Data Extraction Based on Partial Tree Alignment,” Proc. 14th Int'l Conf. World Wide Web, pp. 76-85, 2005.
[37] Y. Zhai and B. Liu, “Extracting Web Data Using Instance-Based Learning,” Proc. Sixth Int'l Conf. Web Information Systems Eng., 2005.
[38] K. Zhang, R. Statman, and D. Shasha, “On the Editing Distance between Unordered Labeled Trees,” Information Processing Letters, vol. 42, no. 3, pp. 133-139, 1992.
[39] H. Zhao, W. Meng, Z. Wu, V. Raghavan, and C. Yu, “Fully Automatic Wrapper Generation for Search Engines,” Proc. 14th Int'l Conf. World Wide Web, pp. 66-75, 2005.
[40] L. Zhao and N.K. Wee, “WICCAP: From Semi-Structured Data to Structured Data,” Proc. 11th IEEE Int'l Conf. and Workshop Eng. of Computer-Based Systems (ECBS '04), p. 86, 2004.

