Hung-Yu Kao, Jan-Ming Ho, Ming-Syan Chen, "WISDOM: Web Intrapage Informative Structure Mining Based on Document Object Model," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 5, pp. 614-627, May 2005.
BibTeX:
@article{10.1109/TKDE.2005.84,
  author = {Hung-Yu Kao and Jan-Ming Ho and Ming-Syan Chen},
  title = {WISDOM: Web Intrapage Informative Structure Mining Based on Document Object Model},
  journal = {IEEE Transactions on Knowledge and Data Engineering},
  volume = {17},
  number = {5},
  issn = {1041-4347},
  year = {2005},
  pages = {614-627},
  doi = {http://doi.ieeecomputersociety.org/10.1109/TKDE.2005.84},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
RefWorks/ProCite/RefMan/EndNote:
TY  - JOUR
JO  - IEEE Transactions on Knowledge and Data Engineering
TI  - WISDOM: Web Intrapage Informative Structure Mining Based on Document Object Model
IS  - 5
SN  - 1041-4347
SP  - 614
EP  - 627
EPD - 614-627
A1  - Hung-Yu Kao
A1  - Jan-Ming Ho
A1  - Ming-Syan Chen
PY  - 2005
KW  - Intrapage informative structure
KW  - DOM
KW  - entropy
KW  - information extraction
VL  - 17
JA  - IEEE Transactions on Knowledge and Data Engineering
ER  -
[1] B. Adelberg, “NoDoSE— A Tool for Semi-Automatically Extracting Structured and Semistructured Data from Text Documents,” Proc. 1998 ACM SIGMOD Int'l Conf. Management of Data (SIGMOD), 1998.
[2] T. Asai, K. Abe, S. Kawasoe, H. Arimura, H. Sakamoto, and S. Arikawa, “Efficient Substructure Discovery from Large Semi-structured Data,” Proc. SIAM Int'l Conf. Data Mining (SDM), 2002.
[3] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval. Addison Wesley, 1999.
[4] Z. Bar-Yossef and S. Rajagopalan, “Template Detection via Data Mining and Its Applications,” Proc. 11th World Wide Web Conf. (WWW), 2002.
[5] A. Broder, S. Glassman, M. Manasse, and G. Zweig, “Syntactic Clustering of the Web,” Proc. Sixth World Wide Web Conf. (WWW), 1997.
[6] M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam, and S. Slattery, “Learning to Construct Knowledge Bases from the World Wide Web,” Artificial Intelligence, vol. 118, nos. 1-2, pp. 69-113, 2000.
[7] S. Chakrabarti, “Integrating the Document Object Model with Hyperlinks for Enhanced Topic Distillation and Information Extraction,” Proc. 10th World Wide Web Conf. (WWW), 2001.
[8] Y. Chen, W.-Y. Ma, and H.-J. Zhang, “Detecting Web Page Structure for Adaptive Viewing on Small Form Factor Devices,” Proc. 12th World Wide Web Conf. (WWW), 2003.
[9] W. Cohen, “Recognizing Structure in Web Pages Using Similarity Queries,” Proc. Nat'l Conf. Artificial Intelligence (AAAI), 1999.
[10] G. Cong, L. Yi, B. Liu, and K. Wang, “Discovering Frequent Substructures from Hierarchical Semi-Structured Data,” Proc. SIAM Int'l Conf. Data Mining (SIAM SDM), 2002.
[11] R. Cooley and J. Srivastava, “Web Mining: Information and Pattern Discovery on the World Wide Web,” Proc. Ninth IEEE Int'l Conf. Tools with Artificial Intelligence (ICTAI), 1997.
[12] D.W. Embley, Y. Jiang, and Y.K. Ng, “Record-Boundary Discovery in Web Documents,” Proc. 1999 ACM SIGMOD Int'l Conf. Management of Data (SIGMOD), 1999.
[13] K. Furukawa, T. Uchida, K. Yamada, T. Miyahara, T. Shoudai, and Y. Nakamura, “Extracting Characteristic Structures among Words in Semistructured Documents,” Proc. Sixth Pacific-Asia Conf. Knowledge Discovery and Data Mining (PAKDD), 2002.
[14] H. Grundel, T. Naphtali, C. Wiech, J.-M. Gluba, M. Rohdenburg, and T. Scheffer, “Clipping and Analyzing News Using Machine Learning Techniques,” Proc. Int'l Conf. Discovery Science, 2001.
[15] C.N. Hsu and M.T. Dung, “Generating Finite-State Transducers for Semi-Structured Data Extraction from the Web,” Information Systems, vol. 23, no. 8, pp. 521-538, 1998.
[16] H.-Y. Kao, S.-H. Lin, J.-M. Ho, and M.-S. Chen, “Entropy-Based Link Analysis for Mining Web Informative Structures,” Proc. ACM 11th Int'l Conf. Information and Knowledge Management (CIKM), 2002.
[17] H.-Y. Kao, S.-H. Lin, J.-M. Ho, and M.-S. Chen, “Mining Web Information Structures and Contents Based on Entropy Analysis,” IEEE Trans. Knowledge and Data Eng., vol. 16, no. 1, Jan. 2004.
[18] J.M. Kleinberg, “Authoritative Sources in a Hyperlinked Environment,” Proc. ACM-SIAM Symp. Discrete Algorithms (SODA), 1998.
[19] N. Kushmerick, D. Weld, and R. Doorenbos, “Wrapper Induction for Information Extraction,” Proc. 15th Int'l Joint Conf. Artificial Intelligence (IJCAI), 1997.
[20] A. Laender, B. Ribeiro-Neto, A. Silva, and J. Teixeira, “A Brief Survey of Web Data Extraction Tools,” SIGMOD Record, vol. 31, no. 2, June 2002.
[21] S.H. Lin and J.M. Ho, “Discovering Informative Content Blocks from Web Documents,” Proc. Eighth ACM Int'l Conf. Knowledge Discovery and Data Mining (SIGKDD), 2002.
[22] W.Y. Lin and W. Lam, “Learning to Extract Hierarchical Information from Semi-Structured Documents,” Proc. ACM Ninth Int'l Conf. Information and Knowledge Management (CIKM), 2000.
[23] X. Li, B. Liu, T.-H. Phang, and M. Hu, “Using Micro Information Units for Internet Search,” Proc. ACM 11th Int'l Conf. Information and Knowledge Management (CIKM), 2002.
[24] T. Miyahara, Y. Suzuki, T. Shoudai, T. Uchida, K. Takahashi, and H. Ueda, “Discovery of Frequent Tag Tree Patterns in Semistructured Web Documents,” Proc. Sixth Pacific-Asia Conf. Knowledge Discovery and Data Mining (PAKDD), 2002.
[25] C.E. Shannon, “A Mathematical Theory of Communication,” Bell System Technical J., vol. 27, pp. 398-403, 1948.
[26] G. Salton, Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison Wesley, 1989.
[27] W3C DOM, Document Object Model (DOM), http://www.w3.org/DOM/, 2005.
[28] K. Wang and H. Liu, “Discovering Structural Association of Semistructured Data,” IEEE Trans. Knowledge and Data Eng., vol. 12, no. 3, May/June 2000.
[29] C. Yip, C. Gertz, and N. Sundaresan, “Reverse Engineering for Web Data: From Visual to Semantic Structures,” Proc. 19th IEEE Int'l Conf. Data Eng. (ICDE), 2002.

