Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06)
Unsupervised Learning of Tree Alignment Models for Information Extraction
Hong Kong, China
December 18-December 22
ISBN: 0-7695-2702-7
BibTeX:
@inproceedings{10.1109/ICDMW.2006.166,
  author    = {Philip Zigoris and Damian Eads and Yi Zhang},
  title     = {Unsupervised Learning of Tree Alignment Models for Information Extraction},
  booktitle = {Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06)},
  year      = {2006},
  isbn      = {0-7695-2702-7},
  pages     = {45-49},
  doi       = {10.1109/ICDMW.2006.166},
  publisher = {IEEE Computer Society},
  address   = {Los Alamitos, CA, USA}
}
We propose an algorithm for extracting fields from HTML search results. The output of the algorithm is a database table, a data structure that better lends itself to high-level data mining and information exploitation. Our algorithm combines tree and string alignment algorithms with domain-specific feature extraction to match semantically related data across search results. The applications of our approach are vast and include hidden web crawling, semantic tagging, and federated search. We build on earlier research on the use of tree alignment for information extraction. In contrast to previous approaches that rely on hand-tuned parameters, our algorithm uses a variant of Support Vector Machines (SVMs) to learn a parameterized, site-independent tree alignment model. This model can then be used to deduce common structural and textual elements of a set of HTML parse trees. We report some preliminary results of our system's performance on data from websites with a variety of different layouts.
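The abstract does not specify the alignment model, so the following is only a rough illustration of what a parameterized tree alignment score can look like, not the authors' algorithm. It assumes trees are `(tag, children)` tuples, aligns child sequences with a Needleman-Wunsch style dynamic program, and uses a hypothetical `WEIGHTS` dictionary with hand-set values standing in for the parameters that the paper says are learned with an SVM variant.

```python
# Illustrative sketch only: a top-down ordered tree alignment score.
# WEIGHTS is a hypothetical parameter set; the values are hand-set here,
# whereas the paper describes learning such parameters.
WEIGHTS = {"match": 1.0, "mismatch": -1.0, "gap": -0.5}


def size(tree):
    """Number of nodes in a (tag, children) tree."""
    _tag, children = tree
    return 1 + sum(size(child) for child in children)


def align(a, b, w=WEIGHTS):
    """Score the alignment of two ordered trees.

    Roots with different tags receive the mismatch weight and are not
    descended into. Roots with equal tags receive the match weight plus
    the best sequence alignment of their child lists, where skipping a
    child costs the gap weight times the size of the skipped subtree.
    """
    tag_a, kids_a = a
    tag_b, kids_b = b
    if tag_a != tag_b:
        return w["mismatch"]

    n, m = len(kids_a), len(kids_b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + w["gap"] * size(kids_a[i - 1])
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + w["gap"] * size(kids_b[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + align(kids_a[i - 1], kids_b[j - 1], w),
                dp[i - 1][j] + w["gap"] * size(kids_a[i - 1]),
                dp[i][j - 1] + w["gap"] * size(kids_b[j - 1]),
            )
    return w["match"] + dp[n][m]


# Two toy search-result rows; the second has one extra cell.
row1 = ("tr", [("td", [("a", [])]), ("td", [])])
row2 = ("tr", [("td", [("a", [])]), ("td", []), ("td", [])])

print(align(row1, row1))  # identical trees
print(align(row1, row2))  # one unmatched cell is penalized as a gap
```

With these weights an identical pair scores one point per node, and each unmatched node lowers the score by the gap weight. In a learned model the three constants would be replaced by a weight vector over richer node features, for example tag names, attributes, and text similarity from string alignment.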
Citation:
Philip Zigoris, Damian Eads, Yi Zhang, "Unsupervised Learning of Tree Alignment Models for Information Extraction," Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06), pp. 45-49, 2006.
