2009 International Conference of Soft Computing and Pattern Recognition
DTM - Extracting Data Records from Search Engine Results Page Using Tree Matching Algorithm
Malacca, Malaysia
December 04-December 07
ISBN: 978-0-7695-3879-2
BibTeX:
@article{ 10.1109/SoCPaR.2009.40,
  author = {Jer Lang Hong and Eugene Siew and Simon Egerton},
  title = {DTM - Extracting Data Records from Search Engine Results Page Using Tree Matching Algorithm},
  journal = {Soft Computing and Pattern Recognition, International Conference of},
  volume = {0},
  year = {2009},
  isbn = {978-0-7695-3879-2},
  pages = {149-154},
  doi = {http://doi.ieeecomputersociety.org/10.1109/SoCPaR.2009.40},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
In this paper, we develop a non-visual automatic wrapper for extracting data records from search engine results pages. The novel techniques in our wrapper are (1) filtering rules to detect and filter out irrelevant data records, (2) a tree matching algorithm that uses frequency measures to increase the speed of data extraction, and (3) an algorithm that calculates the number and size of the components of data records to detect the correct data region. Results show that our wrapper is as robust as, and in many cases outperforms, state-of-the-art wrappers such as ViNT and DEPTA. The wrapper could also offer significant speed advantages when processing large volumes of website data, which could be helpful in meta search engine development.
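The abstract does not spell out the DTM matching procedure, so the following is only an illustrative sketch of what "tree matching" between two candidate data records can look like. It uses the classic Simple Tree Matching dynamic program, not the frequency-based variant the paper proposes. The `Node` class, the function names, and the normalized similarity score are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical minimal DOM node: a tag name plus ordered children."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def tree_size(node: Node) -> int:
    """Count the nodes in the subtree rooted at `node`."""
    return 1 + sum(tree_size(child) for child in node.children)


def simple_tree_matching(a: Node, b: Node) -> int:
    """Size of the maximum order-preserving matching between two trees.

    Classic Simple Tree Matching: roots must carry the same tag, and the
    children are aligned with a longest-common-subsequence style table.
    """
    if a.tag != b.tag:
        return 0
    m, n = len(a.children), len(b.children)
    # dp[i][j]: best matching between the first i subtrees of a
    # and the first j subtrees of b.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            paired = dp[i - 1][j - 1] + simple_tree_matching(
                a.children[i - 1], b.children[j - 1]
            )
            dp[i][j] = max(dp[i - 1][j], dp[i][j - 1], paired)
    return dp[m][n] + 1  # +1 for the matched roots


def similarity(a: Node, b: Node) -> float:
    """Assumed normalization: matched nodes relative to the average tree size."""
    return 2 * simple_tree_matching(a, b) / (tree_size(a) + tree_size(b))
```

Two subtrees with a high score would be treated as instances of the same record template. For example, comparing `Node("div", [Node("a"), Node("span"), Node("p")])` with `Node("div", [Node("a"), Node("p")])` matches the root, `a`, and `p`, while the unmatched `span` lowers the score.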
Index Terms:
Information Extraction, Wrapper Generation, Search Engine
Citation:
Jer Lang Hong, Eugene Siew, Simon Egerton, "DTM - Extracting Data Records from Search Engine Results Page Using Tree Matching Algorithm," socpar, pp.149-154, 2009 International Conference of Soft Computing and Pattern Recognition, 2009
