Issue No. 02 - Feb. (2013 vol. 25)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2011.178
Zhixu Li, The University of Queensland, Brisbane
Laurianne Sitbon, Queensland University of Technology, Brisbane
Liwei Wang, Wuhan University, Wuhan
Xiaofang Zhou, The University of Queensland, Brisbane
Xiaoyong Du, Renmin University of China, Beijing
In this paper, we propose a new type of dictionary-based entity recognition problem, named Approximate Membership Localization (AML). The popular Approximate Membership Extraction (AME) provides full coverage of the true matched substrings in a given document, but its many redundant matches lower the efficiency of the AME process and degrade the performance of real-world applications that use the extracted substrings. The AML problem aims to locate nonoverlapping substrings, which are a better approximation of the true matched substrings, without generating overlapping redundancies. To perform AML efficiently, we propose the optimized algorithm P-Prune, which prunes a large part of the overlapping redundant matched substrings before generating them. Our study using several real-world data sets demonstrates the efficiency of P-Prune over a baseline method. We also study AML in application to a proposed web-based join framework, a search-based approach that joins two tables using dictionary-based entity recognition from web documents. The results demonstrate not only the advantage of AML over AME, but also the effectiveness of our search-based approach.
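The contrast between AME and AML can be illustrated with a small sketch. This is not the paper's P-Prune algorithm and not its formal problem definition. It is a naive illustration under stated assumptions: documents and dictionary entries are whitespace-tokenized, similarity is token-level Jaccard, and nonoverlapping matches are chosen greedily by score. The function names `jaccard`, `ame`, and `aml`, and the parameters `threshold` and `max_len`, are hypothetical.

```python
def jaccard(a, b):
    """Token-set Jaccard similarity (an assumed similarity measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def ame(tokens, dictionary, threshold, max_len):
    """AME-style extraction: every token window whose similarity to some
    dictionary entry reaches the threshold, overlaps included."""
    matches = []
    for start in range(len(tokens)):
        last_end = min(start + max_len, len(tokens))
        for end in range(start + 1, last_end + 1):
            window = tokens[start:end]
            for entry in dictionary:
                score = jaccard(window, entry.split())
                if score >= threshold:
                    matches.append((start, end, entry, score))
    return matches


def aml(tokens, dictionary, threshold, max_len):
    """AML-style localization (naive): keep the best-scoring matches that
    do not overlap. It generates all AME candidates first, which is the
    redundant work a pruning method would try to avoid."""
    candidates = sorted(
        ame(tokens, dictionary, threshold, max_len),
        key=lambda m: (-m[3], m[0], m[1]),
    )
    chosen = []
    for start, end, entry, score in candidates:
        # Accept a candidate only if it is disjoint from every chosen span.
        if all(end <= s or start >= e for s, e, _, _ in chosen):
            chosen.append((start, end, entry, score))
    return sorted(chosen)


if __name__ == "__main__":
    doc = "we visited the university of queensland in brisbane last year".split()
    entries = ["university of queensland"]
    for match in ame(doc, entries, threshold=0.6, max_len=4):
        print("AME:", match)
    for match in aml(doc, entries, threshold=0.6, max_len=4):
        print("AML:", match)
```

With this toy input, the AME-style function reports five overlapping windows around the same mention, while the AML-style function keeps only the exact span covering tokens 3 to 5.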
Dictionaries, Redundancy, Approximation methods, Approximation algorithms, Correlation, Web search, Pattern matching, AML, Web-based join, approximate membership location
X. Zhou, X. Du, L. Wang, L. Sitbon and Z. Li, "AML: Efficient Approximate Membership Localization within a Web-Based Join Framework," in IEEE Transactions on Knowledge & Data Engineering, vol. 25, no. 2, pp. 298-310, 2013.