2008 IEEE 24th International Conference on Data Engineering
An Algebraic Approach to Rule-Based Information Extraction
Cancun, Mexico
April 7-12, 2008
ISBN: 978-1-4244-1836-7
Frederick Reiss, IBM Almaden Research Center, San Jose, CA, USA. frreiss@us.ibm.com
Sriram Raghavan, IBM Almaden Research Center, San Jose, CA, USA. rsriram@us.ibm.com
Rajasekar Krishnamurthy, IBM Almaden Research Center, San Jose, CA, USA. rajase@us.ibm.com
Huaiyu Zhu, IBM Almaden Research Center, San Jose, CA, USA. huaiyu@us.ibm.com
Shivakumar Vaithyanathan, IBM Almaden Research Center, San Jose, CA, USA. shiv@us.ibm.com
Traditional approaches to rule-based information extraction (IE) have primarily been based on regular expression grammars. However, these grammar-based systems have difficulty scaling to large data sets and large numbers of rules. Inspired by traditional database research, we propose an algebraic approach to rule-based IE that addresses these scalability issues through query optimization. The operators of our algebra are motivated by our experience in building several rule-based extraction programs over diverse data sets. We present the operators of our algebra and propose several optimization strategies motivated by the text-specific characteristics of our operators. Finally, we validate the potential benefits of our approach through extensive experiments over real-world blog data.
Citation:
Frederick Reiss, Sriram Raghavan, Rajasekar Krishnamurthy, Huaiyu Zhu, Shivakumar Vaithyanathan, "An Algebraic Approach to Rule-Based Information Extraction," ICDE, pp. 933-942, 2008 IEEE 24th International Conference on Data Engineering, 2008