Joint Optimization of Index Freshness and Coverage in Real-Time Search Engines
Dec. 2012 (vol. 24 no. 12)
pp. 2203-2217
Yongwook Shin, Seoul National University, Seoul
Junseok Lim, Seoul National University, Seoul
Jonghun Park, Seoul National University, Seoul
Real-time search engines increasingly index web content from data streams, since many web sources, including news and social media sites, now deliver up-to-date information via streams. Accordingly, a crucial challenge for a real-time search engine using data streams is to improve index freshness, which primarily depends on the latencies incurred during the fetching and indexing processes. Retrieval latency is the time lag between a document's publication and its fetching, while indexing latency is the delay before a fetched document is indexed, which is caused by the finite indexing capacity. The problem of retrieval latency can be satisfactorily addressed by appropriate fetch scheduling or by recent real-time content notification protocols. However, as the total volume of real-time content grows rapidly, indexing latency becomes a challenging problem. Furthermore, the need to maximize index coverage makes it more difficult to reduce indexing latency under limited indexing capacity. We consider the problem of jointly optimizing indexing latency and index coverage, in which their relative importance can be adjusted, and propose an optimization model based on inventory control theory. Extensive experiments have been conducted to validate the proposed model, and the results suggest that the proposed approach outperforms the alternatives.
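To make the abstract's terms concrete, the following minimal Python sketch computes retrieval latency, indexing latency, and index coverage for a handful of documents, then combines the last two with an adjustable weight. It is not the inventory-control model proposed in the paper: the `Doc` record, the function names, and the linear `weighted_cost` formula are all assumptions introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Doc:
    """Hypothetical record of one streamed document (times in seconds)."""
    published: float
    fetched: float
    indexed: Optional[float]  # None if the document was never indexed


def retrieval_latency(doc: Doc) -> float:
    # Time lag between publication and fetching.
    return doc.fetched - doc.published


def indexing_latency(doc: Doc) -> float:
    # Delay between fetching and indexing; only defined for indexed documents.
    if doc.indexed is None:
        raise ValueError("document was never indexed")
    return doc.indexed - doc.fetched


def coverage(docs: List[Doc]) -> float:
    # Fraction of fetched documents that ended up in the index.
    return sum(d.indexed is not None for d in docs) / len(docs)


def weighted_cost(docs: List[Doc], weight: float) -> float:
    # Illustrative assumption: a linear trade-off between mean indexing
    # latency and the uncovered fraction. `weight` in [0, 1] sets their
    # relative importance; the two terms have different units, so this
    # is a toy score, not the paper's objective.
    indexed = [d for d in docs if d.indexed is not None]
    mean_latency = (
        sum(indexing_latency(d) for d in indexed) / len(indexed)
        if indexed else 0.0
    )
    return weight * mean_latency + (1.0 - weight) * (1.0 - coverage(docs))
```

With limited indexing capacity, indexing every fetched document raises coverage but lets the backlog (and hence indexing latency) grow, while skipping documents does the opposite; the weight in the toy score stands in for the adjustable relative importance mentioned in the abstract.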
Index Terms:
Indexing, Erbium, Real time systems, Search engines, Delay, Inventory control, information retrieval, Feed, index freshness, index coverage, real-time search, search engine
Citation:
Yongwook Shin, Junseok Lim, Jonghun Park, "Joint Optimization of Index Freshness and Coverage in Real-Time Search Engines," IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 12, pp. 2203-2217, Dec. 2012, doi:10.1109/TKDE.2011.144