Search For:

Displaying 1-46 out of 46 total
Cluster-Based Delta Compression of a Collection of Files
Found in: Web Information Systems Engineering, International Conference on
By Zan Ouyang, Nasir Memon, Torsten Suel, Dimitre Trendafilov
Issue Date:December 2002
pp. 257
Delta compression techniques are commonly used to succinctly represent an updated version of a file with respect to an earlier one. In this paper, we study the use of delta compression in a somewhat different scenario, where we wish to compress a large col...
 
Guest Editors' Introduction: Social Computing in the Blogosphere
Found in: IEEE Internet Computing
By Huan Liu, Philip S. Yu, Nitin Agarwal, Torsten Suel
Issue Date:March 2010
pp. 12-14
The widespread phenomenon of blogging demonstrates the power of citizen journalism and anytime information sharing. People can exchange personal experiences, voice opinions, offer suggestions, and form groups with genuine social activities. Blogs also act ...
 
Optimized Inverted List Assignment in Distributed Search Engine Architectures
Found in: Parallel and Distributed Processing Symposium, International
By Jiangong Zhang, Torsten Suel
Issue Date:March 2007
pp. 41
We study efficient query processing in distributed web search engines with global index organization. The main performance bottleneck in this case is due to the large amount of index data that is exchanged between nodes during the processing of a query, an...
 
Efficient Query Evaluation on Large Textual Collections in a Peer-to-Peer Environment
Found in: Peer-to-Peer Computing, IEEE International Conference on
By Jiangong Zhang, Torsten Suel
Issue Date:September 2005
pp. 225-233
We study the problem of evaluating ranked (top-k) queries on textual collections ranging from multiple giga-bytes to terabytes in size. We focus on the case of a global index organization in a highly distributed environment, and consider a class of ranking...
 
Improved File Synchronization Techniques for Maintaining Large Replicated Collections over Slow Networks
Found in: Data Engineering, International Conference on
By Torsten Suel, Patrick Noel, Dimitre Trendafilov
Issue Date:April 2004
pp. 153
We study the problem of maintaining large replicated collections of files or documents in a distributed environment with limited bandwidth. This problem arises in a number of important applications, such as synchronization of data between accounts or devic...
 
Design and Implementation of a High-Performance Distributed Web Crawler
Found in: Data Engineering, International Conference on
By Vladislav Shkapenyuk, Torsten Suel
Issue Date:March 2002
pp. 0357
Broad web search engines as well as many more specialized search tools rely on web crawlers to acquire large collections of pages for indexing and analysis. Such a web crawler may interact with millions of hosts over a period of weeks or months, and thus i...
 
Compressing the Graph Structure of the Web
Found in: Data Compression Conference
By Torsten Suel, Jun Yuan
Issue Date:March 2001
pp. 0213
Abstract: A large amount of research has recently focused on the graph structure (or link structure) of the World Wide Web. This structure has proven to be extremely useful for improving the performance of search engines and other tools for navigating the ...
 
Portable and Efficient Parallel Computing Using the BSP Model
Found in: IEEE Transactions on Computers
By Mark W. Goudreau, Kevin Lang, Satish B. Rao, Torsten Suel, Thanasis Tsantilas
Issue Date:July 1999
pp. 670-689
<p><b>Abstract</b>—The Bulk-Synchronous Parallel (BSP) model was proposed by Valiant as a standard interface between parallel software and hardware. In theory, the BSP model has been shown to allow the asymptotically optimal execution of ...
 
Highly Portable and Efficient Implementations of Parallel Adaptive N-Body Methods
Found in: SC Conference
By David Blackston, Torsten Suel
Issue Date:November 1997
pp. 4
We describe the design of several portable and efficient parallel implementations of adaptive N-body methods, including the adaptive Fast Multipole Method, the adaptive version of Anderson's Method, and the Barnes-Hut algorithm. Our codes are based on a co...
 
A candidate filtering mechanism for fast top-k query processing on modern cpus
Found in: Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval (SIGIR '13)
By Constantinos Dimopoulos, Sergey Nepomnyachiy, Torsten Suel
Issue Date:July 2013
pp. 723-732
A large amount of research has focused on faster methods for finding top-k results in large document collections, one of the main scalability challenges for web search engines. In this paper, we propose a method for accelerating such top-k queries that bui...
     
To index or not to index: time-space trade-offs in search engines with positional ranking functions
Found in: Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval (SIGIR '12)
By Diego Arroyuelo, Mauricio Marin, Mauricio Oyarzún, Senén González, Torsten Suel
Issue Date:August 2012
pp. 255-264
Positional ranking functions, widely used in Web search engines, improve result quality by exploiting the positions of the query terms within documents. However, it is well known that positional indexes demand large amounts of extra space, typically about ...
     
Optimizing positional index structures for versioned document collections
Found in: Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval (SIGIR '12)
By JInru He, Torsten Suel
Issue Date:August 2012
pp. 245-254
Versioned document collections are collections that contain multiple versions of each document. Important examples are Web archives, Wikipedia and other wikis, or source code and documents maintained in revision control systems. Versioned document collecti...
     
Scalable manipulation of archival web graphs
Found in: Proceedings of the 9th workshop on Large-scale and distributed informational retrieval (LSDS-IR '11)
By Torsten Suel, Yasemin Avcular
Issue Date:October 2011
pp. 27-32
In this paper, we study efficient ways to construct, represent and analyze large-scale archival web graphs. We first discuss details of the distributed graph construction algorithm implemented in MapReduce and the design of a space-efficient layered graph ...
     
Text vs. space: efficient geo-search query processing
Found in: Proceedings of the 20th ACM international conference on Information and knowledge management (CIKM '11)
By Alexander Markowetz, Constantinos Dimopoulos, Jinru He, Maria Christoforaki, Torsten Suel
Issue Date:October 2011
pp. 423-432
Many web search services allow users to constrain text queries to a geographic location (e.g., yoga classes near Santa Monica). Important examples include local search engines such as Google Local and location-based search services for smart phones. Severa...
     
Faster top-k document retrieval using block-max indexes
Found in: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information (SIGIR '11)
By Shuai Ding, Torsten Suel
Issue Date:July 2011
pp. 993-1002
Large search engines process thousands of queries per second over billions of documents, making query processing a major performance bottleneck. An important class of optimization techniques called early termination achieves faster query processing by avoi...
     
Faster temporal range queries over versioned text
Found in: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information (SIGIR '11)
By Jinru He, Torsten Suel
Issue Date:July 2011
pp. 565-574
Versioned textual collections are collections that retain multiple versions of a document as it evolves over time. Important large-scale examples are Wikipedia and the web collection of the Internet Archive. Search queries over such collections often use k...
     
Improved index compression techniques for versioned document collections
Found in: Proceedings of the 19th ACM international conference on Information and knowledge management (CIKM '10)
By Jinru He, Junyuan Zeng, Torsten Suel
Issue Date:October 2010
pp. 1239-1248
Current Information Retrieval systems use inverted index structures for efficient query processing. Due to the extremely large size of many data sets, these index structures are usually kept in compressed form, and many techniques for optimizing compressed...
     
Efficient term proximity search with term-pair indexes
Found in: Proceedings of the 19th ACM international conference on Information and knowledge management (CIKM '10)
By Fan Zhang, Hao Yan, Ji-Rong Wen, Shuming Shi, Torsten Suel
Issue Date:October 2010
pp. 1229-1238
There has been a large amount of research on early termination techniques in web search and information retrieval. Such techniques return the top-k documents without scanning and evaluating the full inverted lists of the query terms. Thus, they can greatly...
     
Compact full-text indexing of versioned document collections
Found in: Proceeding of the 18th ACM conference on Information and knowledge management (CIKM '09)
By Hao Yan, Jinru He, Torsten Suel
Issue Date:November 2009
pp. 415-424
We study the problem of creating highly compressed full-text index structures for versioned document collections, that is, collections that contain multiple versions of each document. Important examples of such collections are Wikipedia or the web page arc...
     
Compressing term positions in web indexes
Found in: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval (SIGIR '09)
By Hao Yan, Shuai Ding, Torsten Suel
Issue Date:July 2009
pp. 435-435
Large search engines process thousands of queries per second on billions of pages, making query processing a major factor in their operating costs. This has led to a lot of research on how to improve query throughput, using techniques such as massive paral...
     
Modeling and predicting user behavior in sponsored search
Found in: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '09)
By Josh Attenberg, Sandeep Pandey, Torsten Suel
Issue Date:June 2009
pp. 1-24
Implicit user feedback, including click-through and subsequent browsing behavior, is crucial for evaluating and improving the quality of results returned by search engines. Several recent studies [1, 2, 3, 13, 25] have used post-result browsing behavior in...
     
Improved techniques for result caching in web search engines
Found in: Proceedings of the 18th international conference on World wide web (WWW '09)
By Qingqing Gan, Torsten Suel
Issue Date:April 2009
pp. 66-66
Query processing is a major cost factor in operating large web search engines. In this paper, we study query result caching, one of the main techniques used to optimize query processing performance. Our first contribution is a study of result caching as a ...
     
Using graphics processors for high performance IR query processing
Found in: Proceedings of the 18th international conference on World wide web (WWW '09)
By Hao Yan, Jinru He, Shuai Ding, Torsten Suel
Issue Date:April 2009
pp. 66-66
Web search engines are facing formidable performance challenges due to data sizes and query loads. The major engines have to process tens of thousands of queries per second over tens of billions of documents. To deal with this heavy workload, such engines ...
     
Inverted index compression and query processing with optimized document ordering
Found in: Proceedings of the 18th international conference on World wide web (WWW '09)
By Hao Yan, Shuai Ding, Torsten Suel
Issue Date:April 2009
pp. 66-66
Web search engines use highly optimized compression schemes to decrease inverted index size and improve query throughput, and many index compression techniques have been studied in the literature. One approach taken by several recent studies first performs...
     
Top-k aggregation using intersections of ranked inputs
Found in: Proceedings of the Second ACM International Conference on Web Search and Data Mining (WSDM '09)
By Kunal Punera, Ravi Kumar, Sergei Vassilvitskii, Torsten Suel
Issue Date:February 2009
pp. 2
There has been considerable past work on efficiently computing top k objects by aggregating information from multiple ranked lists of these objects. An important instance of this problem is query processing in search engines: One has to combine information...
     
Search engine architectures from conventional to Brian Ulicny and Ken Baclawski. New metrics for newsblog
Found in: Proceeding of the 2008 ACM workshop on Large-Scale distributed systems for information retrieval (LSDS-IR '08)
By Torsten Suel
Issue Date:October 2008
pp. 1001-1001
A significant amount of research has focused on the problem of implementing large-scale search engines in peer-to-peer environments. However, gaps in both performance and perspective remain between conventional approaches and those explored in the peer-to-...
     
Cleaning search results using term distance features
Found in: Proceedings of the 4th international workshop on Adversarial information retrieval on the web (AIRWeb '08)
By Josh Attenberg, Torsten Suel
Issue Date:April 2008
pp. N/A
The presence of Web spam in query results is one of the critical challenges facing search engines today. While search engines try to combat the impact of spam pages on their results, the incentive for spammers to use increasingly sophisticated techniques h...
     
Geographic web usage estimation by monitoring DNS caches
Found in: Proceedings of the first international workshop on Location and the web (LOCWEB '08)
By Huseyin Akcan, Herve Bronnimann, Torsten Suel
Issue Date:April 2008
pp. 85-92
DNS is one of the most actively used distributed databases on earth, accessed by millions of people every day to transparently convert host names into IP addresses and vice versa. In order to improve their performance, DNS servers also keep temporary recor...
     
Analysis of geographic queries in a search engine log
Found in: Proceedings of the first international workshop on Location and the web (LOCWEB '08)
By Alexander Markowetz, Josh Attenberg, Qingqing Gan, Torsten Suel
Issue Date:April 2008
pp. 49-56
Geography is becoming increasingly important in web search. Search engines can often return better results to users by analyzing features such as user location or geographic terms in web pages and user queries. This is also of great commercial value as it ...
     
Using graphics processors for high-performance IR query processing
Found in: Proceeding of the 17th international conference on World Wide Web (WWW '08)
By Hao Yan, Jinru He, Shuai Ding, Torsten Suel
Issue Date:April 2008
pp. 1-7
Web search engines are facing formidable performance challenges as they need to process thousands of queries per second over billions of documents. To deal with this heavy workload, current engines use massively parallel architectures of thousands of machi...
     
Performance of compressed inverted list caching in search engines
Found in: Proceeding of the 17th international conference on World Wide Web (WWW '08)
By Jiangong Zhang, Torsten Suel, Xiaohui Long
Issue Date:April 2008
pp. 1-7
Due to the rapid growth in the size of the web, web search engines are facing enormous performance challenges. The larger engines in particular have to be able to process tens of thousands of queries per second on tens of billions of documents, making quer...
     
Improving web spam classifiers using link structure
Found in: Proceedings of the 3rd international workshop on Adversarial information retrieval on the web (AIRWeb '07)
By Qingqing Gan, Torsten Suel
Issue Date:May 2007
pp. 17-20
Web spam has been recognized as one of the top challenges in the search engine industry [14]. A lot of recent work has addressed the problem of detecting or demoting web spam, including both content spam [16, 12] and link spam [22, 13]. However, any time a...
     
Efficient search in large textual collections with redundancy
Found in: Proceedings of the 16th international conference on World Wide Web (WWW '07)
By Jiangong Zhang, Torsten Suel
Issue Date:May 2007
pp. 411-420
Current web search engines focus on searching only themost recentsnapshot of the web. In some cases, however, it would be desirableto search over collections that include many different crawls andversions of each page. One important example of such a colle...
     
Efficient query processing in geographic web search engines
Found in: Proceedings of the 2006 ACM SIGMOD international conference on Management of data (SIGMOD '06)
By Alexander Markowetz, Torsten Suel, Yen-Yu Chen
Issue Date:June 2006
pp. 277-288
Geographic web search engines allow users to constrain and order search results in an intuitive manner by focusing a query on a particular geographic region. Geographic search technology, also called local search, has recently received significant interest...
     
Efficient query subscription processing for prospective search engines
Found in: Proceedings of the 15th international conference on World Wide Web (WWW '06)
By Rauf Izmailov, Samrat Ganguly, Svilen Mihaylov, Torsten Suel, Utku Irmak
Issue Date:May 2006
pp. 1037-1038
Current web search engines are retrospective in that they limit users to searches against already existing pages. Prospective search engines, on the other hand, allow users to upload queries that will be applied to newly discovered pages in the future. We ...
     
Interactive wrapper generation with minimal user effort
Found in: Proceedings of the 15th international conference on World Wide Web (WWW '06)
By Torsten Suel, Utku Irmak
Issue Date:May 2006
pp. 553-563
While much of the data on the web is unstructured in nature, there is also a significant amount of embedded structured data, such as product information on e-commerce sites or stock data on financial sites. A large amount of research has focused on the pro...
     
Three-level caching for efficient query processing in large Web search engines
Found in: Proceedings of the 14th international conference on World Wide Web (WWW '05)
By Torsten Suel, Xiaohui Long
Issue Date:May 2005
pp. 257-266
Large web search engines have to answer thousands of queries per second with interactive response times. Due to the sizes of the data sets involved, often in the range of multiple terabytes, a single query may require the processing of hundreds of megabyte...
     
Hierarchical substring caching for efficient content distribution to low-bandwidth clients
Found in: Proceedings of the 14th international conference on World Wide Web (WWW '05)
By Torsten Suel, Utku Irmak
Issue Date:May 2005
pp. 43-53
While overall bandwidth in the internet has grown rapidly over the last few years, and an increasing number of clients enjoy broadband connectivity, many others still access the internet over much slower dialup or wireless links. To address this issue, a n...
     
Local methods for estimating pagerank values
Found in: Proceedings of the Thirteenth ACM conference on Information and knowledge management (CIKM '04)
By Qingqing Gan, Torsten Suel, Yen-Yu Chen
Issue Date:November 2004
pp. 381-389
The Google search engine uses a method called PageRank, together with term-based and other ranking techniques, to order search results returned to the user. PageRank uses link analysis to assign a global importance score to each web page. The PageRank scor...
     
I/O-efficient techniques for computing pagerank
Found in: Proceedings of the eleventh international conference on Information and knowledge management (CIKM '02)
By Qingqing Gan, Torsten Suel, Yen-Yu Chen
Issue Date:November 2002
pp. 549-557
Over the last few years, most major search engines have integrated link-based ranking techniques in order to provide more accurate search results. One widely known approach is the Pagerank technique, which forms the basis of the Google ranking scheme, and ...
     
Compact grid layouts of multi-level networks
Found in: Proceedings of the thirty-first annual ACM symposium on Theory of computing (STOC '99)
By Mike Paterson, Suleyman Cenk Sahinalp, S. Muthukrishnan, Torsten Suel
Issue Date:May 1999
pp. 455-463
We present efficient new randomized and deterministic methods for transforming optimal solutions for a type of relaxed integer linear program into provably good solutions for the corresponding NP-hard discrete optimization problem. Without any constraint v...
     
Highly portable and efficient implementations of parallel adaptive N-body methods
Found in: Proceedings of the 1997 ACM/IEEE conference on Supercomputing (CDROM) (Supercomputing '97)
By David Blackston, Torsten Suel
Issue Date:November 1997
pp. 1-20
We describe the design of several portable and efficient parallel implementations of adaptive N-body methods, including the adaptive Fast Multipole Method, the adaptive version of Anderson's Method, and the Barnes-Hut algorithm. Our codes are based on a co...
     
Towards efficiency and portability: programming with the BSP model
Found in: Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures (SPAA '96)
By Kevin Lang, Mark Goudreau, Satish Rao, Thanasis Tsantilas, Torsten Suel
Issue Date:June 1996
pp. 1-12
We present efficient algorithms for two all-to-all communication operations in message-passing systems: index (or all-to-all personalized communication) and concatenation (or all-to-all broadcast). We assume a model of a fully-connected message-passing sys...
     
On probabilistic networks for selection, merging, and sorting
Found in: Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures (SPAA '95)
By Tom Leighton, Torsten Suel, Yuan Ma
Issue Date:June 1995
pp. 106-118
We present efficient algorithms for two all-to-all communication operations in message-passing systems: index (or all-to-all personalized communication) and concatenation (or all-to-all broadcast). We assume a model of a fully-connected message-passing sys...
     
Improved bounds for routing and sorting on multi-dimensional meshes
Found in: Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures (SPAA '94)
By Torsten Suel
Issue Date:June 1994
pp. 26-35
We show improved bounds for 1--1 routing and sorting on multi-dimensional meshes and tori. In particular, we give a fairly simple deterministic algorithm for sorting on the d-dimensional mesh of side length n that achieves a running time of 3dn/2+o(n) for ...
     
A lower bound for sorting networks based on the shuffle permutation
Found in: Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures (SPAA '92)
By C. Greg Plaxton, Torsten Suel
Issue Date:June 1992
pp. 70-79
We propose a new design for highly concurrent Internet services, which we call the staged event-driven architecture (SEDA). SEDA is intended to support massive concurrency demands and simplify the construction of well-conditioned services. In SEDA, applica...
     
 1