Making Aggregation Work in Uncertain and Probabilistic Databases
Found in: IEEE Transactions on Knowledge and Data Engineering
By Raghotham Murthy, Robert Ikeda, Jennifer Widom
Issue Date:August 2011
pp. 1261-1273
We describe how aggregation is handled in the Trio system for uncertain and probabilistic data. Because “exact” aggregation in uncertain databases can produce exponentially sized results, we provide three alternatives: a low bound on the aggregate value, a...
 
Logical provenance in data-oriented workflows?
Found in: 2013 IEEE International Conference on Data Engineering (ICDE 2013)
By Robert Ikeda, Akash Das Sarma, Jennifer Widom
Issue Date:April 2013
pp. 877-888
We consider the problem of defining, generating, and tracing provenance in data-oriented workflows, in which input data sets are processed by a graph of transformations to produce output results. We first give a new general definition of provenance for gen...
 
Provenance-Based Debugging and Drill-Down in Data-Oriented Workflows
Found in: Data Engineering, International Conference on
By Robert Ikeda, Junsang Cho, Charlie Fang, Semih Salihoglu, Satoshi Torikai, Jennifer Widom
Issue Date:April 2012
pp. 1249-1252
Panda (for Provenance and Data) is a system that supports the creation and execution of data-oriented workflows, with automatic provenance generation and built-in provenance tracing operations. Workflows in Panda are arbitrary acyclic graphs containing bo...
 
Confidence-Aware Join Algorithms
Found in: Data Engineering, International Conference on
By Parag Agrawal, Jennifer Widom
Issue Date:April 2009
pp. 628-639
In uncertain and probabilistic databases, confidence values (or probabilities) are associated with each data item. Confidence values are assigned to query results based on combining confidences from the input data. Users may wish to apply a threshold on re...
 
Exploiting Lineage for Confidence Computation in Uncertain and Probabilistic Databases
Found in: Data Engineering, International Conference on
By Anish Das Sarma, Martin Theobald, Jennifer Widom
Issue Date:April 2008
pp. 1023-1032
We study the problem of computing query results with confidence values in ULDBs: relational databases with uncertainty and lineage. ULDBs, which subsume probabilistic databases, offer an alternative decoupled method of computing confidence values: Instead ...
 
A Pipelined Framework for Online Cleaning of Sensor Data Streams
Found in: Data Engineering, International Conference on
By Shawn R. Jeffery, Gustavo Alonso, Michael J. Franklin, Wei Hong, Jennifer Widom
Issue Date:April 2006
pp. 140
Data captured from the physical world through sensor devices tends to be noisy and unreliable. The data cleaning process for such data is not easily handled by standard data warehouse-oriented techniques, which do not take into account the strong temporal ...
 
Indexing Relational Database Content Offline for Efficient Keyword-Based Search
Found in: Database Engineering and Applications Symposium, International
By Qi Su, Jennifer Widom
Issue Date:July 2005
pp. 297-306
Information Retrieval systems such as web search engines offer convenient keyword-based search interfaces. In contrast, relational database systems require the user to learn SQL and to know the schema of the underlying data even to pose simple searches. We...
 
Adaptive Caching for Continuous Queries
Found in: Data Engineering, International Conference on
By Shivnath Babu, Kamesh Munagala, Jennifer Widom, Rajeev Motwani
Issue Date:April 2005
pp. 118-129
We address the problem of executing continuous multiway join queries in unpredictable and volatile environments. Our query class captures windowed join queries in data stream systems as well as conventional maintenance of materialized join views. Our adapt...
 
Incremental Computation and Maintenance of Temporal Aggregates
Found in: Data Engineering, International Conference on
By Jun Yang, Jennifer Widom
Issue Date:April 2001
pp. 51
We consider the problems of computing aggregation queries in temporal databases, and of maintaining materialized temporal aggregate views efficiently. The latter problem is particularly challenging since a single data update can cause aggregate r...
 
Lineage Tracing in a Data Warehousing System
Found in: Data Engineering, International Conference on
By Yingwei Cui, Jennifer Widom
Issue Date:March 2000
pp. 683
A data warehousing system collects data from multiple distributed sources and stores the integrated information as materialized views in a local data warehouse. Users then perform data analysis and mining on the warehouse views. In many cases, the warehous...
   
Practical Lineage Tracing in Data Warehouses
Found in: Data Engineering, International Conference on
By Yingwei Cui, Jennifer Widom
Issue Date:March 2000
pp. 367
We consider the view data lineage problem in a warehousing environment: For a given data item in a materialized warehouse view, we want to identify the set of source data items that produced the view item. We formalize the problem, and we present a lineage...
 
The Starburst Active Database Rule System
Found in: IEEE Transactions on Knowledge and Data Engineering
By Jennifer Widom
Issue Date:August 1996
pp. 583-595
This paper describes our development of the Starburst Rule System, an active database rules facility integrated into the Starburst extensible relational database system at the IBM Almaden Research Center. The Starburst ...
 
Simplifying Scalable Graph Processing with a Domain-Specific Language
Found in: Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO '14)
By Jennifer Widom, Kunle Olukotun, Semih Salihoglu, Sungpack Hong
Issue Date:February 2014
pp. 208-218
Large-scale graph processing, with its massive data sets, requires distributed processing. However, conventional frameworks for distributed graph processing, such as Pregel, use non-traditional programming models that are well-suited for parallelism and sc...
     
GPS: a graph processing system
Found in: Proceedings of the 25th International Conference on Scientific and Statistical Database Management (SSDBM)
By Jennifer Widom, Semih Salihoglu
Issue Date:July 2013
pp. 1-12
GPS (for Graph Processing System) is a complete open-source system we developed for scalable, fault-tolerant, and easy-to-program execution of algorithms on extremely large graphs. This paper serves the dual role of describing the GPS system, and presentin...
     
Deco: declarative crowdsourcing
Found in: Proceedings of the 21st ACM international conference on Information and knowledge management (CIKM '12)
By Aditya Ganesh Parameswaran, Hector Garcia-Molina, Hyunjung Park, Jennifer Widom, Neoklis Polyzotis
Issue Date:October 2012
pp. 1203-1212
Crowdsourcing enables programmers to incorporate "human computation" as a building block in algorithms that cannot be fully automated, such as text analysis and image recognition. Similarly, humans can be used as a building block in data-intensive applicat...
     
Provenance-based refresh in data-oriented workflows
Found in: Proceedings of the 20th ACM international conference on Information and knowledge management (CIKM '11)
By Jennifer Widom, Robert Ikeda, Semih Salihoglu
Issue Date:October 2011
pp. 1659-1668
We consider a general workflow setting in which input data sets are processed by a graph of transformations to produce output results. Our goal is to perform efficient selective refresh of elements in the output data, i.e., compute the latest values of spe...
     
Optimization of continuous queries with shared expensive filters
Found in: Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems (PODS '07)
By Jennifer Widom, Kamesh Munagala, Utkarsh Srivastava
Issue Date:June 2007
pp. 215-224
We consider the problem of optimizing and executing multiple continuous queries, where each query is a conjunction of filters and each filter may occur in multiple queries. When filters are expensive, significant performance gains are achieved by sharing f...
     
Foreword to special section on SIGMOD/PODS 2005
Found in: ACM Transactions on Database Systems (TODS)
By Foto Afrati, Jennifer Widom
Issue Date:December 2006
pp. 1417-1417
One of the most common operations in analytic query processing is the application of an aggregate function to the result of a relational join. We describe an algorithm called the Sort-Merge-Shrink (SMS) Join for computing the answer to such a query over la...
     
Operator placement for in-network stream query processing
Found in: Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems (PODS '05)
By Jennifer Widom, Kamesh Munagala, Utkarsh Srivastava
Issue Date:June 2005
pp. 250-258
In sensor networks, data acquisition frequently takes place at low-capability devices. The acquired data is then transmitted through a hierarchy of nodes having progressively increasing network bandwidth and computational power. We consider the problem of...
     
The Lowell database research self-assessment
Found in: Communications of the ACM
By Alon Halevy, Avi Silberschatz, Bruce Croft, David DeWitt, David Maier, Dieter Gawlick, Gerhard Weikum, Hans Schek, Hector Garcia-Molina, Jeff Naughton, Jeff Ullman, Jennifer Widom, Jim Gray, Joe Hellerstein, Laura Haas, Martin Kersten, Michael Pazzani, Mike Carey, Mike Franklin, Mike Lesk, Mike Stonebraker, Phil Bernstein, Rakesh Agrawal, Rick Snodgrass, Serge Abiteboul, Stan Zdonik, Stefano Ceri, Timos Sellis, Yannis Ioannidis
Issue Date:May 2005
pp. 111-118
Database needs are changing, driven by the Internet and increasing amounts of scientific and sensor data. In this article, the authors propose research into several important new directions for database management systems.
     
Exploiting k-constraints to reduce memory overhead in continuous queries over data streams
Found in: ACM Transactions on Database Systems (TODS)
By Jennifer Widom, Shivnath Babu, Utkarsh Srivastava
Issue Date:September 2004
pp. 545-580
Continuous queries often require significant run-time state over arbitrary data streams. However, streams may exhibit certain data or arrival patterns, or constraints, that can be detected and exploited to reduce state considerably without compromising cor...
     
Mining the space of graph properties
Found in: Proceedings of the 2004 ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '04)
By Glen Jeh, Jennifer Widom
Issue Date:August 2004
pp. 187-196
Existing data mining algorithms on graphs look for nodes satisfying specific properties, such as specific notions of structural similarity or specific measures of link-based importance. While such analyses for predetermined properties can be effective in w...
     
Flexible time management in data stream systems
Found in: Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems (PODS '04)
By Jennifer Widom, Utkarsh Srivastava
Issue Date:June 2004
pp. 263-274
Continuous queries in a Data Stream Management System (DSMS) rely on time as a basis for windows on streams and for defining a consistent semantics for multiple streams and updatable relations. The system clock in a centralized DSMS provides a convenient a...
     
Rethinking the conference reviewing process
Found in: Proceedings of the 2004 ACM SIGMOD international conference on Management of data (SIGMOD '04)
By Alon Halevy, Anastassia Ailamaki, David DeWitt, Gerhard Weikum, Jennifer Widom, Michael J. Franklin, Philip A. Bernstein, Zachary Ives
Issue Date:June 2004
pp. 957-957
We demonstrate an XML full-text search engine that implements the TeXQuery language. TeXQuery is a powerful full-text search extension to XQuery that provides a rich set of fully composable full-text primitives, such as phrase matching, proximity distance,...
     
StreaMon: an adaptive engine for stream query processing
Found in: Proceedings of the 2004 ACM SIGMOD international conference on Management of data (SIGMOD '04)
By Jennifer Widom, Shivnath Babu
Issue Date:June 2004
pp. 931-932
StreaMon is the adaptive query processing engine of the STREAM prototype Data Stream Management System (DSMS) [4]. A fundamental challenge in many DSMS applications (e.g., network monitoring, financial monitoring over stock tickers, sensor processing) is t...
     
Adaptive ordering of pipelined stream filters
Found in: Proceedings of the 2004 ACM SIGMOD international conference on Management of data (SIGMOD '04)
By Itaru Nishizawa, Jennifer Widom, Kamesh Munagala, Rajeev Motwani, Shivnath Babu
Issue Date:June 2004
pp. 407-418
We consider the problem of pipelined filters, where a continuous stream of tuples is processed by a set of commutative filters. Pipelined filters are common in stream applications and capture a large class of multiway stream joins. We focus on the problem ...
     
Scaling personalized web search
Found in: Proceedings of the twelfth international conference on World Wide Web (WWW '03)
By Glen Jeh, Jennifer Widom
Issue Date:May 2003
pp. 271-279
Recent web search techniques augment traditional text matching with a global notion of "importance" based on the linkage structure of the web, such as in Google's PageRank algorithm. For more refined searches, this global notion of importance can be specia...
     
Exploiting hierarchical domain structure to compute similarity
Found in: ACM Transactions on Information Systems (TOIS)
By Hector Garcia-Molina, Jennifer Widom, Prasanna Ganesan
Issue Date:January 2003
pp. 64-93
The notion of similarity between objects finds use in many contexts, for example, in search engines, collaborative filtering, and clustering. Objects being compared often are modeled as sets, with their similarity traditionally determined based on set inte...
     
SimRank: a measure of structural-context similarity
Found in: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '02)
By Glen Jeh, Jennifer Widom
Issue Date:July 2002
pp. 538-543
The problem of measuring "similarity" of objects arises in many applications, and many domain-specific measures have been developed, e.g., matching text across documents or computing overlap among item-sets. We propose a complementary approach, applicable ...
     
Data streams: fresh current or stagnant backwater? (panel)
Found in: Proceedings of the 2002 ACM SIGMOD international conference on Management of data (SIGMOD '02)
By Jennifer Widom, Joseph M. Hellerstein
Issue Date:June 2002
pp. 632-632
With the rapid development of the Internet and the World Wide Web (WWW), a very large amount of information is available and ready for downloading, most of which is free of charge. At the same time, hard disks with large capacity are available at affordable...
     
Best-effort cache synchronization with source cooperation
Found in: Proceedings of the 2002 ACM SIGMOD international conference on Management of data (SIGMOD '02)
By Chris Olston, Jennifer Widom
Issue Date:June 2002
pp. 73-84
In environments where exact synchronization between source data objects and cached copies is not achievable due to bandwidth or other resource constraints, stale (out-of-date) copies are permitted. It is desirable to minimize the overall divergence between...
     
Characterizing memory requirements for queries over continuous data streams
Found in: Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems (PODS '02)
By Arvind Arasu, Brian Babcock, Jennifer Widom, Jon McAlister, Shivnath Babu
Issue Date:June 2002
pp. 221-232
We consider conjunctive queries with arithmetic comparisons over multiple continuous data streams. We specify an algorithm for determining whether or not a query can be evaluated using a bounded amount of memory for all possible instances of the data strea...
     
Models and issues in data stream systems
Found in: Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems (PODS '02)
By Brian Babcock, Jennifer Widom, Mayur Datar, Rajeev Motwani, Shivnath Babu
Issue Date:June 2002
pp. 1-16
In this overview paper we motivate the need for and research issues arising from a new model of data processing. In this model, data does not take the form of persistent relations, but rather arrives in multiple, continuous, rapid, time-varying data stream...
     
Adaptive precision setting for cached approximate values
Found in: Proceedings of the 2001 ACM SIGMOD international conference on Management of data (SIGMOD '01)
By Boon Thau Loo, Chris Olston, Jennifer Widom
Issue Date:May 2001
pp. 256-266
Caching approximate values instead of exact values presents an opportunity for performance gains in exchange for decreased precision. To maximize the performance improvement, cached approximations must be of appropriate precision: approximations that are t...
     
TIP: a temporal extension to Informix
Found in: Proceedings of the 2000 ACM SIGMOD international conference on Management of data (SIGMOD '00)
By Huacheng C. Ying, Jennifer Widom, Jun Yang
Issue Date:May 2000
pp. 245-253
Commercial relational database systems today provide only limited temporal support. To address the needs of applications requiring rich temporal data and queries, we have built TIP (Temporal Information Processor), a temporal extension to the Informix data...
     
Of XML and databases (panel session): where's the beef?
Found in: Proceedings of the 2000 ACM SIGMOD international conference on Management of data (SIGMOD '00)
By Adam Bosworth, Bruce Lindsay, Dan Suciu, Jennifer Widom, Michael J. Carey, Michael Stonebraker
Issue Date:May 2000
pp. 245-253
This panel will examine the implications of the XML revolution, which is currently raging on the web, for database systems research and development.
     
WSQ/DSQ: a practical approach for combined querying of databases and the Web
Found in: Proceedings of the 2000 ACM SIGMOD international conference on Management of data (SIGMOD '00)
By Jennifer Widom, Roy Goldman
Issue Date:May 2000
pp. 245-253
We present WSQ/DSQ (pronounced “wisk-disk”), a new approach for combining the query facilities of traditional databases with existing search engines on the Web. WSQ, for Web-Supported (Database) Queries, leverages results from Web searches to e...
     
Computing the median with uncertainty
Found in: Proceedings of the thirty-second annual ACM symposium on Theory of computing (STOC '00)
By Chris Olston, Jennifer Widom, Rajeev Motwani, Rina Panigrahy, Tomas Feder
Issue Date:May 2000
pp. 602-607
We present efficient new randomized and deterministic methods for transforming optimal solutions for a type of relaxed integer linear program into provably good solutions for the corresponding NP-hard discrete optimization problem. Without any constraint v...
     
On-line warehouse view maintenance
Found in: Proceedings of the 1997 ACM SIGMOD international conference on Management of data (SIGMOD '97)
By Dallan Quass, Jennifer Widom
Issue Date:May 1997
pp. 79
Data warehouses store materialized views over base data from external sources. Clients typically perform complex read-only queries on the views. The views are refreshed periodically by maintenance transactions, which propagate large batch updates from the ...
     
The STRIP rule system for efficiently maintaining derived data
Found in: Proceedings of the 1997 ACM SIGMOD international conference on Management of data (SIGMOD '97)
By Brad Adelberg, Hector Garcia-Molina, Jennifer Widom
Issue Date:May 1997
pp. 79
Derived data is maintained in a database system to correlate and summarize base data which records real world facts. As base data changes, derived data needs to be recomputed. This is often implemented by writing active rules that are triggered by changes ...
     
Efficient and flexible location management techniques for wireless communication systems
Found in: Proceedings of the second annual international conference on Mobile computing and networking (MobiCom '96)
By Derek Lam, Donald C. Cox, Jan Jannink, Jennifer Widom, Narayanan Shivakumar
Issue Date:November 1996
pp. 38-49
Post-hoc worknotes is a concept demonstration, an envisionment showing how workgroup communication could be supported using a combination of existing technologies in the field of nontextual information management. We have identified a number of use-driven ...
     
LORE: a Lightweight Object REpository for semistructured data
Found in: Proceedings of the 1996 ACM SIGMOD international conference on Management of data (SIGMOD '96)
By Anand Rajaraman, Dallan Quass, Hugo Rivero, Janet Wiener, Jason McHugh, Jeff Ullman, Jennifer Widom, Kevin Haas, Qingshan Luo, Roy Goldman, Serge Abiteboul, Svetlozar Nestorov
Issue Date:June 1996
pp. 219-230
Data mining, or knowledge discovery in databases, has been popularly recognized as an important research issue with broad applications. We provide a comprehensive survey, in database perspective, on the data mining techniques developed recently. Several ma...
     
Change detection in hierarchically structured information
Found in: Proceedings of the 1996 ACM SIGMOD international conference on Management of data (SIGMOD '96)
By Anand Rajaraman, Hector Garcia-Molina, Jennifer Widom, Sudarshan S. Chawathe
Issue Date:June 1996
pp. 219-230
Detecting and representing changes to data is important for active databases, data warehousing, view maintenance, and version and configuration management. Most previous work in change management has dealt with flat-file and relational data; we focus on hi...
     
Research problems in data warehousing
Found in: Proceedings of the fourth international conference on Information and knowledge management (CIKM '95)
By Jennifer Widom
Issue Date:November 1995
pp. 25-30
This paper describes the design of and experimentation with the Knowledge Query and Manipulation Language (KQML), a new language and protocol for exchanging information and knowledge. This work is part of a larger effort, the ARPA Knowledge Sharing Effort ...
     
User profile replication for faster location lookup in mobile environments
Found in: Proceedings of the first annual international conference on Mobile computing and networking (MobiCom '95)
By Jennifer Widom, Narayanan Shivakumar
Issue Date:November 1995
pp. 161-169
     
Information translation, mediation, and mosaic-based browsing in the TSIMMIS system
Found in: Proceedings of the 1995 ACM SIGMOD international conference on Management of data (SIGMOD '95)
By Hector Garcia-Molina, Jeffrey Ullman, Jennifer Widom, Joachim Hammer, Kelly Ireland, Yannis Papakonstantinou
Issue Date:May 1995
pp. 219-230
The VisDB system developed at the University of Munich is a sophisticated tool for visualizing and analyzing large databases. The key idea of the VisDB system is to support the exploration of large databases by using the phenomenal abilities of the human v...
     
View maintenance in a warehousing environment
Found in: Proceedings of the 1995 ACM SIGMOD international conference on Management of data (SIGMOD '95)
By Hector Garcia-Molina, Jennifer Widom, Joachim Hammer, Yue Zhuge
Issue Date:May 1995
pp. 219-230
A warehouse is a repository of integrated information drawn from remote data sources. Since a warehouse effectively implements materialized views, we must maintain the views as the data sources are updated. This view maintenance problem differs from the tr...
     
Constraint checking with partial information
Found in: Proceedings of the thirteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems (PODS '94)
By Ashish Gupta, Jeffrey D. Ullman, Jennifer Widom, Yehoshua Sagiv
Issue Date:May 1994
pp. 45-55
Constraints are a valuable tool for managing information across multiple databases, as well as for general purposes of assuring data integrity. However, efficient implementation of constraint checking is difficult. In this paper we explore techniques for a...
     
Local verification of global integrity constraints in distributed databases
Found in: Proceedings of the 1993 ACM SIGMOD international conference on Management of data (SIGMOD '93)
By Ashish Gupta, Jennifer Widom
Issue Date:May 1993
pp. 300-311
We present an optimization for integrity constraint verification in distributed databases. The optimization allows a global constraint, i.e. a constraint spanning multiple databases, to be verified by accessing data at a single database, eliminating the co...
     
Starburst II: the extender strikes back!
Found in: Proceedings of the 1991 ACM SIGMOD international conference on Management of data (SIGMOD '91)
By C. Mohan, George Lapis, Guy M. Lohman, Hamid Pirahesh, Jennifer Widom, John McPherson, Rakesh Agrawal, Roberta Cochrane, Tobin Lehman
Issue Date:May 1991
pp. 328-340
Iris is an object-oriented database management system being developed at Hewlett-Packard Laboratories [1], [3]. This videotape provides an overview of the Iris data model and a summary of our experiences in converting a computer-integrated manufacturing ap...
     