Scalable approximate query processing with the DBO engine
Found in: ACM Transactions on Database Systems (TODS)
By Abhijit Pol, Alin Dobra, Chris Jermaine, Subramanian Arumugam
Issue Date:November 2008
pp. 1-54
This article describes query processing in the DBO database system. Like other database systems designed for ad hoc analytic processing, DBO is able to compute the exact answers to queries over a large relational database in a scalable fashion. Unlike any ...
     
Characterizing the Topology of Probabilistic Biological Networks
Found in: IEEE/ACM Transactions on Computational Biology and Bioinformatics
By Andrei Todor, Alin Dobra, Tamer Kahveci
Issue Date:July 2013
pp. 970-983
Biological interactions are often uncertain events, that may or may not take place with some probability. This uncertainty leads to a massive number of alternative interaction topologies for each such network. The existing studies analyze the degree distri...
 
Uncertain interactions affect degree distribution of biological networks
Found in: 2012 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
By Andrei Todor, Alin Dobra, Tamer Kahveci
Issue Date:October 2012
pp. 1-5
Biological interactions are often uncertain events, that may or may not take place under different scenarios. Existing studies analyze the degree distribution of biological networks by assuming that all the given interactions take place under all circumsta...
 
Gossip-Based Computation of Aggregate Information
Found in: Foundations of Computer Science, Annual IEEE Symposium on
By David Kempe, Alin Dobra, Johannes Gehrke
Issue Date:October 2003
pp. 482
Over the last decade, we have seen a revolution in connectivity between computers, and a resulting paradigm shift from centralized to highly distributed systems. With massive scale also comes massive instability, as node and link failures become t...
 
Reachability analysis in probabilistic biological networks
Found in: IEEE/ACM Transactions on Computational Biology and Bioinformatics
By Haitham Gabr, Andrei Todor, Alin Dobra, Tamer Kahveci
Issue Date:August 2014
pp. 1
Extra-cellular molecules trigger a response inside the cell by initiating a signal at special membrane receptors (i.e., sources), which is then transmitted to reporters (i.e., targets) through various chains of interactions among proteins. Understanding wh...
 
Sketching Sampled Data Streams
Found in: Data Engineering, International Conference on
By Florin Rusu, Alin Dobra
Issue Date:April 2009
pp. 381-392
Sampling is used as a universal method to reduce the running time of computations -- the computation is performed on a much smaller sample and then the result is scaled to compensate for the difference in size. Sketches are a popular approximation method f...
 
Characterizing the Topology of Probabilistic Biological Networks
Found in: IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
By Alin Dobra, Andrei Todor, Tamer Kahveci
Issue Date:July 2013
pp. 970-983
Biological interactions are often uncertain events, that may or may not take place with some probability. This uncertainty leads to a massive number of alternative interaction topologies for each such network. The existing studies analyze the degree distri...
     
Probabilistic Biological Network Alignment
Found in: IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
By Alin Dobra, Andrei Todor, Tamer Kahveci
Issue Date:January 2013
pp. 109-121
Interactions between molecules are probabilistic events. An interaction may or may not happen with some probability, depending on a variety of factors such as the size, abundance, or proximity of the interacting molecules. In this paper, we consider the pr...
     
Semi-analytical method for analyzing models and model selection measures based on moment analysis
Found in: ACM Transactions on Knowledge Discovery from Data (TKDD)
By Alin Dobra, Amit Dhurandhar
Issue Date:March 2009
pp. 1-51
In this article we propose a moment-based method for studying models and model selection measures. By focusing on the probabilistic space of classifiers induced by the classification algorithm rather than on that of datasets, we obtain efficient characteri...
     
Confidence bounds for sampling-based group by estimates
Found in: ACM Transactions on Database Systems (TODS)
By Alin Dobra, Christopher Jermaine, Fei Xu
Issue Date:August 2008
pp. 1-44
Sampling is now a very important data management tool, to such an extent that an interface for database sampling is included in the latest SQL standard. In this article we reconsider in depth what at first may seem like a very simple problem—computin...
     
Sketches for size of join estimation
Found in: ACM Transactions on Database Systems (TODS)
By Alin Dobra, Florin Rusu
Issue Date:August 2008
pp. 1-46
Sketching techniques provide approximate answers to aggregate queries both for data-streaming and distributed computation. Small space summaries that have linearity properties are required for both types of applications. The prevalent method for analyzing ...
     
The DBO database system
Found in: Proceedings of the 2008 ACM SIGMOD international conference on Management of data (SIGMOD '08)
By Alin Dobra, Chris Jermaine, Fei Xu, Florin Rusu, Luis Leopoldo Perez, Mingxi Wu, Ravi Jampani
Issue Date:June 2008
pp. 13-14
We demonstrate our prototype of the DBO database system. DBO is designed to facilitate scalable analytic processing over large data archives. DBO's analytic processing performance is competitive with other database systems; however, unlike any other existi...
     
Aggregation methods for large-scale sensor networks
Found in: ACM Transactions on Sensor Networks (TOSN)
By Alin Dobra, Laukik Chitnis, Sanjay Ranka
Issue Date:March 2008
pp. 1-36
The ability to efficiently aggregate information (for example, to compute the average temperature) in large networks is crucial for the successful employment of sensor networks. This article addresses the problem of designing truly scalable protocols for com...
     
Scalable approximate query processing with the DBO engine
Found in: Proceedings of the 2007 ACM SIGMOD international conference on Management of data (SIGMOD '07)
By Abhijit Pol, Alin Dobra, Christopher Jermaine, Subramanian Arumugam
Issue Date:June 2007
pp. 725-736
This paper describes query processing in the DBO database system. Like other database systems designed for ad-hoc, analytic processing, DBO is able to compute the exact answer to queries over a large relational database in a scalable fashion. Unlike any ot...
     
Statistical analysis of sketch estimators
Found in: Proceedings of the 2007 ACM SIGMOD international conference on Management of data (SIGMOD '07)
By Alin Dobra, Florin Rusu
Issue Date:June 2007
pp. 187-198
Sketching techniques can provide approximate answers to aggregate queries either for data-streaming or distributed computation. Small space summaries that have linearity properties are required for both types of applications. The prevalent method for analy...
     
Pseudo-random number generation for sketch-based estimations
Found in: ACM Transactions on Database Systems (TODS)
By Alin Dobra, Florin Rusu
Issue Date:June 2007
pp. 11-es
The exact computation of aggregate queries, like the size of join of two relations, usually requires large amounts of memory (constrained in data-streaming) or communication (constrained in distributed computation) and large processing times. In this situa...
     
The Sort-Merge-Shrink join
Found in: ACM Transactions on Database Systems (TODS)
By Abhijit Pol, Alin Dobra, Christopher Jermaine, Shantanu Joshi, Subramanian Arumugam
Issue Date:December 2006
pp. 1382-1416
One of the most common operations in analytic query processing is the application of an aggregate function to the result of a relational join. We describe an algorithm called the Sort-Merge-Shrink (SMS) Join for computing the answer to such a query over la...
     
Fast range-summable random variables for efficient aggregate estimation
Found in: Proceedings of the 2006 ACM SIGMOD international conference on Management of data (SIGMOD '06)
By Alin Dobra, Florin Rusu
Issue Date:June 2006
pp. 193-204
Exact computation for aggregate queries usually requires large amounts of memory - constrained in data-streaming - or communication - constrained in distributed computation - and large processing times. In this situation, approximation techniques with prov...
     
A disk-based join with probabilistic guarantees
Found in: Proceedings of the 2005 ACM SIGMOD international conference on Management of data (SIGMOD '05)
By Abhijit Pol, Alin Dobra, Christopher Jermaine, Shantanu Joshi, Subramanian Arumugam
Issue Date:June 2005
pp. 563-574
One of the most common operations in analytic query processing is the application of an aggregate function to the result of a relational join. We describe an algorithm for computing the answer to such a query over large, disk-based input tables. The key in...
     
Histograms revisited: when are histograms the best approximation method for aggregates over joins?
Found in: Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems (PODS '05)
By Alin Dobra
Issue Date:June 2005
pp. 228-237
The traditional statistical assumption for interpreting histograms and justifying approximate query processing methods based on them is that all elements in a bucket have the same frequency -- the so called uniform distribution assumption. In this paper we...
     
SECRET: a scalable linear regression tree algorithm
Found in: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '02)
By Alin Dobra, Johannes Gehrke
Issue Date:July 2002
pp. 481-487
Developing regression models for large datasets that are both accurate and easy to interpret is a very important data mining problem. Regression trees with linear models in the leaves satisfy both these requirements, but thus far, no truly scalable regress...
     
Processing complex aggregate queries over data streams
Found in: Proceedings of the 2002 ACM SIGMOD international conference on Management of data (SIGMOD '02)
By Alin Dobra, Johannes Gehrke, Minos Garofalakis, Rajeev Rastogi
Issue Date:June 2002
pp. 61-72
Recent years have witnessed an increasing interest in designing algorithms for querying and analyzing streaming data (i.e., data that is seen only once in a fixed order) with only limited memory. Providing (perhaps approximate) answers to queries over such...
     