A Fast Algorithm for Subspace Clustering by Pattern Similarity
Found in: Scientific and Statistical Database Management, International Conference on
By Haixun Wang, Fang Chu, Wei Fan, Philip S. Yu, Jian Pei
Issue Date:June 2004
pp. 51
Unlike traditional clustering methods that focus on grouping objects with similar values on a set of dimensions, clustering by pattern similarity finds objects that exhibit a coherent pattern of rise and fall in subspaces. Pattern-based clustering extends ...
 
On Reducing Classifier Granularity in Mining Concept-Drifting Data Streams
Found in: Data Mining, IEEE International Conference on
By Peng Wang, Haixun Wang, Xiaochen Wu, Wei Wang, Baile Shi
Issue Date:November 2005
pp. 474-481
Many applications use classification models on streaming data to detect actionable alerts. Due to concept drifts in the underlying data, how to maintain a model's up-to-dateness has become one of the most challenging tasks in mining data streams. State of ...
 
Semantic Data Management: Towards Querying Data with their Meaning
Found in: Data Engineering, International Conference on
By Lipyeow Lim, Haixun Wang, Min Wang
Issue Date:April 2007
pp. 1438-1442
Relational database management systems are constantly being extended and augmented to accommodate data in different domains. Recently, with the increasing use of ontology in various applications, the need to support ontology, especially the related inferen...
 
Fast Graph Pattern Matching
Found in: Data Engineering, International Conference on
By Jiefeng Cheng, Jeffrey Xu Yu, Bolin Ding, Philip S. Yu, Haixun Wang
Issue Date:April 2008
pp. 913-922
Due to rapid growth of the Internet technology and new scientific/technological advances, the number of applications that model data as graphs increases, because graphs have high expressive power to model complicated structures. The dominance of graphs in ...
 
Stop Chasing Trends: Discovering High Order Models in Evolving Data
Found in: Data Engineering, International Conference on
By Shixi Chen, Haixun Wang, Shuigeng Zhou, Philip S. Yu
Issue Date:April 2008
pp. 923-932
Many applications are driven by evolving data: patterns in web traffic, program execution traces, network event logs, etc., are often non-stationary. Building prediction models for evolving data becomes an important and challenging task. Currently, most ap...
 
A Sampling-Based Approach to Information Recovery
Found in: Data Engineering, International Conference on
By Junyi Xie, Jun Yang, Yuguo Chen, Haixun Wang, Philip S. Yu
Issue Date:April 2008
pp. 476-485
There has been a recent resurgence of interest in research on noisy and incomplete data. Many applications require information to be recovered from such data. Ideally, an approach for information recovery should have the following features. First, it shoul...
 
A Flexible Query Graph Based Model for the Efficient Execution of Continuous Queries
Found in: Data Engineering Workshops, 22nd International Conference on
By Yijian Bai, Hetal Thakkar, Haixun Wang, Carlo Zaniolo
Issue Date:April 2007
pp. 634-643
In this paper, we propose a simple and flexible execution model that (i) supports a wide spectrum of alternative optimization and execution strategies and their mixtures, (ii) provides for dynamic reconfiguration when adding/deleting queries and changing o...
 
Computing Compressed Multidimensional Skyline Cubes Efficiently
Found in: Data Engineering, International Conference on
By Jian Pei, Ada Wai-Chee Fu, Xuemin Lin, Haixun Wang
Issue Date:April 2007
pp. 96-105
Recently, the skyline computation and analysis have been extended from one single full space to multidimensional subspaces, which can lead to valuable insights in some applications. Particularly, compressed skyline cubes in the form of skyline groups and t...
 
Optimizing Timestamp Management in Data Stream Management Systems
Found in: Data Engineering, International Conference on
By Yijian Bai, Hetal Thakkar, Haixun Wang, Carlo Zaniolo
Issue Date:April 2007
pp. 1334-1338
It has long been recognized that multi-stream operators, such as union and join, often have to wait idly in a temporarily blocked state, as a result of skews between the timestamps of their input streams. Recently, it has been shown that the injection of h...
 
Location-based Spatial Queries with Data Sharing in Wireless Broadcast Environments
Found in: Data Engineering, International Conference on
By Wei-Shinn Ku, Roger Zimmermann, Haixun Wang
Issue Date:April 2007
pp. 1355-1359
Location-based spatial queries (LBSQs) refer to spatial queries whose answers rely on the location of the inquirer. Efficient processing of LBSQs is of critical importance with the ever-increasing deployment and use of mobile technologies. We show that LBS...
 
Adaptive Load Diffusion for Multiway Windowed Stream Joins
Found in: Data Engineering, International Conference on
By Xiaohui Gu, Philip S. Yu, Haixun Wang
Issue Date:April 2007
pp. 146-155
In this paper, we present an adaptive load diffusion operator to enable scalable processing of Multiway Windowed Stream Joins (MWSJs) using a cluster system. The load diffusion is achieved by a set of novel semantics-preserving tuple routing algorithms. Di...
 
GString: A Novel Approach for Efficient Search in Graph Databases
Found in: Data Engineering, International Conference on
By Haoliang Jiang, Haixun Wang, Philip S. Yu, Shuigeng Zhou
Issue Date:April 2007
pp. 566-575
Graphs are widely used for modeling complicated data, including chemical compounds, protein interactions, XML documents, and multimedia. Information retrieval against such data can be formulated as a graph search problem, and finding an efficient solution ...
 
Indexing Weighted-Sequences in Large Databases
Found in: Data Engineering, International Conference on
By Haixun Wang, Chang-Shing Perng, Wei Fan, Sanghyun Park, Philip S. Yu
Issue Date:March 2003
pp. 63
We present an index structure for managing weighted-sequences in large databases. A weighted-sequence is defined as a two-dimensional structure where each element in the sequence is associated with a weight. A series of network events, for instance, is a w...
 
An Index Structure for Pattern Similarity Searching in DNA Microarray Data
Found in: Computational Systems Bioinformatics Conference, International IEEE Computer Society
By Haixun Wang, Chang-Shing Perng, Wei Fan, Philip S. Yu
Issue Date:August 2002
pp. 256
The DNA microarray technology is about to bring an explosion of gene expression data that may dwarf even the human sequencing projects. Researchers are motivated to identify genes whose expression levels rise and fall coherently under a set of experimental...
 
Efficiently Mining Frequent Closed Partial Orders
Found in: Data Mining, IEEE International Conference on
By Jian Pei, Jian Liu, Haixun Wang, Ke Wang, Philip S. Yu, Jianyong Wang
Issue Date:November 2005
pp. 753-756
Mining ordering information from sequence data is an important data mining task. Sequential pattern mining [1] can be regarded as mining frequent segments of total orders from sequence data. However, sequential patterns are often insufficient to concisely ...
 
Enhanced Biclustering on Expression Data
Found in: Bioinformatic and Bioengineering, IEEE International Symposium on
By Jiong Yang, Haixun Wang, Wei Wang, Philip Yu
Issue Date:March 2003
pp. 321
Microarrays are one of the latest breakthroughs in experimental molecular biology, which provide a powerful tool by which the expression patterns of thousands of genes can be monitored simultaneously and are already producing huge amounts of valuable data. ...
 
d-Clusters: Capturing Subspace Correlation in a Large Data Set
Found in: Data Engineering, International Conference on
By Jiong Yang, Wei Wang, Haixun Wang, Philip Yu
Issue Date:March 2002
pp. 517
Clustering has been an active research area of great practical importance for recent years. Most previous clustering models have focused on grouping objects with similar values on a (sub)set of dimensions (e.g., subspace cluster) and assumed that every obj...
 
On Anomalous Hotspot Discovery in Graph Streams
Found in: 2013 IEEE International Conference on Data Mining (ICDM)
By Weiren Yu, Charu C. Aggarwal, Shuai Ma, Haixun Wang
Issue Date:December 2013
pp. 1271-1276
Network streams have become ubiquitous in recent years because of many dynamic applications. Such streams may show localized regions of activity and evolution because of anomalous events. This paper will present methods for dynamically determining anomalou...
 
Knowledge-Based Approaches to Concept-Level Sentiment Analysis
Found in: IEEE Intelligent Systems
By Erik Cambria, Bjorn Schuller, Bing Liu, Haixun Wang, Catherine Havasi
Issue Date:March 2013
pp. 12-14
The guest editors introduce novel approaches to opinion mining and sentiment analysis that go beyond a mere word-level analysis of text and provide concept-level methods. Such approaches allow a more efficient passage from (unstructured) textual informatio...
 
Concept Clustering of Evolving Data
Found in: Data Engineering, International Conference on
By Shixi Chen, Haixun Wang, Shuigeng Zhou
Issue Date:April 2009
pp. 1327-1330
In Web search, a user refines his search several times before he finds the information he needs. It is very likely that, in the search log, similar sequences of searches appear many times, as many users had searched the Web with the same intent. Precisely ...
 
Online Anomaly Prediction for Robust Cluster Systems
Found in: Data Engineering, International Conference on
By Xiaohui Gu, Haixun Wang
Issue Date:April 2009
pp. 1000-1011
In this paper, we present a stream-based mining algorithm for online anomaly prediction. Many real-world applications such as data stream analysis require continuous cluster operation. Unfortunately, today's large-scale cluster systems are still vulnerabl...
 
Time-Stamp Management and Query Execution in Data Stream Management Systems
Found in: IEEE Internet Computing
By Yijian Bai, Hetal Thakkar, Haixun Wang, Carlo Zaniolo
Issue Date:November 2008
pp. 13-21
Relational query languages can effectively express continuous queries on data streams after modest extensions. However, implementing such queries efficiently in data stream management systems requires major changes in execution models and optimization tech...
 
Fast Relevance Discovery in Time Series
Found in: Data Mining, IEEE International Conference on
By Chang-shing Perng, Haixun Wang, Sheng Ma
Issue Date:December 2006
pp. 1016-1020
In this paper, we propose to model time series from a new angle: state transition points. When fluctuation of values in a time series crosses a certain point, it may trigger state transition in the system, which may lead to abrupt changes in many other tim...
 
A Balanced Ensemble Approach to Weighting Classifiers for Text Classification
Found in: Data Mining, IEEE International Conference on
By Gabriel Pui Cheong Fung, Jeffrey Xu Yu, Haixun Wang, David W. Cheung, Huan Liu
Issue Date:December 2006
pp. 869-873
This paper studies the problem of constructing an effective heterogeneous ensemble classifier for text classification. One major challenge of this problem is to formulate a good combination function, which combines the decisions of the individual classifie...
 
Dual Labeling: Answering Graph Reachability Queries in Constant Time
Found in: Data Engineering, International Conference on
By Haixun Wang, Hao He, Jun Yang, Philip S. Yu, Jeffrey Xu Yu
Issue Date:April 2006
pp. 75
Graph reachability is fundamental to a wide range of applications, including XML indexing, geographic navigation, Internet routing, ontology queries based on RDF/OWL, etc. Many applications involve huge graphs and require fast answering of reachability que...
 
Online Mining of Data Streams: Applications, Techniques and Progress
Found in: Data Engineering, International Conference on
By Haixun Wang, Jian Pei, Philip S. Yu
Issue Date:April 2005
pp. 1146
No summary available.
   
Moment: Maintaining Closed Frequent Itemsets over a Stream Sliding Window
Found in: Data Mining, IEEE International Conference on
By Yun Chi, Haixun Wang, Philip S. Yu, Richard R. Muntz
Issue Date:November 2004
pp. 59-66
This paper considers the problem of mining closed frequent itemsets over a sliding window using limited memory space. We design a synopsis data structure to monitor transactions in the sliding window so that we can output the current closed frequent itemse...
 
MaPle: A Fast Algorithm for Maximal Pattern-based Clustering
Found in: Data Mining, IEEE International Conference on
By Jian Pei, Xiaoling Zhang, Moonjung Cho, Haixun Wang, Philip S. Yu
Issue Date:November 2003
pp. 259
Pattern-based clustering is important in many applications, such as DNA micro-array data analysis, automatic recommendation systems and target marketing systems. However, pattern-based clustering in large databases is challenging. On the one hand, there ca...
 
Is random model better? On its accuracy and efficiency
Found in: Data Mining, IEEE International Conference on
By Wei Fan, Haixun Wang, Philip S. Yu, Sheng Ma
Issue Date:November 2003
pp. 51
Inductive learning searches for an optimal hypothesis that minimizes a given loss function. It is usually assumed that the simplest hypothesis that fits the data is the best approximation to an optimal hypothesis. Since finding the simplest hypothesis is NP-hard...
 
User-directed Exploration of Mining Space with Multiple Attributes
Found in: Data Mining, IEEE International Conference on
By Chang-Shing Perng, Haixun Wang, Sheng Ma, Joseph L. Hellerstein
Issue Date:December 2002
pp. 394
There has been a growing interest in mining frequent itemsets in relational data with multiple attributes. A key step in this approach is to select a set of attributes that group data into transactions and a separate set of attributes that labels data into...
 
Mining Associations by Pattern Structure in Large Relational Tables
Found in: Data Mining, IEEE International Conference on
By Haixun Wang, Chang-Shing Perng, Sheng Ma, Philip S. Yu
Issue Date:December 2002
pp. 482
Association rule mining aims at discovering patterns whose support is beyond a given threshold. Mining patterns composed of items described by an arbitrary subset of attributes in a large relational table represents a new challenge and has various practica...
 
Progressive Modeling
Found in: Data Mining, IEEE International Conference on
By Wei Fan, Haixun Wang, Philip S. Yu, Shaw-hwa Lo, Salvatore Stolfo
Issue Date:December 2002
pp. 163
Presently, inductive learning is still performed in a frustrating batch process. The user has little interaction with the system and no control over the final accuracy and training time. If the accuracy of the produced model is too low, all the computing r...
 
A Fully Distributed Framework for Cost-Sensitive Data Mining
Found in: Distributed Computing Systems, International Conference on
By Wei Fan, Haixun Wang, Philip S. Yu, Salvatore J. Stolfo
Issue Date:July 2002
pp. 445
No summary available.
   
SSDT: A Scalable Subspace-Splitting Classifier for Biased Data
Found in: Data Mining, IEEE International Conference on
By Haixun Wang, Philip S. Yu
Issue Date:December 2001
pp. 542
Decision trees are one of the most extensively used data mining models. Recently, a number of efficient, scalable algorithms for constructing decision trees on large disk-resident dataset have been introduced. In this paper, we study the problem of learnin...
 
FARM: A Framework for Exploring Mining Spaces with Multiple Attributes
Found in: Data Mining, IEEE International Conference on
By Chang-Shing Perng, Haixun Wang, Sheng Ma, Joseph L. Hellerstein
Issue Date:December 2001
pp. 449
Mining for frequent itemsets typically involves a preprocessing step in which data with multiple attributes are grouped into transactions, and items are defined based on attribute values. We have observed that such fixed attribute mining can severely constr...
 
A Low-Granularity Classifier for Data Streams with Concept Drifts and Biased Class Distribution
Found in: IEEE Transactions on Knowledge and Data Engineering
By Peng Wang, Haixun Wang, Xiaochen Wu, Wei Wang, Baile Shi
Issue Date:September 2007
pp. 1202-1213
Many applications track streaming data for actionable alerts, which may include, for example, network intrusions, transaction frauds, biosurveillance abnormalities, etc. Some stream classification models are built for this purpose. Due to concept drifts, ma...
 
Discovering Frequent Closed Partial Orders from Strings
Found in: IEEE Transactions on Knowledge and Data Engineering
By Jian Pei, Haixun Wang, Jian Liu, Ke Wang, Jianyong Wang, Philip S. Yu
Issue Date:November 2006
pp. 1467-1481
Mining knowledge about ordering from sequence data is an important problem with many applications, such as bioinformatics, Web mining, network management, and intrusion detection. For example, if many customers follow a partial order in their purchases of ...
 
A Unified Framework for Answering k Closest Pairs Queries and Variants
Found in: IEEE Transactions on Knowledge and Data Engineering
By Muhammad Aamir Cheema, Xuemin Lin, Haixun Wang, Jianmin Wang, Wenjie Zhang
Issue Date:November 2014
pp. 2610-2624
Given a scoring function that computes the score of a pair of objects, a top-
 
Efficient Keyword Search on Uncertain Graph Data
Found in: IEEE Transactions on Knowledge and Data Engineering
By Ye Yuan, Guoren Wang, Lei Chen, Haixun Wang
Issue Date:December 2013
pp. 2767-2779
As a popular search mechanism, keyword search has been applied to retrieve useful data in documents, texts, graphs, and even relational databases. However, so far, there is no work on keyword search over uncertain graph data even though the uncertain graph...
 
Attribute extraction and scoring: A probabilistic approach
Found in: 2013 IEEE International Conference on Data Engineering (ICDE 2013)
By Taesung Lee, Zhongyuan Wang, Haixun Wang, Seung-won Hwang
Issue Date:April 2013
pp. 194-205
Knowledge bases, which consist of concepts, entities, attributes and relations, are increasingly important in a wide range of applications. We argue that knowledge about attributes (of concepts or entities) plays a critical role in inferencing. In this pap...
 
A unified approach for computing top-k pairs in multidimensional space
Found in: Data Engineering, International Conference on
By Muhammad Aamir Cheema, Xuemin Lin, Haixun Wang, Jianmin Wang, Wenjie Zhang
Issue Date:April 2011
pp. 1031-1042
Top-k pairs queries have many real applications. k closest pairs queries, k furthest pairs queries and their bichromatic variants are some of the examples of the top-k pairs queries that rank the pairs on distance functions. While these queries have receiv...
 
Inverse Time Dependency in Convex Regularized Learning
Found in: Data Mining, IEEE International Conference on
By Zeyuan Allen Zhu, Weizhu Chen, Chenguang Zhu, Gang Wang, Haixun Wang, Zheng Chen
Issue Date:December 2009
pp. 667-676
In the conventional regularized learning, training time increases as the training set expands. Recent work on L2 linear SVM challenges this common sense by proposing the inverse time dependency on the training set size. In this paper, we first put forward ...
 
A Generic Framework for Top-
Found in: IEEE Transactions on Knowledge and Data Engineering
By Zhitao Shen, Muhammad Aamir Cheema, Xuemin Lin, Wenjie Zhang, Haixun Wang
Issue Date:June 2014
pp. 1349-1366
Top-k pairs and top-k objects queries have received significant attention by the research community. In this paper, we present the first approach to answer a broad class of top-k pairs and top-k objects queries over sliding windows. Our framework handles m...
 
A Bayesian Inference-Based Framework for RFID Data Cleansing
Found in: IEEE Transactions on Knowledge and Data Engineering
By Wei-Shinn Ku, Haiquan Chen, Haixun Wang, Min-Te Sun
Issue Date:October 2013
pp. 2177-2191
The past few years have witnessed the emergence of an increasing number of applications for tracking and tracing based on radio frequency identification (RFID) technologies. However, raw RFID readings are usually of low quality and may contain numerous ano...
 
Statistical Approaches to Concept-Level Sentiment Analysis
Found in: IEEE Intelligent Systems
By Erik Cambria, Bjorn Schuller, Bing Liu, Haixun Wang, Catherine Havasi
Issue Date:May 2013
pp. 6-9
The guest editors introduce novel statistical approaches to concept-level sentiment analysis that go beyond a mere syntactic-driven analysis of text and provide semantic-based methods. Such approaches allow a more efficient passage from (unstructured) text...
   
Automatic extraction of top-k lists from the web
Found in: 2013 IEEE International Conference on Data Engineering (ICDE 2013)
By Zhixian Zhang, Kenny Q. Zhu, Haixun Wang, Hongsong Li
Issue Date:April 2013
pp. 1057-1068
This paper is concerned with information extraction from top-k web pages, which are web pages that describe top k instances of a topic which is of general interest. Examples include “the 10 tallest buildings in the world”, “the 50 hits of 2010 you don't wa...
 
Shallow Information Extraction for the knowledge Web
Found in: 2013 IEEE International Conference on Data Engineering (ICDE 2013)
By Denilson Barbosa, Haixun Wang, Cong Yu
Issue Date:April 2013
pp. 1264-1267
A new breed of Information Extraction tools has become popular and shown to be very effective in building massive-scale knowledge bases that fuel applications such as question answering and semantic search. These approaches rely on Web-scale probabilistic ...
 
LinkProbe: Probabilistic inference on large-scale social networks
Found in: 2013 IEEE International Conference on Data Engineering (ICDE 2013)
By Haiquan Chen, Wei-Shinn Ku, Haixun Wang, Liang Tang, Min-Te Sun
Issue Date:April 2013
pp. 290-301
As one of the most important Semantic Web applications, social network analysis has attracted more and more interest from researchers due to the rapidly increasing availability of massive social network data. A desired solution for social network analysis ...
 
Efficiently Monitoring Top-k Pairs over Sliding Windows
Found in: Data Engineering, International Conference on
By Zhitao Shen, Muhammad Aamir Cheema, Xuemin Lin, Wenjie Zhang, Haixun Wang
Issue Date:April 2012
pp. 798-809
Top-k pairs queries have received significant attention by the research community. k-closest pairs queries, k-furthest pairs queries and their variants are among the most well studied special cases of the top-k pairs queries. In this paper, we present the ...
 
Tracking and Connecting Topics via Incremental Hierarchical Dirichlet Processes
Found in: Data Mining, IEEE International Conference on
By Zekai J. Gao, Yangqiu Song, Shixia Liu, Haixun Wang, Hao Wei, Yang Chen, Weiwei Cui
Issue Date:December 2011
pp. 1056-1061
Much research has been devoted to topic detection from text, but one major challenge has not been addressed: revealing the rich relationships that exist among the detected topics. Finding such relationships is important since many applications are interest...
 