Displaying 1-50 out of 334 total
A Fast Algorithm for Subspace Clustering by Pattern Similarity
Found in: Scientific and Statistical Database Management, International Conference on
By Haixun Wang, Fang Chu, Wei Fan, Philip S. Yu, Jian Pei
Issue Date:June 2004
pp. 51
Unlike traditional clustering methods that focus on grouping objects with similar values on a set of dimensions, clustering by pattern similarity finds objects that exhibit a coherent pattern of rise and fall in subspaces. Pattern-based clustering extends ...
 
Scale-Up Strategies for Processing High-Rate Data Streams in System S
Found in: Data Engineering, International Conference on
By Henrique Andrade, Bugra Gedik, Kun-Lung Wu, Philip S. Yu
Issue Date:April 2009
pp. 1375-1378
High performance stream processing is critical in sense-and-respond application domains – from environmental monitoring to algorithmic trading. In this paper, we focus on language and runtime support for improving the performance of sense-and-respond appli...
 
An Improved Categorization of Classifier's Sensitivity on Sample Selection Bias
Found in: Data Mining, IEEE International Conference on
By Wei Fan, Ian Davidson, Bianca Zadrozny, Philip S. Yu
Issue Date:November 2005
pp. 605-608
A recent paper categorizes classifier learning algorithms according to their sensitivity to a common type of sample selection bias where the chance of an example being selected into the training sample depends on its feature vector x but not (directly) on ...
 
Object Distinction: Distinguishing Objects with Identical Names
Found in: Data Engineering, International Conference on
By Xiaoxin Yin, Jiawei Han, Philip S. Yu
Issue Date:April 2007
pp. 1242-1246
Different people or objects may share identical names in the real world, which causes confusion in many applications. It is a nontrivial task to distinguish those objects, especially when there is only very limited information associated with each of them....
 
Transfer Feature Learning with Joint Distribution Adaptation
Found in: 2013 IEEE International Conference on Computer Vision (ICCV)
By Mingsheng Long, Jianmin Wang, Guiguang Ding, Jiaguang Sun, Philip S. Yu
Issue Date:December 2013
pp. 2200-2207
Transfer learning is established as an effective technology in computer vision for leveraging rich labeled data in the source domain to build an accurate classifier for the target domain. However, most prior methods have not simultaneously reduced the diff...
 
Attribute-Based Subsequence Matching and Mining
Found in: Data Engineering, International Conference on
By Yu Peng, Raymond Chi-Wing Wong, Liangliang Ye, Philip S. Yu
Issue Date:April 2012
pp. 989-1000
Sequence analysis is very important in our daily life. Typically, each sequence is associated with an ordered list of elements. For example, in a movie rental application, a customer's movie rental record containing an ordered list of movies is a sequence ...
 
Fast Graph Pattern Matching
Found in: Data Engineering, International Conference on
By Jiefeng Cheng, Jeffrey Xu Yu, Bolin Ding, Philip S. Yu, Haixun Wang
Issue Date:April 2008
pp. 913-922
Due to rapid growth of the Internet technology and new scientific/technological advances, the number of applications that model data as graphs increases, because graphs have high expressive power to model complicated structures. The dominance of graphs in ...
 
Text Classification without Labeled Negative Documents
Found in: Data Engineering, International Conference on
By Gabriel Pui Cheong Fung, Jeffrey Xu Yu, Hongjun Lu, Philip S. Yu
Issue Date:April 2005
pp. 594-605
This paper presents a new solution for the problem of building a text classifier with a small set of labeled positive documents (P) and a large set of unlabeled documents (U). Here, the unlabeled documents are mixed with both of the positive and negative d...
 
Domain-Driven, Actionable Knowledge Discovery
Found in: IEEE Intelligent Systems
By Longbing Cao, Chengqi Zhang, Qiang Yang, David Bell, Michail Vlachos, Bahar Taneri, Eamonn Keogh, Philip S. Yu, Ning Zhong, Mafruz Zaman Ashrafi, David Taniar, Eugene Dubossarsky, Warwick Graco
Issue Date:July 2007
pp. 78-88, c3
The existing knowledge discovery and data mining (KDD) field seldom delivers results that businesses can act on directly. This issue, Trends & Controversies presents seven short articles reporting on different aspects of domain-driven KDD, an R&...
 
Privacy-Preserving SimRank over Distributed Information Network
Found in: 2012 IEEE 12th International Conference on Data Mining (ICDM)
By Yu-Wei Chu, Chih-Hua Tai, Ming-Syan Chen, Philip S. Yu
Issue Date:December 2012
pp. 840-845
Information network analysis has drawn a lot of attention in recent years. Among all the aspects of network analysis, similarity measure of nodes has been shown useful in many applications, such as clustering, link prediction and community identification, to ...
 
EIC Editorial
Found in: IEEE Transactions on Knowledge and Data Engineering
By Philip S. Yu
Issue Date:March 2004
pp. 289-291
No summary available.
 
An Index Structure for Pattern Similarity Searching in DNA Microarray Data
Found in: Computational Systems Bioinformatics Conference, International IEEE Computer Society
By Haixun Wang, Chang-Shing Perng, Wei Fan, Philip S. Yu
Issue Date:August 2002
pp. 256
The DNA microarray technology is about to bring an explosion of gene expression data that may dwarf even the human sequencing projects. Researchers are motivated to identify genes whose expression levels rise and fall coherently under a set of experimental...
 
Positive and Unlabeled Learning for Graph Classification
Found in: Data Mining, IEEE International Conference on
By Yuchen Zhao, Xiangnan Kong, Philip S. Yu
Issue Date:December 2011
pp. 962-971
The problem of graph classification has drawn much attention in the last decade. Conventional approaches on graph classification focus on mining discriminative sub graph features under supervised settings. The feature selection strategies strictly follow t...
 
On the Hardness of Graph Anonymization
Found in: Data Mining, IEEE International Conference on
By Charu C. Aggarwal, Yao Li, Philip S. Yu
Issue Date:December 2011
pp. 1002-1007
In this paper, we examine the problem of node re-identification from anonymized graphs. Typical graphs encountered in real applications are massive and sparse. In this paper, we will show that massive and sparse graphs have certain theoretical properties w...
 
On data dependencies in dataspaces
Found in: Data Engineering, International Conference on
By Shaoxu Song, Lei Chen, Philip S. Yu
Issue Date:April 2011
pp. 470-481
To study data dependencies over heterogeneous data in dataspaces, we define a general dependency form, namely comparable dependencies (CDs), which specifies constraints on comparable attributes. It covers the semantics of a broad class of dependencies in d...
 
Outlier detection in graph streams
Found in: Data Engineering, International Conference on
By Charu C. Aggarwal, Yuchen Zhao, Philip S. Yu
Issue Date:April 2011
pp. 399-409
A number of applications in social networks, telecommunications, and mobile computing create massive streams of graphs. In many such applications, it is useful to detect structural abnormalities which are different from the
 
Vote-Based LELC for Positive and Unlabeled Textual Data Streams
Found in: Data Mining Workshops, International Conference on
By Bo Liu, Yanshan Xiao, Longbing Cao, Philip S. Yu
Issue Date:December 2010
pp. 951-958
In this paper, we extend LELC (PU Learning by Extracting Likely Positive and Negative Micro-Clusters) method to cope with positive and unlabeled data streams. Our developed approach, which is called vote-based LELC, works in three steps. In the first step,...
 
Transfer Learning on Heterogenous Feature Spaces via Spectral Transformation
Found in: Data Mining, IEEE International Conference on
By Xiaoxiao Shi, Qi Liu, Wei Fan, Philip S. Yu, Ruixin Zhu
Issue Date:December 2010
pp. 1049-1054
Labeled examples are often expensive and time-consuming to obtain. One practically important problem is: can the labeled data from other related sources help predict the target task, even if they have (a) different feature spaces (e.g., image vs. text data...
 
Mining Cluster-Based Temporal Mobile Sequential Patterns in Location-Based Service Environments
Found in: IEEE Transactions on Knowledge and Data Engineering
By Eric Hsueh-Chan Lu, Vincent S. Tseng, Philip S. Yu
Issue Date:June 2011
pp. 914-927
Research on Location-Based Services (LBS) has been emerging in recent years due to a wide range of potential applications. One of the active topics is the mining and prediction of mobile movements and associated transactions. Most of existing studies foc...
 
Music Recommendation Using Content and Context Information Mining
Found in: IEEE Intelligent Systems
By Ja-Hwung Su, Hsin-Ho Yeh, Philip S. Yu, Vincent S. Tseng
Issue Date:January 2010
pp. 16-26
To offer music recommendations that suit the listener and the situation, uMender mines context information and musical content and then considers relevant user ratings.
 
Efficient Construction of Compact Shedding Filters for Data Stream Processing
Found in: Data Engineering, International Conference on
By Bugra Gedik, Kun-Lung Wu, Philip S. Yu
Issue Date:April 2008
pp. 396-405
High-volume source streams, coupled with fluctuating rates, necessitate adaptive load shedding in data stream processing. When ignored, a continual query (CQ) server may randomly drop items, when its capacity is inadequate to handle the arriving data, and ...
 
Efficient Discovery of Frequent Approximate Sequential Patterns
Found in: Data Mining, IEEE International Conference on
By Feida Zhu, Xifeng Yan, Jiawei Han, Philip S. Yu
Issue Date:October 2007
pp. 751-756
We propose an efficient algorithm for mining frequent approximate sequential patterns under the Hamming distance model. Our algorithm gains its efficiency by adopting a
 
A Load Shedding Framework and Optimizations for M-way Windowed Stream Joins
Found in: Data Engineering, International Conference on
By Bugra Gedik, Kun-Lung Wu, Philip S. Yu, Ling Liu
Issue Date:April 2007
pp. 536-545
Tuple dropping, though commonly used for load shedding in most stream operations, is inadequate for m-way, windowed stream joins. The join output rate can be overly reduced because it fails to exploit the time correlations likely to exist among interrelate...
 
Discovering Frequent Closed Partial Orders from Strings
Found in: IEEE Transactions on Knowledge and Data Engineering
By Jian Pei, Haixun Wang, Jian Liu, Ke Wang, Jianyong Wang, Philip S. Yu
Issue Date:November 2006
pp. 1467-1481
Mining knowledge about ordering from sequence data is an important problem with many applications, such as bioinformatics, Web mining, network management, and intrusion detection. For example, if many customers follow a partial order in their purchases of ...
 
Focused Community Discovery
Found in: Data Mining, IEEE International Conference on
By Kirsten Hildrum, Philip S. Yu
Issue Date:November 2005
pp. 641-644
We present a new approach to community discovery. Community discovery usually partitions the graph into communities or clusters. Focused community discovery allows the searcher to specify start points of interest, and find the community of those points. Fo...
 
Efficiently Mining Frequent Closed Partial Orders
Found in: Data Mining, IEEE International Conference on
By Jian Pei, Jian Liu, Haixun Wang, Ke Wang, Philip S. Yu, Jianyong Wang
Issue Date:November 2005
pp. 753-756
Mining ordering information from sequence data is an important data mining task. Sequential pattern mining [1] can be regarded as mining frequent segments of total orders from sequence data. However, sequential patterns are often insufficient to concisely ...
 
Template-Based Privacy Preservation in Classification Problems
Found in: Data Mining, IEEE International Conference on
By Ke Wang, Benjamin C. M. Fung, Philip S. Yu
Issue Date:November 2005
pp. 466-473
In this paper, we present a template-based privacy preservation to protect against the threats caused by data mining abilities. The problem has dual goals: preserve the information for a wanted classification analysis and limit the usefulness of unwanted s...
 
Combining Multiple Clusterings by Soft Correspondence
Found in: Data Mining, IEEE International Conference on
By Bo Long, Zhongfei (Mark) Zhang, Philip S. Yu
Issue Date:November 2005
pp. 282-289
Combining multiple clusterings arises in various important data mining scenarios. However, finding a consensus clustering from multiple clusterings is a challenging task because there is no explicit correspondence between the classes from different cluster...
 
Processing Continual Range Queries over Moving Objects Using VCR-Based Query Indexes
Found in: Mobile and Ubiquitous Systems, Annual International Conference on
By Kun-Lung Wu, Shyh-Kwei Chen, Philip S. Yu
Issue Date:August 2004
pp. 226-235
This paper describes VCR-based query indexes for efficient processing of continual range queries over moving objects. A set of virtual construct rectangles (VCR) is pre-defined, each with a unique ID. One or more VCRs is used to strictly cover the entire r...
 
Shingle-Based Query Indexing for Location-Based Mobile E-Commerce
Found in: E-Commerce Technology, IEEE International Conference on
By Kun-Lung Wu, Shyh-Kwei Chen, Philip S. Yu
Issue Date:July 2004
pp. 16-23
We present a shingle-based query index (SQI) for supporting location-based services in mobile e-commerce. SQI is used to efficiently identify moving objects that are currently located inside a geographical region. A set of virtual shingles is predefined, e...
 
Indexing Continual Range Queries with Covering Tiles for Fast Locating of Moving Objects
Found in: Distributed Computing Systems Workshops, International Conference on
By Kun-Lung Wu, Shyh-Kwei Chen, Philip S. Yu
Issue Date:March 2004
pp. 470-475
We present a COVEring Tile-based (COVET) query index for fast locating of moving objects. A set of virtual tiles are predefined, each with a unique ID. One or more of the virtual tiles are used to strictly cover individual range queries. A COVET index mai...
 
Indexing Continual Range Queries for Location-Aware Mobile Services
Found in: e-Technology, e-Commerce, and e-Services, IEEE International Conference on
By Kun-Lung Wu, Shyh-Kwei Chen, Philip S. Yu
Issue Date:March 2004
pp. 233-240
We study a new main memory-based approach to indexing continual range queries to support location-aware mobile services. The query index is used to efficiently answer the following question repeatedly:
 
Accelerating Approximate Subsequence Search on Large Protein Sequence Databases
Found in: Computational Systems Bioinformatics Conference, International IEEE Computer Society
By Jiong Yang, Wei Wang, Yi Xia, Philip S. Yu
Issue Date:August 2002
pp. 207
Bioinformatics has become an active research area in recent years. The amount of mapped sequences doubles every fourteen months. BLAST has been widely employed for retrieving sequences which have similar portion(s) to a given sequence. However, BLAST has to...
 
A New Approach to Online Generation of Association Rules
Found in: IEEE Transactions on Knowledge and Data Engineering
By Charu C. Aggarwal, Philip S. Yu
Issue Date:July 2001
pp. 527-540
Abstract: We discuss the problem of online mining of association rules in a large database of sales transactions. The online mining is performed by preprocessing the data effectively in order to make it suitable for repeated onli...
 
Analytic Modeling of Clustered RAID with Mapping Based on Nearly Random Permutation
Found in: IEEE Transactions on Computers
By Arif Merchant, Philip S. Yu
Issue Date:March 1996
pp. 367-373
Abstract: A Redundant Array of Independent Disks (RAID) of G disks provides protection against single disk failures by adding one parity block for each G − 1 data blocks. In a cl...
 
Analytic Modeling and Comparisons of Striping Strategies for Replicated Disk Arrays
Found in: IEEE Transactions on Computers
By Philip S. Yu, Arif Merchant
Issue Date:March 1995
pp. 419-433
Abstract: Data replication has been widely used as a means of increasing the data availability for critical applications in the event of disk failure. There are different ways of organizing the two copies of the data across a d...
 
Multi-Space-Mapped SVMs for Multi-class Classification
Found in: Data Mining, IEEE International Conference on
By Bo Liu, Longbing Cao, Philip S. Yu, Chengqi Zhang
Issue Date:December 2008
pp. 911-916
In SVMs-based multiple classification, it is not always possible to find an appropriate kernel function to map all the classes from different distribution functions into a feature space where they are linearly separable from each other. This is even worse ...
 
Adaptive Load Diffusion for Multiway Windowed Stream Joins
Found in: Data Engineering, International Conference on
By Xiaohui Gu, Philip S. Yu, Haixun Wang
Issue Date:April 2007
pp. 146-155
In this paper, we present an adaptive load diffusion operator to enable scalable processing of Multiway Windowed Stream Joins (MWSJs) using a cluster system. The load diffusion is achieved by a set of novel semantics-preserving tuple routing algorithms. Di...
 
Adding the Temporal Dimension to Search — A Case Study in Publication Search
Found in: Web Intelligence, IEEE / WIC / ACM International Conference on
By Philip S. Yu, Xin Li, Bing Liu
Issue Date:September 2005
pp. 543-549
The most well known search techniques are perhaps the PageRank and HITS algorithms. In this paper we argue that these algorithms miss an important dimension, the temporal dimension. Quality pages in the past may not be quality pages now or in the future. T...
 
On Incremental Processing of Continual Range Queries for Location-Aware Services and Applications
Found in: Mobile and Ubiquitous Systems, Annual International Conference on
By Kun-Lung Wu, Shyh-Kwei Chen, Philip S. Yu
Issue Date:July 2005
pp. 261-269
A set of continual range queries, each defining the geographical region of interest, can be periodically reevaluated to locate moving objects. Processing these continual queries efficiently and incrementally hence becomes important for location-aw...
 
Multilabel Consensus Classification
Found in: 2013 IEEE International Conference on Data Mining (ICDM)
By Sihong Xie, Xiangnan Kong, Jing Gao, Wei Fan, Philip S. Yu
Issue Date:December 2013
pp. 1241-1246
In the era of big data, a large amount of noisy and incomplete data can be collected from multiple sources for prediction tasks. Combining multiple models or data sources helps to counteract the effects of low data quality and the bias of any single model ...
 
Introduction to the Domain-Driven Data Mining Special Section
Found in: IEEE Transactions on Knowledge and Data Engineering
By Chengqi Zhang, Philip S. Yu, David Bell
Issue Date:June 2010
pp. 753-754
No summary available.
 
EIC Editorial
Found in: IEEE Transactions on Knowledge and Data Engineering
By Philip S. Yu
Issue Date:January 2005
pp. 1-2
No summary available.
 
EIC Editorial
Found in: IEEE Transactions on Knowledge and Data Engineering
By Philip S. Yu
Issue Date:September 2004
pp. 1025
No summary available.
 
Editorial: State of the Transactions
Found in: IEEE Transactions on Knowledge and Data Engineering
By Philip S. Yu
Issue Date:January 2004
pp. 1
No summary available.
 
Editorial: AE Introduction
Found in: IEEE Transactions on Knowledge and Data Engineering
By Philip S. Yu
Issue Date:September 2003
pp. 1057-1058
No summary available.
 
Editorial: New AE Introduction
Found in: IEEE Transactions on Knowledge and Data Engineering
By Philip S. Yu
Issue Date:January 2003
pp. 1
No summary available.
 
Editorial: Introducing the New AEs
Found in: IEEE Transactions on Knowledge and Data Engineering
By Philip S. Yu
Issue Date:September 2002
pp. 929
No summary available.
 
Editorial: Introducing the New AEs
Found in: IEEE Transactions on Knowledge and Data Engineering
By Philip S. Yu
Issue Date:May 2001
pp. 393-394
No summary available.
 
A Comparison of Objective Functions in Network Community Detection
Found in: Data Mining Workshops, International Conference on
By Chuan Shi, Yanan Cai, Philip S. Yu, Zhenyu Yan, Bin Wu
Issue Date:December 2010
pp. 1234-1241
Community detection, as an important unsupervised learning problem in social network analysis, has attracted great interests in various research areas. Many objective functions for community detection that can capture the intuition of communities have been...
 