Statistical Learning for File-Type Identification
Found in: Machine Learning and Applications, Fourth International Conference on
By Siddharth Gopal, Yiming Yang, Konstantin Salomatin, Jaime Carbonell
Issue Date:December 2011
pp. 68-73
File-type Identification (FTI) is an important problem in digital forensics, intrusion detection, and other related fields. Using state-of-the-art classification techniques to solve FTI problems has begun to receive research attention; however, general con...
 
Personalized Email Prioritization Based on Content and Social Network Analysis
Found in: IEEE Intelligent Systems
By Yiming Yang, Shinjae Yoo, Frank Lin, Il-Chul Moon
Issue Date:July 2010
pp. 12-18
The proposed system combines unsupervised clustering, social network analysis, semisupervised feature induction, and supervised classification to model user priorities among incoming email messages.
 
Guest Editors' Introduction: Intelligent Information Retrieval
Found in: IEEE Intelligent Systems
By Yiming Yang, Jan Pedersen
Issue Date:July 1999
pp. 30-31
No summary available.
 
Cost-Sensitive-Data Preprocessing for Mining Customer Relationship Management Databases
Found in: IEEE Intelligent Systems
By Junfeng Pan, Qiang Yang, Yiming Yang, Lei Li, Frances Tianyi Li, George Wenmin Li
Issue Date:January 2007
pp. 46-51
Telecommunications companies and financial institutions are facing increasing competition. A staged preprocessing framework for cost-sensitive-data processing can help these companies identify customers who might switch to a competitor (or churn). The fram...
 
A Secure File Allocation Algorithm for Heterogeneous Distributed Systems
Found in: Parallel Processing Workshops, International Conference on
By Yun Tian, Jiong Xie, Shu Yin, Ji Zhang, Xiao Qin, Mohammed I. Alghamdi, Meikang Qiu, Yiming Yang
Issue Date:September 2011
pp. 168-175
In this study we develop a secure allocating processing (SAP) algorithm for the S-FAS scheme [13] to improve the security level and consider its performance using the heterogeneous feature of a large distributed system. The SAP allocation algorithm consider...
 
Secure Fragment Allocation in a Distributed Storage System with Heterogeneous Vulnerabilities
Found in: Networking, Architecture, and Storage, International Conference on
By Yun Tian, Shu Yin, Jiong Xie, Ji Zhang, Xiao Qin, Mohammed I. Alghamdi, Meikang Qiu, Yiming Yang
Issue Date:July 2011
pp. 170-179
There is a growing demand for large-scale distributed storage systems to support resource sharing and fault tolerance. Although heterogeneity issues of distributed systems have been widely investigated, little attention has yet been paid to security soluti...
 
Applying Q-Learning Algorithm to Study Line-Grasping Control Policy for Transmission Line Deicing Robot
Found in: 2010 International Conference on Intelligent System Design and Engineering Application (ISDEA 2010)
By Shuning Wei, Yaonan Wang, Yiming Yang, Feng Yin, Wenming Cao, Yong Tang
Issue Date:October 2010
pp. 382-387
Ice coating in power networks could result in power-tower collapse and power interruption. This paper introduces a preliminary design of a deicing robot, which travels on transmission lines and automatically removes ice. Inevitably, the deicing robot will en...
 
Improving Energy Efficiency and Security for Disk Systems
Found in: High Performance Computing and Communications, 10th IEEE International Conference on
By Shu Yin, Mohammed I. Alghamdi, Xiaojun Ruan, Mais Nijim, Ashwin Tamilarasan, Ziliang Zong, Xiao Qin, Yiming Yang
Issue Date:September 2010
pp. 442-449
Improving security and minimizing power consumption are crucial for large-scale data storage systems. Although a handful of studies have been focused on data security and energy efficiency, most of the existing approaches have concentrated on only one of t...
 
Domain Feature Model Recovery from Multiple Applications Using Data Access Semantics and Formal Concept Analysis
Found in: Reverse Engineering, Working Conference on
By Yiming Yang, Xin Peng, Wenyun Zhao
Issue Date:October 2009
pp. 215-224
Feature models are widely employed in domain specific software development to specify the domain requirements with commonality and variability. A feature model is usually constructed by domain experts after comprehensive domain analysis. In this paper, we ...
 
An Automatic Connector Generation Method for Dynamic Architecture
Found in: Computer Software and Applications Conference, Annual International
By Yiming Yang, Xin Peng, Wenyun Zhao
Issue Date:July 2007
pp. 409-414
In a component-based system components are basic computation units implementing specific business functions, and their interactions are explicitly represented by connectors. If the system is required to be adaptable with dynamic architectural evolutions, t...
 
Learning Approaches for Detecting and Tracking News Events
Found in: IEEE Intelligent Systems
By Yiming Yang, Jaime G. Carbonell, Ralf D. Brown, Thomas Pierce, Brian T. Archibald, Xin Liu
Issue Date:July 1999
pp. 32-43
This article studies the effective use of information-retrieval and machine-learning techniques in a new task, event detection and tracking. The objective is to automatically detect novel events from chronologically ordered streams of news stories...
 
A unified optimization framework for auction and guaranteed delivery in online advertising
Found in: Proceedings of the 21st ACM international conference on Information and knowledge management (CIKM '12)
By Konstantin Salomatin, Tie-Yan Liu, Yiming Yang
Issue Date:October 2012
pp. 2005-2009
This paper proposes a new unified optimization framework combining pay-per-click auctions and guaranteed delivery in sponsored search. Advertisers usually have different (and sometimes mixed) marketing goals: brand awareness and direct response. Different ...
     
Modeling personalized email prioritization: classification-based and regression-based approaches
Found in: Proceedings of the 20th ACM international conference on Information and knowledge management (CIKM '11)
By Jaime Carbonell, Shinjae Yoo, Yiming Yang
Issue Date:October 2011
pp. 729-738
Email overload, even after spam filtering, presents a serious productivity challenge for busy professionals and executives. One solution is automated prioritization of incoming emails to ensure the most important are read and processed quickly, while other...
     
CiteData: a new multi-faceted dataset for evaluating personalized search performance
Found in: Proceedings of the 19th ACM international conference on Information and knowledge management (CIKM '10)
By Abhay Harpale, Daqing He, Siddharth Gopal, Yiming Yang, Zhen Yue
Issue Date:October 2010
pp. 549-558
Personalized search systems have evolved to utilize heterogeneous features including document hyperlinks, category labels in various taxonomies and social tags in addition to free-text of the documents. Consequently, classifiers, PageRank algorithms and Co...
     
Learning to rank relevant and novel documents through user feedback
Found in: Proceedings of the 19th ACM international conference on Information and knowledge management (CIKM '10)
By Abhimanyu Lad, Yiming Yang
Issue Date:October 2010
pp. 469-478
We consider the problem of learning to rank relevant and novel documents so as to directly maximize a performance metric called Expected Global Utility (EGU), which has several desirable properties: (i) It measures retrieval performance in terms of relevan...
     
Multilabel classification with meta-level features
Found in: Proceeding of the 33rd international ACM SIGIR conference on Research and development in information retrieval (SIGIR '10)
By Siddharth Gopal, Yiming Yang
Issue Date:July 2010
pp. 315-322
Effective learning in multi-label classification (MLC) requires an appropriate level of abstraction for representing the relationship between each instance and multiple categories. Current MLC methods have been focused on learning-to-map from instances to ...
     
Protein identification as an information retrieval problem
Found in: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval (SIGIR '09)
By Abhay Harpale, Subramaniam Ganapathy, Yiming Yang
Issue Date:July 2009
pp. 435-435
We present the first interdisciplinary work on transforming a popular problem in proteomics, i.e. protein identification from tandem mass spectra, to an Information Retrieval (IR) problem. We present an empirical comparison of popular IR approaches, such a...
     
Mining social networks for personalized email prioritization
Found in: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '09)
By Frank Lin, Il-Chul Moon, Shinjae Yoo, Yiming Yang
Issue Date:June 2009
pp. 1-24
Email is one of the most prevalent communication tools today, and solving the email overload problem is pressingly urgent. A good way to alleviate email overload is to automatically prioritize received messages according to the priorities of each user. How...
     
Corpus microsurgery: criteria optimization for medical cross-language ir
Found in: Proceeding of the 17th ACM conference on Information and knowledge management (CIKM '08)
By Jaime Carbonell, Monica Rogati, Yiming Yang
Issue Date:October 2008
pp. 1001-1001
Automatic subset selection from a parallel corpus significantly improves cross-lingual information retrieval (CLIR) performance, in addition to increasing its efficiency. Our selection method extracts relevant training data by incorporating additional criteria (i.e...
     
Personalized active learning for collaborative filtering
Found in: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR '08)
By Abhay S. Harpale, Yiming Yang
Issue Date:July 2008
pp. 2-2
Collaborative Filtering (CF) requires user-rated training examples for statistical inference about the preferences of new users. Active learning strategies identify the most informative set of training examples through minimum interactions with the users. ...
     
Utility-based information distillation over temporally sequenced documents
Found in: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR '07)
By Abhay Harpale, Abhimanyu Lad, Bryan Kisiel, Monica Rogati, Ni Lao, Yiming Yang
Issue Date:July 2007
pp. 31-38
This paper examines a new approach to information distillation over temporally ordered documents, and proposes a novel evaluation scheme for such a framework. It combines the strengths of and extends beyond conventional adaptive filtering, novelty detectio...
     
Using recursive classification to discover predictive features
Found in: Proceedings of the 2005 ACM symposium on Applied computing (SAC '05)
By Fan Li, Yiming Yang
Issue Date:March 2005
pp. 1054-1058
Finding the most predictive features for statistical classification is a challenging problem and has important applications. Support Vector Machines (SVMs), for example, have been found successful with a recursive procedure in selecting the most important genes fo...
     
Probabilistic score estimation with piecewise logistic regression
Found in: Twenty-first international conference on Machine learning (ICML '04)
By Jian Zhang, Yiming Yang
Issue Date:July 2004
pp. 182-182
Well-calibrated probabilities are necessary in many applications like probabilistic frameworks or cost-sensitive tasks. Based on previous success of asymmetric Laplace method in calibrating text classifiers' scores, we propose to use piecewise logistic reg...
     
Resource selection for domain-specific cross-lingual IR
Found in: Proceedings of the 27th annual international conference on Research and development in information retrieval (SIGIR '04)
By Monica Rogati, Yiming Yang
Issue Date:July 2004
pp. 154-161
An under-explored question in cross-language information retrieval (CLIR) is to what degree the performance of CLIR methods depends on the availability of high-quality translation resources for particular domains. To address this issue, we evaluate several...
     
High-performing feature selection for text classification
Found in: Proceedings of the eleventh international conference on Information and knowledge management (CIKM '02)
By Monica Rogati, Yiming Yang
Issue Date:November 2002
pp. 659-661
This paper reports a controlled study on a large number of filter feature selection methods for text classification. Over 100 variants of five major feature selection criteria were examined using four well-known classification algorithms: a Naive Bayesian ...
     
Boosting to correct inductive bias in text classification
Found in: Proceedings of the eleventh international conference on Information and knowledge management (CIKM '02)
By Jaime Carbonell, Yan Liu, Yiming Yang
Issue Date:November 2002
pp. 348-355
This paper studies the effects of boosting in the context of different classification methods for text categorization, including Decision Trees, Naive Bayes, Support Vector Machines (SVMs) and a Rocchio-style classifier. We identify the inductive biases of...
     
Topic-conditioned novelty detection
Found in: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '02)
By Chun Jin, Jaime Carbonell, Jian Zhang, Yiming Yang
Issue Date:July 2002
pp. 688-693
Automated detection of the first document reporting each new event in temporally-sequenced streams of documents is an open challenge. In this paper we propose a new approach which addresses this problem in two stages: 1) using a supervised learning algorit...
     
A study of thresholding strategies for text categorization
Found in: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR '01)
By Yiming Yang
Issue Date:September 2001
pp. 137-145
Thresholding strategies in automated text categorization are an underexplored area of research. This paper presents an examination of the effect of thresholding strategies on the performance of a classifier under various conditions. Using k-Nearest Neigh...
     
Improving text categorization methods for event tracking
Found in: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR '00)
By Charles W. Lattimer, Thomas Pierce, Tom Ault, Yiming Yang
Issue Date:July 2000
pp. 65-72
Automated tracking of events from chronologically ordered document streams is a new challenge for statistical text classification. Existing learning techniques must be adapted or improved in order to effectively handle difficult situations where the number...
     
A re-examination of text categorization methods
Found in: Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR '99)
By Xin Liu, Yiming Yang
Issue Date:August 1999
pp. 42-49
The TIPSTER collection is unusual because of both its size and detail. In particular, it describes a set of information needs, as opposed to traditional queries. These detailed representations of information need are an opportunity for research on differen...
     
A study of retrospective and on-line event detection
Found in: Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR '98)
By Jaime Carbonell, Tom Pierce, Yiming Yang
Issue Date:August 1998
pp. 28-36
     
Noise reduction in a statistical approach to text categorization
Found in: Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR '95)
By Yiming Yang
Issue Date:July 1995
pp. 256-263
     
An application of least squares fit mapping to text information retrieval
Found in: Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR '93)
By Christopher G. Chute, Yiming Yang
Issue Date:June 1993
pp. 281-290
This paper describes a unique example-based mapping method for document retrieval. We discovered that the knowledge about relevance among queries and documents can be used to obtain empirical connections between query terms and the canonical concepts which...
     
An example-based mapping method for text categorization and retrieval
Found in: ACM Transactions on Information Systems (TOIS)
By Christopher G. Chute, Yiming Yang
Issue Date:January 1992
pp. 252-277
A unified model for text categorization and text retrieval is introduced. We use a training set of manually categorized documents to learn word-category associations, and use these associations to predict the categories of arbitrary documents. Similarly, w...
     
An experimental study on large-scale web categorization
Found in: Special interest tracks and posters of the 14th international conference on World Wide Web (WWW '05)
By Bin GAO, Hao WAN, Hua-Jun ZENG, Qian ZHOU, Tie-Yan LIU, Wei-Ying MA, Yiming YANG, Zheng CHEN
Issue Date:May 2005
pp. 1106-1107
Taxonomies of the Web typically have hundreds of thousands of categories and skewed category distribution over documents. It is not clear whether existing text classification technologies can perform well on and scale up to such large-scale applications. T...
     