Displaying 1-50 out of 146 total
Name disambiguation in author citations using a K-way spectral clustering method
Found in: Digital Libraries, Joint Conference on
By C. Lee Giles, Hongyuan Zha, Hui Han
Issue Date:June 2005
pp. 334-343
An author may have multiple names and multiple authors may share the same name simply due to name abbreviations, identical names, or name misspellings in publications or bibliographies. This can produce name ambiguity which can affect the performance of ...
 
Improving Category Specific Web Search by Learning Query Modifications
Found in: Applications and the Internet, IEEE/IPSJ International Symposium on
By Eric J. Glover, Gary W. Flake, Steve Lawrence, Andries Kruger, David M. Pennock, William P. Birmingham, C. Lee Giles
Issue Date:January 2001
pp. 23
Users looking for documents within specific categories may have a difficult time locating valuable documents using general purpose search engines. We present an automated method for learning query modifications that can dramatically improve precis...
 
Rule Revision With Recurrent Neural Networks
Found in: IEEE Transactions on Knowledge and Data Engineering
By Christian W. Omlin, C. Lee Giles
Issue Date:February 1996
pp. 183-188
Recurrent neural networks readily process, recognize and generate temporal sequences. By encoding grammatical strings as temporal sequences, recurrent neural networks can be trained to behave like deterministic sequenti...
 
Personalized Feed Recommendation Service for Social Networks
Found in: Social Computing / IEEE International Conference on Privacy, Security, Risk and Trust, 2010 IEEE International Conference on
By Huajing Li, Yuan Tian, Wang-Chien Lee, C. Lee Giles, Meng-Chang Chen
Issue Date:August 2010
pp. 96-103
Social network systems (SNSs) such as Facebook and Twitter have recently attracted millions of users by providing social network based services to support easy message posting, information sharing and inter-friend communication. With the rapid growth of so...
 
Learning metadata from the evidence in an on-line citation matching scheme
Found in: Digital Libraries, Joint Conference on
By Anand Sivasubramaniam, Sandip Debnath, Huajing Li, Wang Chien Lee, Levent Bolelli, C. Lee Giles, Ziming Zhuang, Isaac G. Councill
Issue Date:June 2006
pp. 276-285
Citation matching, or the automatic grouping of bibliographic references that refer to the same document, is a data management problem faced by automatic digital libraries for scientific literature such as CiteSeer and Google Scholar. Although several solu...
 
Discovering Temporal Communities from Social Network Documents
Found in: Data Mining, IEEE International Conference on
By Ding Zhou, Isaac Councill, Hongyuan Zha, C. Lee Giles
Issue Date:October 2007
pp. 745-750
This paper studies the discovery of communities from social network documents produced over time, addressing the discovery of temporal trends in community memberships. We first formulate static community discovery at a single time period as a tripartite gr...
 
Automatic Document Metadata Extraction Using Support Vector Machines
Found in: Digital Libraries, Joint Conference on
By Hui Han, C. Lee Giles, Eren Manavoglu, Hongyuan Zha, Zhenyue Zhang, Edward A. Fox
Issue Date:May 2003
pp. 37
Automatic metadata generation provides scalability and usability for digital libraries and their collections. Machine learning methods offer robust and adaptable automatic metadata extraction. We describe a Support Vector Machine classification-based metho...
 
Natural Language Grammatical Inference with Recurrent Neural Networks
Found in: IEEE Transactions on Knowledge and Data Engineering
By Steve Lawrence, C. Lee Giles, Sandiway Fong
Issue Date:January 2000
pp. 126-140
This paper examines the inductive inference of a complex grammar with neural networks; specifically, the task considered is that of training a network to classify natural language sentences as grammatical or ungrammatica...
 
The Ethicality of Web Crawlers
Found in: Web Intelligence and Intelligent Agent Technology, IEEE/WIC/ACM International Conference on
By Yang Sun, Isaac G. Councill, C. Lee Giles
Issue Date:September 2010
pp. 668-675
Search engines largely rely on web crawlers to collect information from the web. This has led to an enormous amount of web traffic generated by crawlers alone. To minimize negative aspects of this traffic on websites, the behaviors of crawlers may be regul...
 
A Non-parametric Approach to Pair-Wise Dynamic Topic Correlation Detection
Found in: Data Mining, IEEE International Conference on
By Yang Song, Lu Zhang, C. Lee Giles
Issue Date:December 2008
pp. 1031-1036
We introduce dynamic correlated topic models (DCTM) for analyzing discrete data over time. This model is inspired by the hierarchical Gaussian process latent variable models (GP-LVM). DCTM is essentially a non-linear dimension reduction technique which is ...
 
Iterative Graph Feature Mining for Graph Indexing
Found in: Data Engineering, International Conference on
By Dayu Yuan, Prasenjit Mitra, Huiwen Yu, C. Lee Giles
Issue Date:April 2012
pp. 198-209
Sub graph search is a popular query scenario on graph databases. Given a query graph q, the sub graph search algorithm returns all database graphs having q as a sub graph. To efficiently implement a subgraph search, subgraph features are mined in order to ...
 
Towards Click-Based Models of Geographic Interests in Web Search
Found in: Web Intelligence and Intelligent Agent Technology, IEEE/WIC/ACM International Conference on
By Ziming Zhuang, Cliff Brunk, Prasenjit Mitra, C. Lee Giles
Issue Date:December 2008
pp. 293-299
With the recent surge in the volume of search queries that explicitly or implicitly express users' geographical interests, to accurately infer users' locality preference becomes an increasingly important yet challenging issue. We study two click-based mode...
 
Extracting Author Meta-Data from Web Using Visual Features
Found in: Data Mining Workshops, International Conference on
By Shuyi Zheng, Ding Zhou, Jia Li, C. Lee Giles
Issue Date:October 2007
pp. 33-40
Enriching digital library's author meta-data can lead to valuable services and applications. This paper addresses the problem of extracting authors' information from their homepages. This problem is actually a multiclass classification problem. A homepag...
 
Supporting distributed scientific collaboration: Implications for designing the CiteSeer collaboratory
Found in: Hawaii International Conference on System Sciences
By Umer Farooq, Craig H. Ganoe, John M. Carroll, C. Lee Giles
Issue Date:January 2007
pp. 26c
It is unclear if and how collaboratories have enhanced distributed scientific collaboration. Furthermore, little is known in the way of design strategies to support such collaboration. Based on a survey and follow-up interviews with CiteSeer users, we pres...
   
What's there and what's not?: focused crawling for missing documents in digital libraries
Found in: Digital Libraries, Joint Conference on
By Rohit Wagle, C. Lee Giles, Ziming Zhuang
Issue Date:June 2005
pp. 301-310
Some large scale topical digital libraries, such as CiteSeer, harvest online academic documents by crawling open-access archives, university and author homepages, and authors' self-submissions. While these approaches have so far built reasonable size libra...
 
Collaborative Filtering with Maximum Entropy
Found in: IEEE Intelligent Systems
By Dmitry Pavlov, Eren Manavoglu, David M. Pennock, C. Lee Giles
Issue Date:November 2004
pp. 40-48
The authors describe a novel maximum-entropy (maxent) approach for generating online recommendations as a user navigates through a collection of documents. They show how to handle high-dimensional sparse data and represent it as a collection of ordered seq...
 
Cloud Computing: A Digital Libraries Perspective
Found in: Cloud Computing, IEEE International Conference on
By Pradeep Teregowda, Bhuvan Urgaonkar, C. Lee Giles
Issue Date:July 2010
pp. 115-122
Provisioning and maintenance of infrastructure for Web based digital library search engines such as CiteSeerX present several challenges. CiteSeerX provides autonomous citation indexing, full text indexing, and extensive document metadata from docume...
 
HSN-PAM: Finding Hierarchical Probabilistic Groups from Large-Scale Networks
Found in: Data Mining Workshops, International Conference on
By Haizheng Zhang, Wei Li, Xuerui Wang, C. Lee Giles, Henry C. Foley, John Yen
Issue Date:October 2007
pp. 27-32
Real-world social networks are often hierarchical, reflecting the fact that some communities are composed of a few smaller, sub-communities. This paper describes a hierarchical Bayesian model based scheme, namely HSN-PAM (Hierarchical Social Network-Pac...
 
Clustering and Identifying Temporal Trends in Document Databases
Found in: Advances in Digital Libraries Conference, IEEE
By Alexandrin Popescul, Lyle H. Ungar, Gary William Flake, Steve Lawrence, C. Lee Giles
Issue Date:May 2000
pp. 173
We introduce a simple and efficient method for clustering and identifying temporal trends in hyper-linked document databases. Our method can scale to large datasets because it exploits the underlying regularity often found in hyper-linked document database...
 
Figure Metadata Extraction from Digital Documents
Found in: 2013 12th International Conference on Document Analysis and Recognition (ICDAR)
By Sagnik Ray Choudhury, Prasenjit Mitra, Andi Kirk, Silvia Szep, Donald Pellegrino, Sue Jones, C. Lee Giles
Issue Date:August 2013
pp. 135-139
Academic papers contain multiple figures (information graphics) representing important findings and experimental results. Automatic data extraction from such figures and classification of information graphics is not straightforward and a well studied probl...
 
Improving the Table Boundary Detection in PDFs by Fixing the Sequence Error of the Sparse Lines
Found in: Document Analysis and Recognition, International Conference on
By Ying Liu, Kun Bai, Prasenjit Mitra, C. Lee Giles
Issue Date:July 2009
pp. 1006-1010
As the rapid growth of PDF documents, recognizing the document structure and components are useful for document storage, classification and retrieval. Table, a ubiquitous document component, becomes an important information source. Accurately detecting the...
 
Feature Selection in Web Applications By ROC Inflections and Powerset Pruning
Found in: Applications and the Internet, IEEE/IPSJ International Symposium on
By Frans M. Coetzee, Eric Glover, Steve Lawrence, C. Lee Giles
Issue Date:January 2001
pp. 5
A basic problem of information processing is selecting enough features to ensure that events are accurately represented for classification problems, while simultaneously minimizing storage and processing of irrelevant or marginally important features. To a...
 
Automatic categorization of figures in scientific documents
Found in: Digital Libraries, Joint Conference on
By Prasenjit Mitra, C. Lee Giles, James Z. Wang, Xiaonan Lu
Issue Date:June 2006
pp. 129-138
Figures are very important non-textual information contained in scientific documents. Current digital libraries do not provide users tools to retrieve documents based on the information available within the figures. We propose an architecture for retrievin...
 
Discovering Relevant Scientific Literature on the Web
Found in: IEEE Intelligent Systems
By Kurt D. Bollacker, Steve Lawrence, C. Lee Giles
Issue Date:March 2000
pp. 42-47
Because of the ease of electronic dissemination, the world of scientific literature on the Web has grown rapidly, becoming a large, highly current database of published research. This acceleration of publication has exacerbated the difficulty researchers f...
 
Automatic Detection of Pseudocodes in Scholarly Documents Using Machine Learning
Found in: 2013 12th International Conference on Document Analysis and Recognition (ICDAR)
By Suppawong Tuarob, Sumit Bhatia, Prasenjit Mitra, C. Lee Giles
Issue Date:August 2013
pp. 738-742
A significant number of scholarly articles in computer science and other disciplines contain algorithms that provide concise descriptions for solving a wide variety of computational problems. For example, Dijkstra's algorithm describes how to find the shor...
 
Table of Contents Recognition and Extraction for Heterogeneous Book Documents
Found in: 2013 12th International Conference on Document Analysis and Recognition (ICDAR)
By Zhaohui Wu, Prasenjit Mitra, C. Lee Giles
Issue Date:August 2013
pp. 1205-1209
Existing work on book table of contents (TOC) recognition has been almost all on small size, application-dependent, and domain-specific datasets. However, TOC of books from different domains differ significantly in their visual layout and style, making TOC...
 
Nonconvex Online Support Vector Machines
Found in: IEEE Transactions on Pattern Analysis and Machine Intelligence
By Şeyda Ertekin, Léon Bottou, C. Lee Giles
Issue Date:February 2011
pp. 368-381
In this paper, we propose a nonconvex online Support Vector Machine (SVM) algorithm (LASVM-NC) based on the Ramp Loss, which has the strong ability of suppressing the influence of outliers. Then, again in the online learning setting, we propose an outlier ...
 
Panorama: Extending Digital Libraries with Topical Crawlers
Found in: Digital Libraries, Joint Conference on
By Gautam Pant, Kostas Tsioutsiouliklis, Judy Johnson, C. Lee Giles
Issue Date:June 2004
pp. 142-150
A large amount of research, technical and professional documents are available today in digital formats. Digital libraries are created to facilitate search and retrieval of information supplied by the documents. These libraries may span an entire area of i...
 
Self-Organization and Identification of Web Communities
Found in: Computer
By Gary William Flake, Steve Lawrence, C. Lee Giles, Frans M. Coetzee
Issue Date:March 2002
pp. 66-71
Millions of individuals operating independently author the Web's information. Despite its decentralized nature, the authors' work shows that the Web self-organizes and its link structure allows efficient identification of communities. This is sign...
 
Sequence Learning: From Recognition and Prediction to Sequential Decision Making
Found in: IEEE Intelligent Systems
By Ron Sun, C. Lee Giles
Issue Date:July 2001
pp. 67-70
No summary available.
 
Overfitting and Neural Networks: Conjugate Gradient and Backpropagation
Found in: Neural Networks, IEEE - INNS - ENNS International Joint Conference on
By Steve Lawrence, C. Lee Giles
Issue Date:July 2000
pp. 1114
Methods for controlling the bias/variance tradeoff typically assume that overfitting or over training is a global phenomenon. For multi-layer perceptron (MLP) neural networks, global parameters such as the training time (e.g. based on validation tests), ne...
 
Self-Adaptive User Profiles for Large-Scale Data Delivery
Found in: Data Engineering, International Conference on
By Ugur Cetintemel, Michael J. Franklin, C. Lee Giles
Issue Date:March 2000
pp. 622
Push-based data delivery requires knowledge of user interests for making scheduling, bandwidth allocation, and routing decisions. Such information is maintained as user profiles. We propose a new incremental algorithm for constructing user profiles based o...
 
K-SVMeans: A Hybrid Clustering Algorithm for Multi-Type Interrelated Datasets
Found in: Web Intelligence, IEEE / WIC / ACM International Conference on
By Levent Bolelli, Seyda Ertekin, Ding Zhou, C. Lee Giles
Issue Date:November 2007
pp. 198-204
Identification of distinct clusters of documents in text collections has traditionally been addressed by making the assumption that the data instances can only be represented by homogeneous and uniform features. Many real-world data, on the other hand, com...
 
Query Expansion Using Topic and Location
Found in: Data Mining Workshops, International Conference on
By Shu Huang, Qiankun Zhao, Prasenjit Mitra, C. Lee Giles
Issue Date:October 2007
pp. 619-624
Users use a few keywords to post queries to search engines. Search engines, often, fail to return answers that their users seek because the keyword queries incompletely specify the information being sought and because of the ambiguity of natural language t...
 
Context and Page Analysis for Improved Web Search
Found in: IEEE Internet Computing
By Steve Lawrence, C. Lee Giles
Issue Date:July 1998
pp. 38-46
NECI Research Institute has developed a metasearch engine that improves the efficiency of Web searches by downloading and analyzing each document and then displaying results that show the query terms in context.
 
A Fast Preprocessing Method for Table Boundary Detection: Narrowing Down the Sparse Lines Using Solely Coordinate Information
Found in: Document Analysis Systems, IAPR International Workshop on
By Ying Liu, Prasenjit Mitra, C. Lee Giles
Issue Date:September 2008
pp. 431-438
As the rapid growth of PDF document in digital libraries, recognizing the document structure and detecting specific document components are useful for document storage, classification and retrieval. Tables, as a specific document component, are ubiquitous ...
 
BotSeer: An Automated Information System for Analyzing Web Robots
Found in: Web Engineering, International Conference on
By Yang Sun, Isaac G. Councill, C. Lee Giles
Issue Date:July 2008
pp. 108-114
Robots.txt files are vital to the web since they are supposed to regulate what search engines can and cannot crawl. We present BotSeer, a Web-based information system and search tool that provides resources and services for researching Web robots and trend...
 
Determining Bias to Search Engines from Robots.txt
Found in: Web Intelligence, IEEE / WIC / ACM International Conference on
By Yang Sun, Ziming Zhuang, Isaac G. Councill, C. Lee Giles
Issue Date:November 2007
pp. 149-155
Search engines largely rely on robots (i.e., crawlers or spiders) to collect information from the Web. Such crawling activities can be regulated from the server side by deploying the Robots Exclusion Protocol in a file called robots.txt. Ethical robots wil...
 
Social Bookmarking for Scholarly Digital Libraries
Found in: IEEE Internet Computing
By Umer Farooq, Yang Song, John M. Carroll, C. Lee Giles
Issue Date:November 2007
pp. 29-35
Social bookmarking services have recently gained popularity among Web users. Whereas numerous studies provide a historical account of tagging systems, the authors use their analysis of a domain-specific social bookmarking service called CiteULike to reflec...
 
Co-ranking Authors and Documents in a Heterogeneous Network
Found in: Data Mining, IEEE International Conference on
By Ding Zhou, Sergey A. Orshanskiy, Hongyuan Zha, C. Lee Giles
Issue Date:October 2007
pp. 739-744
Recent graph-theoretic approaches have demonstrated remarkable successes for ranking networked entities, but most of their applications are limited to homogeneous networks such as the network of citations between publications. This paper proposes a novel m...
 
Intelligent Parsing of Scanned Volumes for Web Based Archives
Found in: International Conference on Semantic Computing
By Xiaonan Lu, James Z. Wang, C. Lee Giles
Issue Date:September 2007
pp. 559-568
The proliferation of digital libraries and the large amount of existing documents raise important issues in efficient handling of documents. Printed texts in documents need to be converted into digital format and semantic information need to be parsed and ...
 
Boosting the Feature Space: Text Classification for Unstructured Data on the Web
Found in: Data Mining, IEEE International Conference on
By Yang Song, Ding Zhou, Jian Huang, Isaac G. Councill, Hongyuan Zha, C. Lee Giles
Issue Date:December 2006
pp. 1064-1069
The issue of seeking efficient and effective methods for classifying unstructured text in large document corpora has received much attention in recent years. Traditional document representation like bag-of-words encodes documents as feature vectors, which ...
 
Automatic extraction of table metadata from digital documents
Found in: Digital Libraries, Joint Conference on
By Kun Bai, Prasenjit Mitra, C. Lee Giles, Ying Liu
Issue Date:June 2006
pp. 339-340
Tables are used to present, list, summarize, and structure important data in documents. In scholarly articles, they are often used to present the relationships among data and high-light a collection of results obtained from experiments and scientific analy...
 
Automatic Identification of Informative Sections of Web Pages
Found in: IEEE Transactions on Knowledge and Data Engineering
By Sandip Debnath, Prasenjit Mitra, Nirmal Pal, C. Lee Giles
Issue Date:September 2005
pp. 1233-1246
Web pages—especially dynamically generated ones—contain several items that cannot be classified as the
 
Enabling Interoperability For Autonomous Digital Libraries: An API To CiteSeer Services
Found in: Digital Libraries, Joint Conference on
By Yves Petinot, C. Lee Giles, Vivek Bhatnagar, Pradeep B. Teregowda, Hui Han
Issue Date:June 2004
pp. 372-373
We introduce CiteSeer-API, a public API to CiteSeer-like services. CiteSeer-API is SOAP/WSDL based and allows for easy programatical access to all the specific functionalities offered by CiteSeer services, including full text search of documents and citati...
   
eBizSearch: An OAI-Compliant Digital Library for eBusiness
Found in: Digital Libraries, Joint Conference on
By Yves Petinot, Pradeep B. Teregowda, Hui Han, C. Lee Giles, Steve Lawrence, Arvind Rangaswamy, Nirmal Pal
Issue Date:May 2003
pp. 199
Niche Search Engines offer an efficient alternative to traditional search engines when the results returned by general-purpose search engines do not provide a sufficient degree of relevance and when nontraditional search features are required. Niche search...
 
Persistence of Web References in Scientific Research
Found in: Computer
By Steve Lawrence, David M. Pennock, Gary William Flake, Robert Krovetz, Frans M. Coetzee, Eric Glover, Finn Årup Nielsen, Andries Kruger, C. Lee Giles
Issue Date:February 2001
pp. 26-31
Invalid URLs can lead to important data loss as cited works and research findings gradually disappear from circulation. The lack of persistence of Web references leads researchers to question whether publications should even include URL citations....
 
Digital Libraries and Autonomous Citation Indexing
Found in: Computer
By Steve Lawrence, C. Lee Giles, Kurt Bollacker
Issue Date:June 1999
pp. 67-71
The revolution the Web has brought to information dissemination is not so much due to the availability of data--huge amounts of information has long been available in libraries--but rather the improved efficiency of accessing (improved accessibili...
 
Convolutional Neural Networks for Face Recognition
Found in: Computer Vision and Pattern Recognition, IEEE Computer Society Conference on
By Steve Lawrence, C. Lee Giles, Ah Chung Tsoi
Issue Date:June 1996
pp. 217
No summary available.
 
Scholarly big data information extraction and integration in the CiteSeerX digital library
Found in: 2014 IEEE 30th International Conference on Data Engineering Workshops (ICDEW)
By Kyle Williams, Jian Wu, Sagnik Ray Choudhury, Madian Khabsa, C. Lee Giles
Issue Date:March 2014
pp. 68-73
CiteSeerX is a digital library that contains approximately 3.5 million scholarly documents and receives between 2 and 4 million requests per day. In addition to making documents available via a public Website, the data is also used to facilitate research ...
   