The Web-DL Environment for Building Digital Libraries from the Web
Found in: Digital Libraries, Joint Conference on
By Pável P. Calado, Marcos A. Gonçalves, Edward A. Fox, Berthier Ribeiro-Neto, Alberto H. F. Laender, Altigran S. da Silva, Davi C. Reis, Pablo A. Roberto, Monique V. Vieira, Juliano P. Lage
Issue Date:May 2003
pp. 346
The Web contains a huge volume of unstructured data, which is difficult to manage. In digital libraries, on the other hand, information is explicitly organized, described, and managed. Community-oriented services are built to attend specific information ne...
   
The Effectiveness of Automatically Structured Queries in Digital Libraries
Found in: Digital Libraries, Joint Conference on
By Marcos André Gonçalves, Edward A. Fox, Aaron Krowne, Pável Calado, Alberto H. F. Laender, Altigran S. da Silva, Berthier Ribeiro-Neto
Issue Date:June 2004
pp. 98-107
Structured or fielded metadata is the basis for many digital library services, including searching and browsing. Yet, little is known about the impact of using structure on the effectiveness of such services. In this paper, we investigate a key research qu...
   
The Debye Environment for Web Data Management
Found in: IEEE Internet Computing
By Alberto H.F. Laender, Altigran S. da Silva, Paulo B. Golgher, Berthier Ribeiro-Neto, Irna M.R. Evangelista-Filha, Karine V. Magalhães
Issue Date:July 2002
pp. 60-69
Currently, the Web contains a large amount of interesting data implicitly available on pages at various sites, including digital libraries and on-line stores. Researchers regard these data-rich pages as...
 
Learning to deduplicate
Found in: Digital Libraries, Joint Conference on
By Alberto H. F. Laender, Altigran S. da Silva, Marcos André Gonçalves, Moisés G. de Carvalho
Issue Date:June 2006
pp. 41-50
Identifying record replicas in Digital Libraries and other types of digital repositories is fundamental to improve the quality of their content and services as well as to yield eventual sharing efforts. Several deduplication strategies are available, but m...
 
CoBWeb - A Crawler for the Brazilian Web
Found in: String Processing and Information Retrieval, International Symposium on
By Altigran S. da Silva, Eveline A. Veloso, Paulo B. Golgher, Berthier Ribeiro-Neto, Alberto H. F. Laender, Nivio Ziviani
Issue Date:September 1999
pp. 184
One of the key components of current Web search engines is the document collector. This paper describes CoBWeb, an automatic document collector, whose architecture is distributed and highly scalable. CoBWeb aims at collecting large amounts of documents per...
 
Top-Down Extraction of Semi-Structured Data
Found in: String Processing and Information Retrieval, International Symposium on
By Berthier Ribeiro-Neto, Alberto H. F. Laender, Altigran S. da Silva
Issue Date:September 1999
pp. 176
In this paper, we propose an innovative approach to extracting semi-structured data from Web sources. The idea is to collect a couple of example objects from the user and to use this information to extract new objects from new pages or texts. We propose a ...
 
A Genetic Programming Approach to Record Deduplication
Found in: IEEE Transactions on Knowledge and Data Engineering
By Moisés G. de Carvalho, Alberto H. F. Laender, Marcos André Gonçalves, Altigran S. da Silva
Publication Date: November 2010
pp. N/A
Several systems that rely on consistent data to offer high quality services, such as digital libraries and e-commerce brokers, may be affected by the existence of duplicates, quasi-replicas, or near-duplicate entries in their repositories. Because of that,...
 
Building a research social network from an individual perspective
Found in: Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries (JCDL '11)
By Alberto H.F. Laender, Allan J.C. Silva, Altigran S. da Silva, Carolina A.S. Bigonha, Clodoveu A. Davis, Daniel Hasan Dalip, Eduardo M. Barbosa, Eli Cortez, Marcos Andre Goncalves, Mirella M. Moro, Peterson S. Procopio, Rafael Odon de A
Issue Date:June 2011
pp. 427-428
In this poster paper, we present an overview of CiênciaBrasil, a research social network involving researchers within the Brazilian INCT program. We describe its architecture and the solutions adopted for data collection, extraction, and deduplication...
     
Fast document-at-a-time query processing using two-tier indexes
Found in: Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval (SIGIR '13)
By Altigran S. da Silva, Andre L. Carvalho, Cristian Rossi, Edleno S. de Moura
Issue Date:July 2013
pp. 183-192
In this paper we present two new algorithms designed to reduce the overall time required to process top-k queries. These algorithms are based on the document-at-a-time approach and modify the best baseline we found in the literature, Blockmax WAND (BMW), t...
     
Joint unsupervised structure discovery and information extraction
Found in: Proceedings of the 2011 international conference on Management of data (SIGMOD '11)
By Alberto H.F. Laender, Altigran S. da Silva, Daniel Oliveira, Edleno S. de Moura, Eli Cortez
Issue Date:June 2011
pp. 541-552
In this paper we present JUDIE (Joint Unsupervised Structure Discovery and Information Extraction), a new method for automatically extracting semi-structured data records in the form of continuous text (e.g., bibliographic citations, postal addresses, clas...
     
Multiple keyword-based queries over XML streams
Found in: Proceedings of the 20th ACM international conference on Information and knowledge management (CIKM '11)
By Alberto H.F. Laender, Altigran S. da Silva, Felipe C. Hummel, Mirella M. Moro
Issue Date:October 2011
pp. 1577-1582
In this paper, we propose that various keyword-based queries be processed over XML streams in a multi-query processing way. Our algorithms rely on parsing stacks designed for simultaneously matching terms from several distinct queries and use new query ind...
     
A source independent framework for research paper recommendation
Found in: Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries (JCDL '11)
By Alberto H.F. Laender, Altigran S. da Silva, Cristiano Nascimento, Marcos Andre Goncalves
Issue Date:June 2011
pp. 297-306
As the number of research papers available on the Web has increased enormously over the years, paper recommender systems have been proposed to help researchers on automatically finding works of interest. The main problem with the current approaches is that...
     
Automatically filling form-based web interfaces with free text inputs
Found in: Proceedings of the 18th international conference on World wide web (WWW '09)
By Altigran S. da Silva, Edleno Moura, Eli Cortez, Filipe Mesquita, Guilherme A. Toda, Marden Neubert
Issue Date:April 2009
pp. 66-66
On the web of today the most prevalent solution for users to interact with data-intensive applications is the use of form-based interfaces composed by several data input fields, such as text boxes, radio buttons, pull-down lists, check boxes, etc. Although...
     
Replica identification using genetic programming
Found in: Proceedings of the 2008 ACM symposium on Applied computing (SAC '08)
By Alberto H. F. Laender, Altigran S. da Silva, Marcos Andre Goncalves, Moises G. Carvalho
Issue Date:March 2008
pp. 28-34
Identifying and handling replicas are important to guarantee the quality of the information made available by modern data storage services. There has been a large investment from companies and governments in the development of effective methods for removin...
     
A strategy for allowing meaningful and comparable scores in approximate matching
Found in: Proceedings of the sixteenth ACM conference on information and knowledge management (CIKM '07)
By Altigran S. da Silva
Issue Date:November 2007
pp. 303-312
The goal of approximate data matching is to assess whether two distinct data instances represent the same real world object. This is usually achieved through the use of a similarity function, which returns a score that defines how similar two data instance...
     
Computing block importance for searching on web sites
Found in: Proceedings of the sixteenth ACM conference on information and knowledge management (CIKM '07)
By Altigran S. da Silva
Issue Date:November 2007
pp. 165-174
In this paper we consider the problem of using the block structure of a Web page to improve ranking results when searching for information on Web sites. Given the block structure of the Web pages as input, we propose a method for computing the importance o...
     
FleDEx: flexible data exchange
Found in: Proceedings of the 9th annual ACM international workshop on Web information and data management (WIDM '07)
By Altigran S. da Silva
Issue Date:November 2007
pp. 25-32
We propose a lightweight framework for data exchange that is suitable for non-expert and casual users sharing data on the Web or through peer-to-peer systems. Unlike previous work, we consider a simplistic data model and schema formalism that are suitable ...
     
Learning to deduplicate
Found in: Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries (JCDL '06)
By Alberto H. F. Laender, Altigran S. da Silva, Marcos Andre Goncalves, Moises G. de Carvalho
Issue Date:June 2006
pp. 41-50
Identifying record replicas in Digital Libraries and other types of digital repositories is fundamental to improve the quality of their content and services as well as to yield eventual sharing efforts. Several deduplication strategies are available, but m...
     
Collecting hidden web pages for data extraction
Found in: Proceedings of the fourth international workshop on Web information and data management (WIDM '02)
By Alberto H. F. Laender, Altigran S. da Silva, Juliano Palmieri Lage, Paulo B. Golgher
Issue Date:November 2002
pp. 69-75
As the Web grows, more and more data has become available under dynamic forms of publication, such as a legacy database accessed by an HTML form (the so called Hidden Web). In situations such as this, integration of this data relies more and more on the fa...
     
Structuring keyword-based queries for web databases
Found in: Proceedings of the second ACM/IEEE-CS joint conference on Digital libraries (JCDL '02)
By Alberto H. F. Laender, Altigran S. da Silva, Berthier A. Ribeiro-Neto, Pavel Calado, Rodrigo C. Vieira
Issue Date:July 2002
pp. 94-95
This paper describes a framework, based on Bayesian belief networks, for querying Web databases using keywords only. According to this framework, the user inputs a query through a simple search-box. From the input query, one or more plausible structured qu...
     
Bootstrapping for example-based data extraction
Found in: Proceedings of the tenth international conference on Information and knowledge management (CIKM '01)
By Alberto H. F. Laender, Altigran S. da Silva, Berthier Ribeiro-Neto, Paulo B. Golgher
Issue Date:October 2001
pp. 371-378
The effortless generation of wrappers for Web data sources is a crucial task if proper access to the huge amount of semi-structured data on the Web is to be granted. In particular, the development of strategies for wrapper generation based on user-given ex...
     
Extracting semi-structured data through examples
Found in: Proceedings of the eighth international conference on Information and knowledge management (CIKM '99)
By Alberto H. F. Laender, Altigran S. da Silva, Berthier Ribeiro-Neto
Issue Date:November 1999
pp. 94-101
In this paper, we describe an innovative approach to extracting semi-structured data from Web sources. The idea is to collect a couple of example objects from the user and to use this information to extract new objects from new pages or texts. To perform t...
     