Cooperative Research on Web Data Management at UFMG and UFAM - A Brief Report
Found in: Web Congress, Latin American
By Alberto H. F. Laender, Altigran Soares da Silva
Issue Date:October 2008
pp. 144-150
The World Wide Web has become a huge repository of data of interest for a variety of application domains. However, the same features that have made the Web so useful and popular also impose important restrictions on the way the data it contains can be mani...
 
The Web-DL Environment for Building Digital Libraries from the Web
Found in: Digital Libraries, Joint Conference on
By Pável P. Calado, Marcos A. Gonçalves, Edward A. Fox, Berthier Ribeiro-Neto, Alberto H. F. Laender, Altigran S. da Silva, Davi C. Reis, Pablo A. Roberto, Monique V. Vieira, Juliano P. Lage
Issue Date:May 2003
pp. 346
The Web contains a huge volume of unstructured data, which is difficult to manage. In digital libraries, on the other hand, information is explicitly organized, described, and managed. Community-oriented services are built to attend specific information ne...
   
The Effectiveness of Automatically Structured Queries in Digital Libraries
Found in: Digital Libraries, Joint Conference on
By Marcos André Gonçalves, Edward A. Fox, Aaron Krowne, Pável Calado, Alberto H. F. Laender, Altigran S. da Silva, Berthier Ribeiro-Neto
Issue Date:June 2004
pp. 98-107
Structured or fielded metadata is the basis for many digital library services, including searching and browsing. Yet, little is known about the impact of using structure on the effectiveness of such services. In this paper, we investigate a key research qu...
   
CoBWeb - A Crawler for the Brazilian Web
Found in: String Processing and Information Retrieval, International Symposium on
By Altigran S. da Silva, Eveline A. Veloso, Paulo B. Golgher, Berthier Ribeiro-Neto, Alberto H. F. Laender, Nivio Ziviani
Issue Date:September 1999
pp. 184
One of the key components of current Web search engines is the document collector. This paper describes CoBWeb, an automatic document collector, whose architecture is distributed and highly scalable. CoBWeb aims at collecting large amounts of documents per...
 
Characterizing a Synthetic Workload for Performance Evaluation during the Migration of a Legacy System
Found in: Software Maintenance and Reengineering, European Conference on
By Paulo Pinheiro da Silva, Alberto H. F. Laender, Rodolfo S.F. Resende, Paulo B. Golgher
Issue Date:March 2000
pp. 173
This paper describes the characterization of a synthetic workload for performance evaluation of a new system before replacing a legacy system. The workload is used by CAPPLES, a capacity planning and performance analysis method for the migration of legacy ...
 
The Role of Gazetteers in Geographic Knowledge Discovery on the Web
Found in: Web Congress, Latin American
By Ligiane A. Souza, Clodoveu A. Davis Jr., Karla A. V. Borges, Tiago M. Delboni, Alberto H. F. Laender
Issue Date:November 2005
pp. 157-165
The Web is a large source of geographic information. Many Web documents have one or more spatial references, such as place names, addresses, zip codes or phone numbers. These spatial references are usually found in a semistructured fashion, which allows hu...
 
Learning to deduplicate
Found in: Digital Libraries, Joint Conference on
By Alberto H. F. Laender, Altigran S. da Silva, Marcos André Gonçalves, Moisés G. de Carvalho
Issue Date:June 2006
pp. 41-50
Identifying record replicas in Digital Libraries and other types of digital repositories is fundamental to improve the quality of their content and services as well as to yield eventual sharing efforts. Several deduplication strategies are available, but m...
 
A usability evaluation study of a digital library self-archiving service
Found in: Digital Libraries, Joint Conference on
By Alberto H. F. Laender, Marcos André Gonçalves, Lena Veiga e Silva
Issue Date:June 2005
pp. 176-177
In this paper, we describe an evaluation study of a self-archiving service for the Brazilian Digital Library of Computing (BDBComp). We conducted an extensive usability experiment with several potential users, including graduate students, professors, and...
 
BDBComp: Building a Digital Library for the Brazilian Computer Science Community
Found in: Digital Libraries, Joint Conference on
By Alberto H. F. Laender, Marcos André Gonçalves, Pablo A. Roberto
Issue Date:June 2004
pp. 23-24
This paper reports initial efforts towards building BDBComp, a digital library for the Brazilian computer science community. BDBComp is based on a number of standards (e.g., OAI, Dublin Core, SQL) as well as on new technologies (e.g., Web data extraction t...
   
Top-Down Extraction of Semi-Structured Data
Found in: String Processing and Information Retrieval, International Symposium on
By Berthier Ribeiro-Neto, Alberto H. F. Laender, Altigran S. da Silva
Issue Date:September 1999
pp. 176
In this paper, we propose an innovative approach to extracting semi-structured data from Web sources. The idea is to collect a couple of example objects from the user and to use this information to extract new objects from new pages or texts. We propose a ...
 
A Genetic Programming Approach to Record Deduplication
Found in: IEEE Transactions on Knowledge and Data Engineering
By Moisés G. de Carvalho, Alberto H. F. Laender, Marcos André Gonçalves, Altigran S. da Silva
Publication Date: November 2010
pp. N/A
Several systems that rely on consistent data to offer high quality services, such as digital libraries and e-commerce brokers, may be affected by the existence of duplicates, quasi-replicas, or near-duplicate entries in their repositories. Because of that,...
 
Using web information for creating publication venue authority files
Found in: Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries (JCDL '08)
By Alberto H. F. Laender, Berthier Ribeiro-Neto, Denilson Alves Pereira, Nivio Ziviani
Issue Date:June 2008
pp. 597-617
Citations to publication venues in the form of journal, conference and workshop contain spelling variants, acronyms, abbreviated forms and misspellings, all of which make it more difficult to retrieve the item of interest. The task of discovering and reconcil...
     
Discovering geographic locations in web pages using urban addresses
Found in: Proceedings of the 4th ACM workshop on Geographical information retrieval (GIR '07)
By Alberto H. F. Laender
Issue Date:November 2007
pp. 31-36
This paper presents an approach that helps to discover geographic locations from the recognition, extraction, and geocoding of urban addresses found in Web pages. Experiments that evaluate the presence and incidence of urban addresses in Web pages are desc...
     
Learning to deduplicate
Found in: Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries (JCDL '06)
By Alberto H. F. Laender, Altigran S. da Silva, Marcos Andre Goncalves, Moises G. de Carvalho
Issue Date:June 2006
pp. 41-50
Identifying record replicas in Digital Libraries and other types of digital repositories is fundamental to improve the quality of their content and services as well as to yield eventual sharing efforts. Several deduplication strategies are available, but m...
     
A usability evaluation study of a digital library self-archiving service
Found in: Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries (JCDL '05)
By Alberto H. F. Laender, Lena Veiga e Silva, Marcos Andre Goncalves
Issue Date:June 2005
pp. 176-177
In this paper, we describe an evaluation study of a self-archiving service for the Brazilian Digital Library of Computing (BDBComp). We conducted an extensive usability experiment with several potential users, including graduate students, professors, and...
     
Collecting hidden web pages for data extraction
Found in: Proceedings of the fourth international workshop on Web information and data management (WIDM '02)
By Alberto H. F. Laender, Altigran S. da Silva, Juliano Palmieri Lage, Paulo B. Golgher
Issue Date:November 2002
pp. 69-75
As the Web grows, more and more data has become available under dynamic forms of publication, such as a legacy database accessed by an HTML form (the so-called Hidden Web). In situations such as this, integration of this data relies more and more on the fa...
     
Structuring keyword-based queries for web databases
Found in: Proceedings of the second ACM/IEEE-CS joint conference on Digital libraries (JCDL '02)
By Alberto H. F. Laender, Altigran S. da Silva, Berthier A. Ribeiro-Neto, Pavel Calado, Rodrigo C. Vieira
Issue Date:July 2002
pp. 94-95
This paper describes a framework, based on Bayesian belief networks, for querying Web databases using keywords only. According to this framework, the user inputs a query through a simple search-box. From the input query, one or more plausible structured qu...
     
Bootstrapping for example-based data extraction
Found in: Proceedings of the tenth international conference on Information and knowledge management (CIKM'01)
By Alberto H. F. Laender, Altigran S. da Silva, Berthier Ribeiro-Neto, Paulo B. Golgher
Issue Date:October 2001
pp. 371-378
The effortless generation of wrappers for Web data sources is a crucial task if proper access to the huge amount of semi-structured data on the Web is to be granted. In particular, the development of strategies for wrapper generation based on user-given ex...
     
Multiple representations in GIS: materialization through map generalization, geometric, and spatial analysis operations
Found in: Proceedings of the seventh ACM international symposium on Advances in geographic information systems (GIS '99)
By Alberto H. F. Laender, Clodoveu A. Davis
Issue Date:November 1999
pp. 60-65
     
Spatial data integrity constraints in object oriented geographic data modeling
Found in: Proceedings of the seventh ACM international symposium on Advances in geographic information systems (GIS '99)
By Alberto H. F. Laender, Clodoveu A. Davis, Karla A. V. Borges
Issue Date:November 1999
pp. 1-6
     
Extracting semi-structured data through examples
Found in: Proceedings of the eighth international conference on Information and knowledge management (CIKM '99)
By Alberto H. F. Laender, Altigran S. da Silva, Berthier Ribeiro-Neto
Issue Date:November 1999
pp. 94-101
In this paper, we describe an innovative approach to extracting semi-structured data from Web sources. The idea is to collect a couple of example objects from the user and to use this information to extract new objects from new pages or texts. To perform t...
     
A hierarchical approach to the automatic categorization of medical documents
Found in: Proceedings of the seventh international conference on Information and knowledge management (CIKM '98)
By Alberto H. F. Laender, Berthier A. Ribeiro-Neto, Luciano R. S. de Lima
Issue Date:November 1998
pp. 132-139
     