, European Commission
, National University of Ireland, Galway
Pages: pp. 11-15
Abstract—Government data covers authoritative and valuable information about our society. Public access to government data, however, remains challenging largely due to the heterogeneity and complexity of the public information ecosystem which results in high costs for locating, decoding, inter-linking and reusing existing government data. Recently, linked data–based solutions have been adopted by the leading practitioners (such as Data.gov in the US and Data.gov.uk in the UK) to offer an open and incremental ecosystem that interconnects providers, consumers, and contributors of open government data. This article first reports a community consensus on the architecture of the linked open government data ecosystem, then reviews the key technologies reported by works included in this special issue, and finally concludes with three grand challenges towards opening, linking, and reusing government data.
Keywords—open government data, information management, linked data, Semantic Web, electronic government, semantics, government data processing, challenge, data-oriented architectures
This, 2009, is the year for putting government data online. Both US and UK governments made public commitments toward open data.—Tim Berners-Lee
Public-sector bodies produce and collect government data that records authoritative information about government activities (such as spending and service provision) and regional statistics (such as economic indicators). The emerging open government data (OGD) movement demands proactive release of government data on the Web, free of charge and with minimal constraints on reuse. Key benefits of OGD include facilitating the reuse of government data, opening up new business opportunities, enhancing government transparency and citizen engagement, and distributing the cost of government data processing to communities.
Data.gov, the US national OGD portal (www.data.gov), was launched in May 2009. A few months later, in January 2010, the British government launched Data.gov.uk (http://data.gov.uk). The European Commission encourages OGD through the 2003 Public Sector Information Directive and the 2011 Open Data Package (http://ec.europa.eu/information_society/policy/psi). As of January 2012, more than 700,000 OGD datasets have been put online by national and local governments from more than 30 countries (http://logd.tw.rpi.edu/demo/international_dataset_catalog_search). One of the major challenges for OGD is the costly integration of government data across domains and political boundaries, because OGD datasets are published in various formats, use different vocabularies, and are accompanied by metadata of varying quality.
Linked open government data (LOGD), pioneered by Data.gov and Data.gov.uk, is emerging on the linked-data Web as a way of facilitating opening, linking, and reusing OGD. Linked data offers minimal consensus on data representation (using URIs and the Resource Description Framework) and data access (via HTTP), and it enables incremental OGD publishing according to Tim Berners-Lee's "5 Stars of Linked Open Data" (http://5stardata.info). 1 LOGD is recognized as a Web-based open ecosystem that organically interconnects the original data owners (such as government agencies), data-processing service providers (such as entity resolution services), and data consumers (enterprises and citizens). 2 Figure 1 shows a roadmap of LOGD with three data-processing stages: open, link, and reuse.
Figure 1. Roadmap of linked open government data, based on community consensus from the 2011 AAAI Fall Symposium on Open Government Knowledge. The three data-processing stages, shown in green, enhance the raw open government data from data providers using a combination of machine power and human power and deliver higher-quality data to a wide range of data consumers via visualizations, mashups, and more.
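The incremental publishing path of the 5-star scheme mentioned above can be made concrete with a small sketch. The `DatasetRelease` class and its field names are hypothetical and introduced only for illustration; the five levels follow the scheme as it is commonly summarized (online under an open license, machine-readable, non-proprietary format, identified by URIs, linked to other data).

```python
from dataclasses import dataclass


@dataclass
class DatasetRelease:
    """Hypothetical description of how one dataset is published."""
    on_web_open_license: bool   # 1 star: online under an open license
    machine_readable: bool      # 2 stars: structured data, not a scanned image
    open_format: bool           # 3 stars: non-proprietary format such as CSV
    uses_uris: bool             # 4 stars: things are identified with URIs (RDF)
    links_to_other_data: bool   # 5 stars: links out to other datasets


def star_rating(release: DatasetRelease) -> int:
    """Count stars cumulatively: each level requires all earlier levels."""
    levels = [
        release.on_web_open_license,
        release.machine_readable,
        release.open_format,
        release.uses_uris,
        release.links_to_other_data,
    ]
    stars = 0
    for satisfied in levels:
        if not satisfied:
            break
        stars += 1
    return stars


# A CSV file on a portal versus an interlinked RDF release.
csv_release = DatasetRelease(True, True, True, False, False)
rdf_release = DatasetRelease(True, True, True, True, True)
print(star_rating(csv_release), star_rating(rdf_release))
```

Because the levels are cumulative, a publisher can start with a raw CSV release and move up one step at a time without republishing from scratch, which is the sense in which LOGD publishing is incremental.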
LOGD represents a new data integration paradigm for sustainable growth of OGD and consequently can be considered a new approach to enterprise application integration. First, it opens up the scope of data integration from traditionally closed enterprise environments such as data warehouses to the entire Web. Users can mash up government data with crowdsourced data, privately owned data, and many other types of nongovernmental data. Second, it enables a data-oriented architecture (DOA) that decouples complex data objects into reusable fine-grained linked data on the Web. A service-oriented architecture (SOA) decouples the services used by applications to make them reusable by other applications and systems; a DOA, by contrast, decouples the data to make it reusable. Applying this DOA principle on the Web means that anyone can contribute to LOGD deployment with partial but interlinked contributions, such as declarative mappings from US state names to the corresponding Federal Information Processing Standards (FIPS) codes or a Web service that finds relevant DBpedia (dbpedia.org) entities for a name.
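As an illustration of such a partial contribution, the following minimal sketch emits a declarative mapping from a few US state names to their FIPS state codes as N-Triples. The `example.org` namespace and the helper names are assumptions made for this sketch, not URIs used by any LOGD deployment; `rdfs:label` and `skos:notation` are standard RDF Schema and SKOS properties.

```python
# Hypothetical namespace; a real deployment would mint its own URIs.
STATE_NS = "http://example.org/id/us-state/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SKOS_NOTATION = "http://www.w3.org/2004/02/skos/core#notation"

# A small excerpt of the state-name-to-FIPS-code table.
STATE_FIPS = {
    "Alabama": "01",
    "Alaska": "02",
    "California": "06",
    "New York": "36",
    "Texas": "48",
}


def slug(name: str) -> str:
    """Turn a state name into a URI-safe local name."""
    return name.replace(" ", "_")


def mapping_triples(state_fips: dict) -> list:
    """Return N-Triples lines giving each state a label and its FIPS code."""
    lines = []
    for name, code in sorted(state_fips.items()):
        subject = f"<{STATE_NS}{slug(name)}>"
        lines.append(f'{subject} <{RDFS_LABEL}> "{name}" .')
        lines.append(f'{subject} <{SKOS_NOTATION}> "{code}" .')
    return lines


triples = mapping_triples(STATE_FIPS)
print("\n".join(triples))
```

Published on the Web, a file like this is a small, reusable data asset in its own right: any dataset that names states by label can be joined to any dataset that keys them by FIPS code, without either publisher changing their data.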
This special issue features reports from six countries contributed by key government practitioners and academic thought leaders from four continents:
Instead of covering every aspect of LOGD, we selected these articles to highlight the key challenges and lessons learned from its real-world deployment.
Although LOGD is perhaps the fastest-evolving part of the linked-data Web, most authors acknowledge a considerable entry barrier to producing LOGD. Open-source software tools have been developed and reused in facilitating cataloging and generating LOGD datasets. In particular, data portals such as the Comprehensive Knowledge Archive Network (http://ckan.org) in the UK and the US-India Open Government Platform collaboration (www.data.gov/opengovplatform) help in releasing more OGD datasets. In Brazil, triplification tools such as Triplify (http://triplify.org) are helping generate LOGD from raw OGD datasets. In this way, newcomers can easily start their work by reusing contributions from the pioneers.
The need for linking data is well-understood, as in the effort to link public and research data in Australia. Solutions from the UK, Brazil, and the EU's Infrastructure for Spatial Information in the European Community project exemplify three different link generation approaches, respectively:
Declarative entity-level links in LOGD should go beyond links to DBpedia, and there are interesting efforts to link OGD datasets by geospatial features in the UK and to link OGD datasets to social Web data. 3
In order to track entities, researchers in the LOGD community have investigated meaningful URI naming schemas to facilitate the reuse of entity URIs and address the frequent changes in organograms and the derived lines of order. The report from Canada identifies requirements for preserving and analyzing the provenance metadata of LOGD. "Parallel Identities for Managing Open Government Data" proposes partial solutions to these requirements that leverage library science theory and the consensus from the World Wide Web Consortium's (W3C) provenance standardization efforts.
Although central dataset catalogs provide entry points for users, it's also important to reach consensus on dataset metadata, according to the lessons learned from Data.gov.uk. At the moment, vocabulary standardization is driven by the W3C Linked Government Data Working Group's project on extending the Digital Enterprise Research Institute's Data Catalog Vocabulary to a standard dataset catalog vocabulary, 4 and by the EU Interoperability Solutions for European Public Administration program's Asset Description Metadata Schema for describing semantic assets such as vocabularies, metadata, taxonomies, and code lists. These endeavors also need user-friendly visual interfaces, such as the UK's Geometric Rich Data Interface browser, which effectively enhances the user experience in accessing LOGD datasets.
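To show what shared dataset metadata can look like in practice, here is a minimal sketch that serializes one catalog entry as Turtle using Data Catalog Vocabulary terms. It assumes the `http://www.w3.org/ns/dcat#` namespace of the later W3C DCAT Recommendation rather than the earlier namespace in use when this article was written; the dataset URI, title, keywords, and download URL are invented for illustration.

```python
def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def dataset_to_turtle(uri: str, title: str, keywords: list, download_url: str) -> str:
    """Describe one dataset and one distribution with DCAT terms."""
    lines = [
        "@prefix dcat: <http://www.w3.org/ns/dcat#> .",
        "@prefix dct: <http://purl.org/dc/terms/> .",
        "",
        f"<{uri}> a dcat:Dataset ;",
        f"    dct:title {turtle_literal(title)} ;",
    ]
    for keyword in keywords:
        lines.append(f"    dcat:keyword {turtle_literal(keyword)} ;")
    lines.append(
        f"    dcat:distribution [ a dcat:Distribution ; dcat:accessURL <{download_url}> ] ."
    )
    return "\n".join(lines)


# All values below are hypothetical.
record = dataset_to_turtle(
    uri="http://example.org/dataset/spending-2011",
    title="Departmental spending 2011",
    keywords=["spending", "finance"],
    download_url="http://example.org/files/spending-2011.csv",
)
print(record)
```

When every portal describes its datasets with the same few properties, a cross-portal catalog search reduces to harvesting and merging such records rather than writing a custom adapter per portal.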
Several country reports discuss collaboration, communities, links with society, and the international perspective of LOGD. International collaborations between policymakers are growing; for example, the Open Government Partnership initiative currently has 35 participating countries and more than 20 countries ready to join. The UK, US, and Brazilian experiences with close collaboration between governments and the research and academic communities can serve as models for other countries that want to take the step from open to linked government data. The US also offers an interesting example in which the dataset catalog becomes the central point of reference for communities of interest formed to research, study, and exploit the available data. This approach shows a natural evolution of Data.gov from a simple repository of datasets to a live ecosystem of stakeholders coming together to discuss and share requirements and solutions based on real data.
On the basis of the current status of LOGD, we envision three grand challenges closely associated with the three stages (open, link, and reuse) of LOGD processing:
Although it may still take considerable time and community effort to meet these three challenges, the open and incremental nature of the LOGD ecosystem has already stimulated a positive feedback loop. According to the country reports, an increasing number of political regions are opening up their government data, more and more techniques are being learned and reused to link and mash up data, and applications have been built by nongovernment entities to integrate government data and deliver it to citizens.