Technology Options
This section provides guidance on the selection and implementation of various technologies used to develop Open Data platforms, with a particular focus on Open Data catalogs, which are the web-based systems used to make data available to end users. It is intended to support IT specialists who play a lead or coordinating role in managing the technical infrastructure of an Open Data initiative.
The terms “catalog,” “platform” and “portal” are often used ambiguously, which can cause confusion. This Toolkit defines these terms as follows:
- A data catalog is a list of datasets available in an Open Data initiative. Essential elements of a data catalog include searching, metadata, clear license information and access to the datasets themselves. Typically, a data catalog is the online centerpiece of an Open Data initiative.
- A platform provides an online “front door” for users to access all resources available under an Open Data initiative. A platform includes the data catalog along with other information and services that are part of the Open Data ecosystem. These typically include an online forum for questions, technical support and feedback; a knowledge base of background and training materials; and a blog for communications and outreach. The services within a platform are often implemented with a suite of technologies, not a single one.
- A portal can mean many different things; for that reason, this Toolkit avoids use of this term.
What does an Open Data Catalog Look Like?
As described in the following paragraphs, data catalogs can be relatively simple and “stand alone,” or very sophisticated and integrated with other systems. Most Open Data catalogs, however, share a few common characteristics (more extensive lists are also available):
Easy access. Open Data catalogs make it very easy for users to access data quickly, freely and intuitively. Access to Open Data catalogs requires no registration or login, since such requirements would discourage exploration and use.
Search. Open Data catalogs make data easy to find. Most data catalogs sort data by subject, organization or type, and support full text searching of catalog contents. Many Open Data catalogs implement search engine optimization to expose data to conventional search engines.
Machine-readable data access. Data are available for download in machine-readable, non-proprietary electronic formats. Where possible, all data in a dataset should be available as a single download file.
Metadata. Key metadata, such as publication date and attribution, are prominently displayed for each dataset. Many Open Data catalogs implement the Dublin Core metadata standard and make the metadata available in machine-readable formats.
Clear data licenses. Data licenses are clearly and prominently displayed for each dataset. If data are licensed under Creative Commons, the Open Data License or other standards, transparent links to these licenses are included.
Data preview/visualization. Many Open Data catalogs include some facility to preview the data prior to download or visualize the data using built-in graphing or mapping tools.
Standards compliance. Most Open Data catalogs have built-in support for various standards, such as data formats (e.g., CSV, XML, JSON) and metadata (e.g., Dublin Core). Open Data catalogs typically make each dataset available at a unique and permanent URL, which makes it possible to cite and link to the data directly.
Application Programming Interface (API). APIs allow software developers to access the Open Data catalog – and often the data itself – through software. APIs facilitate data discovery, analysis, catalog integration, harvesting of metadata from external sites and a host of applications.
Security. Open Data catalogs implement security measures to protect data and metadata from being changed by unauthorized users.
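To make the metadata, license and search characteristics above more concrete, the following Python sketch shows one way a catalog entry might be represented and checked. It is only an illustration, not the design of any particular catalog product: the DatasetRecord type, the REQUIRED list, the missing_fields and search helpers, and the data.example.org URLs are all hypothetical. The field names borrow from the Dublin Core element set (identifier, title, creator, date, rights, format, subject).

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class DatasetRecord:
    """Hypothetical catalog entry; field names follow Dublin Core elements."""
    identifier: str    # unique, permanent URL for the dataset
    title: str
    creator: str       # attribution
    date: str          # publication date (ISO 8601)
    rights: str        # link to the data license
    format: str        # media type of the download, e.g. "text/csv"
    subject: str = ""  # optional topic used for browsing and search


# Assumed policy for this sketch: these fields must never be blank.
REQUIRED = ("identifier", "title", "creator", "date", "rights", "format")


def missing_fields(record):
    """Return the names of required fields that are empty or blank."""
    data = asdict(record)
    return [name for name in REQUIRED if not data[name].strip()]


def search(records, text):
    """Case-insensitive substring search over title and subject."""
    needle = text.lower()
    return [r for r in records if needle in f"{r.title} {r.subject}".lower()]


records = [
    DatasetRecord(
        identifier="https://data.example.org/dataset/school-enrollment",
        title="School Enrollment by District",
        creator="Ministry of Education (example)",
        date="2020-01-15",
        rights="https://creativecommons.org/licenses/by/4.0/",
        format="text/csv",
        subject="education",
    ),
    DatasetRecord(
        identifier="https://data.example.org/dataset/road-network",
        title="Road Network",
        creator="Ministry of Transport (example)",
        date="2019-06-30",
        rights="",  # license link missing, so this entry should be flagged
        format="application/json",
        subject="transport",
    ),
]

# Machine-readable form of the catalog metadata.
catalog_json = json.dumps([asdict(r) for r in records], indent=2)

for record in records:
    print(record.title, "-> missing:", missing_fields(record))
```

In this sketch the second record is flagged because it has no license link, which is the kind of gap a catalog operator would want to catch before publication. A production catalog would typically rely on the metadata and search facilities of its chosen platform rather than hand-written checks like these.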
Open Data catalogs generally follow one of two service delivery models. Open Source catalogs are nominally “free,” in that they may be downloaded at no cost, and may be modified or customized without restriction or licensing fees. These products can be hosted on the operator’s own dedicated servers or on cloud-based infrastructure, but both approaches require the catalog operator to manage IT logistics. Some vendors provide cloud-hosting of open source products as a service. In contrast, Software as a Service (SaaS) products are available from various vendors for a monthly or annual fee, and vendors assume responsibility for IT management, security and software updates. SaaS vendors may also provide some measure of customization.
Three Models of an Open Data Catalog
The three models below present one way of thinking about an Open Data catalog system. The intent here is to show how various elements and services relate to each other, and how the system changes at different scales.