New International Open Data Standards Directory Launched by GovEx and Geothink Partnership

By Sam Lumley

Geothink and the Center for Government Excellence (GovEx) at Johns Hopkins University launched a first-of-its-kind Open Data Standards Directory today that identifies and assembles standards for open data shared by governments. The new directory provides guidance on the best format for sharing specific types of data to ensure its interoperability across local, regional and national jurisdictions.

The site began as a Geothink project led by McGill University student Rachel Bloom and supervised by Geothink Head Renee Sieber, an associate professor in McGill University’s Department of Geography and School of Environment. For her undergraduate honors research in the Department of Geography, Bloom developed a tool for searching and querying relevant open data standards for a diverse range of municipal open data. In partnership with GovEx, McGill undergraduate students Julia Conzon and Nicolas Levy contributed to the project by researching and visualizing the directory.

Former McGill University student Rachel Bloom initiated the Open Data Standards Directory as her undergraduate honors project.

“I think one of the biggest challenges was providing this information in a way that was easily accessible in a dashboard format,” Bloom said. “It was difficult because the standards are complex and it’s hard to capture all of the desired information about them in an easy visual style based around our users.”

“The standards directory helps people not only know what’s out there,” she added. “But based on a systematic approach, it allows people to also evaluate the standard and help them on their decision of which one to adopt. So I think that’s really valuable.”

This initiative has been further developed by the Center for Government Excellence (GovEx) at Johns Hopkins University in partnership with Geothink and members of the open data community. It now represents the first-ever international data standards directory. It helps governments provide data in formats that will most effectively support informed decision-making and the provision of services.

“There’s a serious need for coordination on how governments at all levels classify different types of open data,” Sieber said. “A collaboration with McGill University, this directory provides a comprehensive inventory of how data on transit, road construction, public facilities and more has been classified. It also allows evaluation of different standards to help guide governments in choosing the most useful ones.”

The project emphasizes a collaborative approach that opens a two-way dialogue with municipalities. This allows its creators to better understand what is valued within the decision-making process and to encourage the adoption of specific standards for how open data is released. Users around the world are able and encouraged to contribute additional information and update existing standards.

“Open data improves the lives of hundreds of millions of people, many incrementally and some dramatically,” Andrew Nicklin, GovEx Director of Data Practices, said. “Our new directory will encourage global standards for how data is organized for more effective production and consumption at scale. This will ensure an even greater impact on the local government services level.”

Historically, city governments and others have faced several challenges in dealing with open datasets. Among these challenges is a lack of agreement and coordination on how datasets should be structured to best serve the public that is intended to access them. Establishing and organising common standards can address this problem by encouraging practices that ensure data is accessible and usable by citizens. Common standards can also ensure that datasets released by different municipalities will be interoperable.

“The directory’s inventory helps simplify and demystify choices for governments and citizens by answering the question ‘what’s out there?’ but also takes it a step further by assessing the value of these standards to a city’s data provision,” said Jean-Noé Landry, Executive Director of OpenNorth, a Geothink partner in this work. “The directory allows us to align data practices, join up data, and enable emergent data uses. Data interoperability is one key to unlocking open data’s innovation potential and we believe this inventory is a very important step towards it.”

Currently there are over 60 standards in the directory from around the world and in multiple languages. GovEx hopes to expand these efforts to continually broaden its range of standards, languages and user bases.

To find out more about the open data standards directory project, you can listen to Geothink’s podcast on the initial project, catch an update on GovEx’s latest Datapoints podcast or visit the GovEx Beta Data Standards Directory website.


Geothoughts 9: Geothink Project Measures Open Data Standards for Consumer and Publisher Uses

Geothink’s Open Data Standards Project helps publishers and consumers better use open data.

By Drew Bush

We’re very excited to present you with our ninth episode of Geothoughts. You can also subscribe to this Podcast by finding it on iTunes.

In this episode, we examine a Geothink project on open data that officially kicked off in February 2015 with a Geothink teleconference call. Project lead Rachel Bloom, an undergraduate student in the Geothink Rapid Response Think Tank at McGill University, began this research one year ago. She worked with Geothink Head Renee Sieber, associate professor in McGill University’s Department of Geography and School of Environment.

It recently culminated in a white paper on two spreadsheets: (1) an examination of high-value open datasets Canadian cities use; and (2) an inventory of open data standards published by open data providers. Listen in as Bloom explains to partners who publish open data how to know what standards exist and who uses them for which datasets.

Thanks for tuning in. And we hope you subscribe with us at Geothoughts on iTunes. A transcript of this original audio podcast follows.


Welcome to Geothoughts. I’m Drew Bush.

“This project is about investigating open data domain specific standards at the Canadian municipal level, which I guess is kind of a mouthful. But basically I’ve created two spreadsheets to figure out how Canadian municipalities are publishing their data and how the level of conformity is per the guidelines for open data standards.”

That’s Rachel Bloom, an undergraduate student in Geothink’s Rapid Response Think Tank at McGill University, talking about domain specific data from sectors like transportation and city budgets. She’s working with Geothink Head Renee Sieber, associate professor in McGill University’s Department of Geography and School of Environment.

“To begin this project I chose ten domains to focus on. These domains came from Open Knowledge Foundation spreadsheets. They are considered high value, and I thought these were interesting. I thought they were important to public use. So I chose them as the basis to create these spreadsheets.”

In late February, Bloom conducted a teleconference for the project’s partners in several Canadian cities. In it, Bloom discusses the project and each spreadsheet, and answers questions from those on the call. She starts with the first spreadsheet.

“It’s called ‘Adoption of Open Data Standards By Cities.’ So what we did for this is we have the 10 domains on the side on the y-axis, and then we have kind of nested between these certain metrics of how the municipality names the dataset, the file format, the structuration of the data, any metadata associated with the dataset or description of the data, and if these datasets for each domain are already using specific data standards, open data standards. And these were taken from each municipality’s open data catalogues.”

“And it helped for eventually comparing whether the ways that data is being published is even kind of compatible with the semantic and schematic guidelines dictated by available open data standards.”

Participants then examined a specific example from the spreadsheet, building permits for the City of Toronto. The call then proceeded to the next spreadsheet developed.

“It’s called ‘Inventory and Evaluation of Open Data Standards.’ Here we have on the y-axis these are individual open data standards that are kind of domain specific so they are pegged to certain domains and they cover the ten domains used for the other table. Though there is two extra domains…the metrics you find kind of on the top, are an innovation on my part. They were chosen by me based on the demand of data publishers and consumers I found in my research which came from all different types of mediums.

“I’ve even read e-mail correspondences of people talking about what they want when they are structuring their datasets. They also come from reinforcing that these standards are open. So what does it mean to be open? They have to be open, they have to be consensus driven, they have to have multi-stakeholder participation, so their metrics have to account for that.”

Bloom again takes participants through a specific example, this time a budget data package, going through all the metrics to give participants a sense of the quality of the standard in terms of making data interoperable. When she finishes, Linda Low, open data lead for the City of Vancouver, interrupts her to ask:

“Rachel, can you talk a little about the criteria for whether or not it’s open again? It’s whether multi-stakeholders contribute to it, and there was something else too, right, that you said?”

“So when we talk about multi-stakeholders we’re talking about people who contribute that are from different facets of society. So the private sector, the public sector, civil societies, and also the obvious which is that open implies that there should be no royalties or fees associated with using the standard. It should be repurposable, they should be able to extend it how they wish, it should have a license that is open so that there is legal ramification for using the standard as you please. You’re right it’s not explicitly mentioned which of these kind of contribute to defining openness but all of these are good fundamental metrics for an open standard I would think.”

The teleconference proceeds as Bloom and the call’s participants discuss the spreadsheets and white paper, stopping to elaborate on specific examples or details in more depth. Toward the end of the 40-minute call, Bloom shares the vision and goals for this project.

“There’s metrics that can help publishers, but there’s also metrics that can help consumers who would want to voice how they want to structure the data which is really part of the open process. So I think it can be used as multiple, for multiple purposes, really so it’s flexible in that way. So I’m not sure if there’s a very specific way of using it cause it really depends on the goals of the person using the resource.”

She’s followed by Sieber, who first asks a question and then provides insight into how the project’s goals were determined.

“A standard is likely to be viewed much differently if you want to do something for internal government use like business intelligence as opposed to external use. And depending upon the audience, if you’re doing something for realtors it might be viewed quite differently than if you’re trying to do it for, I don’t know, low information voters.”

At the conclusion, Low offers the municipal publisher’s perspective on how constantly updated and revised standards make it hard to know which one a municipality should adopt in differing domains such as city budgets, crime statistics, or waste removal services.

“When do we say this is justifiable for us without doing a whole bunch of research and wasting the effort afterward? That was the thing I always keep struggling about.”

Bloom doesn’t hesitate with an answer.

“There are so many options too and ways of approaching it. I mean, I don’t know–it’s really about the interests of the person who is publishing the data and the goals. I think at the end of the day, it’s going to, different governments are going to have to reconcile what their goals are and how they want to go about it. Which is the hardest part.”

This project is ongoing and next steps will continue to look at the landscape of open data standards in Canada.

Inside Geothink’s Open Data Standards Project: Standards For Improving City Governance

By Rachel Bloom

Rachel Bloom is a McGill University undergraduate student and project lead for Geothink’s Open Data Standards Project.

In February, I led a Geothink seminar with city officials to introduce the results of the open data standards project we began approximately one year earlier. The project was started with the objective of assisting municipal publishers of open data in standardizing their datasets. We presented two spreadsheets: the first was dedicated to evaluating ‘high-value’ open datasets published by Canadian municipalities and the second consisted of an inventory of open data standards applicable to these types of datasets.

Both spreadsheets enable our partners who publish open data to know what standards exist and who uses them for which datasets. The project I lead is motivated by the idea that well-developed data standards for city governance can grant us the luxury of not having to think about the compatibility of technological components. When we screw in a new light bulb or open a web document we assume that it will work with the components we have (Guidoin and McKinney 2012). Technology, whether it refers to information systems or manufactured goods, relies on standards to ensure its usability and dissemination.

Municipal governments that publish open data look to the importance of standards for improving the usability of their data. Unfortunately, even though ‘high-value’ datasets have increasingly become available to the public, these datasets currently lack a consensus about how they should be structured and specified. Such datasets include crime statistics and annual budget data that can provide new services to citizens when municipalities open such datasets by publishing them to their open data catalogues online. Anyone can access such datasets and use the data however they wish without restriction.

Civic data standards provide agreements about semantic and schematic guidelines for structuring and encoding the data. Data standards specify technical data elements such as file formats, data schemas, and unique identifiers to make civic data interoperable. For example, most datasets are published in CSV or XML formats. CSV structures the data in columns and rows, while XML encapsulates the data in a hierarchical tree of <tags>.
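To make the difference between the two formats concrete, here is a minimal Python sketch that reads the same two records from a CSV table and from an XML tree. The building permit records and the field names (permit_id, issued_date, status) are invented for illustration and are not taken from any municipality’s catalogue or from a published standard.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical building-permit records; all field names and values are invented.
CSV_TEXT = (
    "permit_id,issued_date,status\n"
    "BP-001,2015-02-01,issued\n"
    "BP-002,2015-02-03,pending\n"
)

XML_TEXT = """<permits>
  <permit id="BP-001"><issued_date>2015-02-01</issued_date><status>issued</status></permit>
  <permit id="BP-002"><issued_date>2015-02-03</issued_date><status>pending</status></permit>
</permits>"""


def records_from_csv(text):
    """CSV is rows and columns: each row becomes one dict keyed by column name."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def records_from_xml(text):
    """XML is a tree of tags: each <permit> element becomes one dict."""
    root = ET.fromstring(text)
    records = []
    for permit in root.findall("permit"):
        record = {"permit_id": permit.get("id")}
        for child in permit:
            record[child.tag] = child.text
        records.append(record)
    return records


if __name__ == "__main__":
    print(records_from_csv(CSV_TEXT))
    print(records_from_xml(XML_TEXT))
```

Both functions return the same list of records, which is the point of interoperability: once the structure is agreed on, the file format becomes a detail the consumer can translate away.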

They also specify common vocabularies in order to clarify interpretation of the data’s meanings. Such vocabularies could include, for example, definitions for categories of expenditure in annual budget data. Geothink’s Open Data Standards Project offers publishers of open data an opportunity to improve the usability and efficiency of their data for consumers. This makes it easier to share data across municipalities because the technological components and their meanings within systems will be compatible.
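As a rough sketch of what a common vocabulary buys a data consumer, the Python below maps each city’s local budget labels onto shared category names before totalling them. The labels, category names, and amounts are all hypothetical; a real standard would publish its own definitions.

```python
# Hypothetical shared vocabulary: local expenditure labels mapped to agreed
# category names. None of these labels come from a published standard.
SHARED_VOCABULARY = {
    "police services": "public_safety",
    "fire & rescue": "public_safety",
    "roads": "transportation",
    "transit operations": "transportation",
}


def normalize_category(local_label):
    """Translate a city's own label into the shared category, if one is defined."""
    key = local_label.strip().lower()
    return SHARED_VOCABULARY.get(key, "unmapped")


def total_by_category(rows):
    """Sum (label, amount) pairs under the shared category names."""
    totals = {}
    for label, amount in rows:
        category = normalize_category(label)
        totals[category] = totals.get(category, 0) + amount
    return totals


if __name__ == "__main__":
    city_a = [("Police Services", 120), ("Roads", 80)]
    city_b = [("Fire & Rescue", 60), ("Transit Operations", 40), ("Parks", 10)]
    print(total_by_category(city_a))
    print(total_by_category(city_b))
```

Labels with no agreed definition fall into an "unmapped" bucket rather than being guessed at, which makes gaps in the vocabulary visible.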

Introducing Geothink’s Open Data Standards Project
No single, clear definition of an open data standard exists. In fact, most definitions of an ‘open data standard’ follow two prevailing ideas: 1) standards for open data; and 2) open standards for data. Geothink’s project examines and relates both of these ideas (Table 1). The first spreadsheet, the ‘Adoption of Open Data Standards By Cities,’ considers open data and its associated data standards. The second spreadsheet, the ‘Inventory of Open Data Standards,’ considers the process of open standardization. In other words, we were curious about what standards are currently being applied to open municipal data, and how to break down and document open standards for data in a way that is useful to municipalities looking to standardize their open data.

Table 1: Differences between ‘open data’ standards and open ‘data standards’

Spreadsheet                         | Requires open data | Requires open standard process
Evaluation of ‘High-Value’ Datasets | Yes                | No
Inventory of Open Data Standards    | No                 | Yes

The project’s evaluation of datasets relates to standards for open data. Standards for open data refer to standards that, regardless of how they are developed and maintained, can be applied to open data. Open data, according to the Open Knowledge Foundation (2014), consists of raw digital data that should be freely available to anyone to use, repurposable and re-publishable as users wish, and absent mechanisms of control like restrictive licenses. However, the process of developing and maintaining standards for open data may neither require transparency nor include public appeals for its development.

To discover what civic data standards are currently being used, the first spreadsheet, Adoption of Open Data Standards By Cities, evaluates ‘high value’ datasets specific to 10 domains (categories of datasets such as crime, transportation or service requests) in the open data catalogues for the cities of Vancouver, Toronto, Surrey, Edmonton and Ottawa. The types of data were chosen based on the Open Knowledge Foundation’s choice of datasets considered to provide greatest utility for the public. The project’s spreadsheet notes the salient structure and vocabulary of each dataset, such as the name, file format, schema, and available metadata. It especially notes which data standards these five municipalities are using for their open data (if any at all).
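One way to picture a row of this first spreadsheet is as a small record per dataset, which can then be tallied by domain. The Python sketch below assumes a simplified row layout of my own choosing, and the sample entries use placeholder cities and a placeholder standard name rather than the project’s actual findings.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetEntry:
    """One simplified, hypothetical row of the adoption spreadsheet."""
    city: str
    domain: str
    name: str
    file_format: str
    standard: Optional[str]  # None when the dataset uses no data standard


def adoption_by_domain(entries):
    """Return {domain: (datasets using a standard, total datasets)}."""
    totals = Counter(e.domain for e in entries)
    adopted = Counter(e.domain for e in entries if e.standard)
    return {domain: (adopted[domain], totals[domain]) for domain in totals}


# Placeholder entries for illustration only; not real survey results.
SAMPLE_ENTRIES = [
    DatasetEntry("City A", "transit", "Transit schedule", "ZIP", "Example Standard"),
    DatasetEntry("City B", "transit", "Bus routes", "CSV", None),
    DatasetEntry("City A", "budget", "Annual budget", "XLSX", None),
]

if __name__ == "__main__":
    for domain, (used, total) in adoption_by_domain(SAMPLE_ENTRIES).items():
        print(f"{domain}: {used} of {total} datasets use a standard")
```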

With consultation from municipal bodies and organizations dedicated to publishing open data, we developed a second spreadsheet, Inventory and Evaluation of Open Data Standards, that catalogues and evaluates 22 open data standards that are available for domain-specific data. The rows of this spreadsheet indicate individual data standards. The columns of this spreadsheet evaluate background information and quality for achieving optimal interoperability for each of the listed standards. Evaluating the quality of the standard’s performance, such as whether the standard is transferable to multiple jurisdictions, is an important consideration for municipalities looking to optimally standardize their data. Examples of open data standards in this inventory are BLDS for building permit data and the Budget Data Package for annual budget data.

The project’s second spreadsheet is concerned with open standards for data. Open standards, as opposed to closed standards, require a collaborative, transparent, and consensus-driven process to maintain their development (Palfrey and Gasser, 2012). Therefore, open standards honor a commitment to processes of transparency, due process, and rights of appeal. Like open data, open standards resist processes of unchecked, centralized control (Russell, 2014). Open data standards make sure that end users do not get locked into a specific technology. In addition, because open standards are driven by consensus, they are developed according to the needs and interests of participating stakeholders. While we provide spreadsheets on both, our project advocates implementing open standards for open data.

In light of the benefits of open standardization, the metrics of the second spreadsheet note the degree of openness for each standard. Such indicators of openness include multi-stakeholder participation and a consensus-driven process. Openness may be observed through the presence of online forums to discuss suggestions or concerns regarding the standard’s development and background information about each standard’s publishers. In addition, open standards use open licenses that allow the standards to be used without restriction and repurposed for any use. Providing this information not only allows potential implementers to be aware of what domain-specific standards exist, but also allows them to gauge how well the standard performs in terms of optimal interoperability and openness.
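The openness indicators named above can be read as a simple checklist. The Python sketch below is one hypothetical way to record them for a standard and list which ones are not met; the four yes/no fields are a simplification of my own, not the spreadsheet’s actual column layout.

```python
from dataclasses import dataclass, fields


@dataclass
class OpennessIndicators:
    """Hypothetical yes/no checklist based on the indicators named in the text."""
    multi_stakeholder: bool   # contributors from different sectors of society
    consensus_driven: bool    # decisions made through a consensus process
    public_forum: bool        # online forum for suggestions or concerns
    open_license: bool        # free to use, extend, and repurpose


def missing_indicators(indicators):
    """List the names of the openness indicators that are not met."""
    return [f.name for f in fields(indicators) if not getattr(indicators, f.name)]


if __name__ == "__main__":
    # Placeholder assessment; it does not describe any real standard.
    example = OpennessIndicators(
        multi_stakeholder=True,
        consensus_driven=True,
        public_forum=False,
        open_license=True,
    )
    print(missing_indicators(example))
```

Keeping each indicator as its own field, rather than collapsing them into a single score, lets an implementer weigh the ones that matter for their own goals.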

Finally, an accompanying white paper explains the two spreadsheets and the primary objective of my project for both publishers and consumers of open data. In particular, it explains the methodology, justifies chosen evaluations, and notes the project’s results. In addition, this paper will aid in navigating and understanding both of the project’s spreadsheets.

Findings from this Project
My work on this project has led me to conclude that the majority of municipally published open datasets surveyed do not use civic data standards. The most common standards used by municipalities in our survey were the General Transit Feed Specification (GTFS) for transit data and the Open311 API for service request data. Because datasets across cities and sectors vary in format and structure, these differences, coupled with a lack of cohesive definitions for labeling, indicate that standardization across cities will be a challenging undertaking. Publishers aiming to extend data shared among municipalities would benefit from collaborating and agreeing on standards for domain-specific data (as is the case with GTFS).

Our evaluation of 22 domain-specific data standards also shows standards do exist across a variety of domains. However, some domains, such as budget data, contain more open data standards than others. Therefore, potential implementers of standards must reconcile which domain-specific standard best fits their objectives in publishing the data and providing the most benefits for public good.

Many of the standards also contain information for contacting the standard’s publishers along with online forums for concerns or suggestions. However, many still lack full documentation or are simply in early draft stages. This means that although standards exist, some of them may not be ready for implementation.

Future Research Pathways
This project has room for growth so that we can better help our partners who publish and use open data decide how to go about adopting standards. To accomplish this goal, we could add more cities, domains, and open standards to the spreadsheets. In addition, the spreadsheets must be updated to reflect any future changes to standards or datasets.

In terms of the inventory of open data standards, it might be beneficial to separate metrics that evaluate openness of a standard from metrics that evaluate interoperability of a standard. Although we have emphasized the benefits of open standardization in this project, it is evident that some publishers of data do not perceive openness as crucial to a data standard’s success in achieving optimal interoperability.

As a result, my project does not aim to dictate how governments implement data standards. Instead, we would like to work with municipalities to understand what is valued within the decision-making process to encourage adoption of specific standards. We hope this will allow us to provide guidance on such policy decisions. Most importantly, to complete such work, we ask Geothink’s municipal partners for input on factors that influence the adoption of a data standard in their own catalogues.

