Tag Archives: standards

Geothoughts 9: Geothink Project Measures Open Data Standards for Consumer and Publisher Uses

Geothink's Open Data Standards Project helps publishers and consumers better use open data.

By Drew Bush

We’re very excited to present you with our ninth episode of Geothoughts. You can also subscribe to this podcast by finding it on iTunes.

In this episode, we examine a Geothink project on open data that officially kicked off in February 2015 with a Geothink teleconference call. Project lead Rachel Bloom, an undergraduate student in the Geothink Rapid Response Think Tank at McGill University, began this research one year ago. She worked with Geothink Head Renee Sieber, associate professor in McGill University’s Department of Geography and School of Environment.

It recently culminated in a white paper written on two spreadsheets: (1) an examination of high-value open datasets Canadian cities use; and (2) an inventory of open data standards published by open data providers. Listen in as Bloom explains to partners who publish open data how to know what standards exist and who uses them for which datasets.

Thanks for tuning in. And we hope you subscribe with us at Geothoughts on iTunes. A transcript of this original audio podcast follows.

TRANSCRIPT OF AUDIO PODCAST

Welcome to Geothoughts. I’m Drew Bush.

[Geothink.ca theme music]

“This project is about investigating open data domain specific standards at the Canadian municipal level, which I guess is kind of a mouthful. But basically I’ve created two spreadsheets to figure out how Canadian municipalities are publishing their data and how the level of conformity is per the guidelines for open data standards.”

That’s Rachel Bloom, an undergraduate student in Geothink’s Rapid Response Think Tank at McGill University, talking about domain specific data from sectors like transportation and city budgets. She’s working with Geothink Head Renee Sieber, associate professor in McGill University’s Department of Geography and School of Environment.

“To begin this project I chose ten domains to focus on. These domains came from open knowledge foundation spreadsheets. They are considered high value, and I thought these were interesting. I thought they were important to public use. So I chose them as the basis to create these spreadsheets.”

In late February, Bloom conducted a teleconference for the project’s partners in several Canadian cities. In it, Bloom discusses the project and each spreadsheet, and answers questions from those on the call. She starts with the first spreadsheet.

“It’s called ‘Adoption of Open Data Standards By Cities.’ So what we did for this is we have the 10 domains on the side on the y-axis, and then we have kind of nested between these certain metrics of how the municipality names the dataset, the file format, the structuration of the data, any metadata associated with the dataset or description of the data, and if these datasets for each domain are already using specific data standards, open data standards. And these were taken from each municipality’s open data catalogues.”

“And it helped for eventually comparing whether the ways that data is being published is even kind of compatible with the semantic and schematic guidelines dictated by available open data standards.”

Participants then examined a specific example from the spreadsheet, building permits for the City of Toronto. The call then proceeded to the next spreadsheet developed.

“It’s called ‘Inventory and Evaluation of Open Data Standards.’ Here we have on the y-axis these are individual open data standards that are kind of domain specific so they are pegged to certain domains and they cover the ten domains used for the other table. Though there is two extra domains…the metrics you find kind of on the top, are an innovation on my part. They were chosen by me based on the demand of data publishers and consumers I found in my research which came from all different types of mediums.

“I’ve even read e-mail correspondences of people talking about what they want when they are structuring their datasets. They also come from reinforcing that these standards are open. So what does it mean to be open? They have to be open, they have to be consensus driven, they have to have multi-stakeholder participation so their metrics have to account for that.”

Bloom again takes participants through a specific example, this time a budget data package, going through all the metrics to give participants a sense of the quality of the standard in terms of making data interoperable. When she finishes, Linda Low, open data lead for the City of Vancouver, interjects to ask:

“Rachel, can you talk a little about the criteria for whether or not it’s open or not again, it’s whether multi-stakeholders contribute to it, and there was something else too, right, that you said?”

“So when we talk about multi-stakeholders we’re talking about people who contribute that are from different facets of society. So the private sector, the public sector, civil societies, and also the obvious which is that open implies that there should be no royalties or fees associated with using the standard. It should be repurposable, they should be able to extend it how they wish, it should have a license that is open so that there is legal ramification for using the standard as you please. You’re right it’s not explicitly mentioned which of these kind of contribute to defining openness but all of these are good fundamental metrics for an open standard I would think.”

The teleconference proceeds as Bloom and the call’s participants discuss the spreadsheets and white paper, stopping to elaborate on specific examples or details in more depth. Toward the end of the 40-minute call, Bloom shares the vision and goals for this project.

“There’s metrics that can help publishers, but there’s also metrics that can help consumers who would want to voice how they want to structure the data which is really part of the open process. So I think it can be used as multiple, for multiple purposes, really so it’s flexible in that way. So I’m not sure if there’s a very specific way of using it cause it really depends on the goals of the person using the resource.”

She’s followed up by Sieber, who first asks a question and then provides insight into how the project’s goals were determined.

“A standard is likely to be viewed much differently if you want to do something for internal government use like business intelligence as opposed to external use. And depending upon the audience, if you’re doing something for realtors it might be viewed quite differently than if you’re trying to do it for, I don’t know, low information voters.”

At the conclusion, Low offers the municipal publisher’s perspective on how constantly updated and revised standards make it hard to know which one a municipality should adopt in differing domains such as city budgets, crime statistics, or waste removal services.

“When do we say this is justifiable for us without doing a whole bunch of research and wasting the effort afterward? That was the thing I always keep struggling about.”

Bloom doesn’t hesitate with an answer.

“There are so many options too and ways of approaching it. I mean, I don’t know–it’s really about the interests of the person who is publishing the data and the goals. I think at the end of the day, it’s going to, different governments are going to have to reconcile what their goals are and how they want to go about it. Which is the hardest part.”

This project is ongoing and next steps will continue to look at the landscape of open data standards in Canada.

[Voice over: Geothoughts are brought to you by Geothink.ca and generous funding from Canada’s Social Sciences and Humanities Research Council.]

###

If you have thoughts or questions about this podcast, get in touch with Drew Bush, Geothink’s digital journalist, at drew.bush@mail.mcgill.ca.

Inside Geothink’s Open Data Standards Project: Standards For Improving City Governance

By Rachel Bloom

Rachel Bloom is a McGill University undergraduate student and project lead for Geothink’s Open Data Standards Project.

In February, I led a Geothink seminar with city officials to introduce the results of the open data standards project we began approximately one year earlier. The project was started with the objective of assisting municipal publishers of open data in standardizing their datasets. We presented two spreadsheets: the first was dedicated to evaluating ‘high-value’ open datasets published by Canadian municipalities and the second consisted of an inventory of open data standards applicable to these types of datasets.

Both spreadsheets enable our partners who publish open data to know what standards exist and who uses them for which datasets. The project I lead is motivated by the idea that well-developed data standards for city governance can grant us the luxury of not having to think about the compatibility of technological components. When we screw in a new light bulb or open a web document we assume that it will work with the components we have (Guidoin and McKinney 2012). Technology, whether it refers to information systems or manufactured goods, relies on standards to ensure its usability and dissemination.

Municipal governments that publish open data look to the importance of standards for improving the usability of their data. Unfortunately, even though ‘high-value’ datasets have increasingly become available to the public, these datasets currently lack a consensus about how they should be structured and specified. Such datasets include crime statistics and annual budget data that can provide new services to citizens when municipalities open such datasets by publishing them to their open data catalogues online. Anyone can access such datasets and use the data however they wish without restriction.

Civic data standards provide agreements about semantic and schematic guidelines for structuring and encoding the data. Data standards specify technical data elements such as file formats, data schemas, and unique identifiers to make civic data interoperable. For example, most datasets are published in CSV or XML formats. CSV structures the data in columns and rows, while XML encapsulates the data in a hierarchical tree of <tags>.
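To make the CSV and XML contrast concrete, here is a minimal Python sketch. The building-permit record and its field names (`permit_id`, `status`, `issued`) are invented for illustration and do not come from any city's catalogue or from a published standard. It shows the same record stored as rows and columns and as a tree of tags, and that both can be read back into the same structure.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical building-permit record; the field names are illustrative only.
CSV_TEXT = "permit_id,status,issued\nP-001,Issued,2015-02-27\n"
XML_TEXT = (
    "<permits>"
    "<permit><permit_id>P-001</permit_id><status>Issued</status>"
    "<issued>2015-02-27</issued></permit>"
    "</permits>"
)


def rows_from_csv(text):
    """Read CSV text into a list of dicts, one per row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def rows_from_xml(text):
    """Walk the XML tree and turn each <permit> element into a dict."""
    root = ET.fromstring(text)
    return [
        {child.tag: child.text for child in permit}
        for permit in root.findall("permit")
    ]


csv_rows = rows_from_csv(CSV_TEXT)
xml_rows = rows_from_xml(XML_TEXT)

# Both encodings carry the same record once they are parsed.
print(csv_rows == xml_rows)
```

The file format alone does not make two datasets interoperable: both files here only line up because they happen to use the same field names, which is the kind of agreement a data standard is meant to supply.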

They also specify common vocabularies in order to clarify interpretation of the data’s meanings. Such vocabularies could include, for example, definitions for categories of expenditure in annual budget data. Geothink’s Open Data Standards Project offers publishers of open data an opportunity to improve the usability and efficiency of their data for consumers. This makes it easier to share data across municipalities because the technological components and their meanings within systems will be compatible.

Introducing Geothink’s Open Data Standards Project
No single, clear definition of an open data standard exists. In fact, most definitions of an ‘open data standard’ follow one of two prevailing ideas: 1) standards for open data; and 2) open standards for data. Geothink’s project examines and relates both of these ideas (Table 1). The first spreadsheet, the ‘Adoption of Open Data Standards By Cities,’ considers open data and its associated data standards. The second spreadsheet, the ‘Inventory and Evaluation of Open Data Standards,’ considers the process of open standardization. In other words, we were curious about what standards are currently being applied to open municipal data, and how to break down and document open standards for data in a way that is useful to municipalities looking to standardize their open data.

Table 1: Differences between ‘open data’ standards and open ‘data standards’

Spreadsheet                         | Requires open data | Requires open standard process
Evaluation of ‘High-Value’ Datasets | Yes                | No
Inventory of Open Data Standards    | No                 | Yes

The project’s evaluation of datasets relates to standards for open data. Standards for open data refer to standards that, regardless of how they are developed and maintained, can be applied to open data. Open data, according to the Open Knowledge Foundation (2014), consists of raw digital data that should be freely available to anyone to use, repurposable and re-publishable as users wish, and absent mechanisms of control like restrictive licenses. However, the process of developing and maintaining standards for open data may neither require transparency nor include public appeals for its development.

To discover what civic data standards are currently being used, the first spreadsheet, Adoption of Open Data Standards By Cities, evaluates ‘high value’ datasets specific to 10 domains (categories of datasets such as crime, transportation or service requests) in the open data catalogues for the cities of Vancouver, Toronto, Surrey, Edmonton and Ottawa. The types of data were chosen based on the Open Knowledge Foundation’s choice of datasets considered to provide greatest utility for the public. The project’s spreadsheet notes the salient structure and vocabulary of each dataset, such as the name, file format, schema, and available metadata. It especially notes which data standards these five municipalities are using for their open data (if any at all).

With consultation from municipal bodies and organizations dedicated to publishing open data, we developed a second spreadsheet, Inventory and Evaluation of Open Data Standards, that catalogues and evaluates 22 open data standards that are available for domain-specific data. The rows of this spreadsheet indicate individual data standards. The columns of this spreadsheet evaluate background information and quality for achieving optimal interoperability for each of the listed standards. Evaluating the quality of the standard’s performance, such as whether the standard is transferable to multiple jurisdictions, is an important consideration for municipalities looking to optimally standardize their data. Examples of open data standards in this inventory are BLDS for building permit data and the Budget Data Package for annual budget data.

The project’s second spreadsheet is concerned with open standards for data. Open standards, as opposed to closed standards, require a collaborative, transparent, and consensus-driven process to maintain their development (Palfrey and Gasser, 2012). Therefore, open standards honor a commitment to processes of transparency, due process, and rights of appeal. Like open data, open standards resist processes of unchecked, centralized control (Russell, 2014). Open data standards make sure that end users do not get locked into a specific technology. In addition, because open standards are driven by consensus, they are developed according to the needs and interests of participatory stakeholders. While we provide spreadsheets on both, our project advocates implementing open standards for open data.

In light of the benefits of open standardization, the metrics of the second spreadsheet note the degree of openness for each standard. Such indicators of openness include multi-stakeholder participation and a consensus-driven process. Openness may be observed through the presence of online forums to discuss suggestions or concerns regarding the standard’s development and background information about each standard’s publishers. In addition, open standards use open licenses that dictate that the standards may be used without restriction and repurposed for any use. Providing this information not only allows potential implementers to be aware of what domain-specific standards exist, but also allows them to gauge how well the standard performs in terms of optimal interoperability and openness.
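As a rough illustration only, the openness indicators named above could be recorded as a simple checklist per standard. The Python sketch below is an assumption made for this example and is not the spreadsheet's actual layout or scoring: the class name, field names, and the example values are all hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class OpennessIndicators:
    """Hypothetical checklist built from the indicators described in the text."""
    multi_stakeholder_participation: bool
    consensus_driven_process: bool
    public_discussion_forum: bool
    publisher_information_available: bool
    open_license: bool


def indicators_met(indicators):
    """Return the names of the indicators that are marked as satisfied."""
    return [f.name for f in fields(indicators) if getattr(indicators, f.name)]


# Invented values for an unnamed example standard, not a real evaluation.
example = OpennessIndicators(
    multi_stakeholder_participation=True,
    consensus_driven_process=True,
    public_discussion_forum=False,
    publisher_information_available=True,
    open_license=True,
)

met = indicators_met(example)
print(f"{len(met)} of {len(fields(example))} indicators met")
```

Listing which indicators are met, rather than collapsing them into one score, leaves it to each publisher to decide which aspects of openness matter most for its own goals.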

Finally, an accompanying white paper explains the two spreadsheets and the primary objective of my project for both publishers and consumers of open data. In particular, it explains the methodology, justifies chosen evaluations, and notes the project’s results. In addition, this paper will aid in navigating and understanding both of the project’s spreadsheets.

Findings from this Project
My work on this project has led me to conclude that the majority of municipally published open datasets surveyed do not use civic data standards. The most common standards used by municipalities in our survey were the General Transit Feed Specification (GTFS) for transit data and the Open311 API for service request data. Because datasets across cities and sectors vary in format and structure, these differences, coupled with a lack of cohesive definitions for labeling, indicate that standardization across cities will be a challenging undertaking. Publishers aiming to extend data shared among municipalities would benefit from collaborating and agreeing on standards for domain-specific data (as is the case with GTFS).

Our evaluation of 22 domain-specific data standards also shows standards do exist across a variety of domains. However, some domains, such as budget data, contain more open data standards than others. Therefore, potential implementers of standards must reconcile which domain-specific standard best fits their objectives in publishing the data and providing the most benefits for public good.

Many of the standards also contain information for contacting the standard’s publishers along with online forums for concerns or suggestions. However, many still lack full information regarding their documentation or are simply in early draft stages. This means that although standards exist, some of these standards are in their early stages and may not be ready for implementation.

Future Research Pathways
This project has room for growth so that we can better help our partners who publish and use open data decide how to go about adopting standards. To accomplish this goal, we could add more cities, domains, and open standards to the spreadsheets. In addition, the spreadsheets must be updated to reflect any future changes made to standards or datasets.

In terms of the inventory of open data standards, it might be beneficial to separate metrics that evaluate openness of a standard from metrics that evaluate interoperability of a standard. Although we have emphasized the benefits of open standardization in this project, it is evident that some publishers of data do not perceive openness as crucial for the success of a data standard in achieving optimal interoperability.

As a result, my project does not aim to dictate how governments implement data standards. Instead, we would like to work with municipalities to understand what is valued within the decision-making process to encourage adoption of specific standards. We hope this will allow us to provide guidance on such policy decisions. Most importantly, to complete such work, we ask Geothink’s municipal partners for input on factors that influence the adoption of a data standard in their own catalogues.

Contact Rachel Bloom at rachel.bloom@mail.mcgill.ca with comments on this article or to provide input on Geothink’s Open Data Standards Project.

References
Guidoin, Stéphane and James McKinney. 2012. Open Data, Standards and Socrata. Available at http://www.opennorth.ca/2012/11/22/open-data-standards.html. November 22, 2012.
Open Knowledge. Open Definition 2.0. Opendefinition.org. Retrieved 23 October 2015, from http://opendefinition.org/od/2.0/en/
Palfrey, John Gorham, and Urs Gasser. 2012. Interop: The Promise and Perils of Highly Interconnected Systems. Basic Books.
Russell, Andrew L. 2014. Open Standards and the Digital Age. Cambridge University Press.

Bridging Differences in Open Data: Coming up with standards at Open North

Open North has quietly released two reports on open data over the past year.

By Drew Bush

In case you missed either report, over the last year Open North has quietly put out an inventory of open data globally and, in a separate report, recommended baseline international standards for open data catalogs. The first report is entitled Gaps and opportunities for standardization in OGP members’ open data catalogs, while the second is entitled Identifying recommended standards and best practices for open data.

Their work was completed as part of the Open Government Partnership (OGP) Working Group, a group that aims to support governments seeking transparency through open data. Both reports aim to help the 69 countries in the partnership to improve their ability to share open data by standardizing how it’s made available.

The first report, which inventories open data in OGP’s member countries, notes that most members’ open data initiatives consist largely of open data catalogues. To assess each of these different catalogues, the authors wrote automated scripts to collect, normalize, and analyze them. This process allowed them to set a baseline across countries and identify gaps and opportunities for standardization.

“The analysis simply states the choices that OGP members have made with respect to each area for standardization; it makes no judgment as to whether these choices are best practices,” they write in laying out the objectives for the report.

In the second report, the authors address a specific research question: “What baseline standards and best practices for open data should OGP members adopt?” But first they diagnose the problem open data faces globally without any standards.

“The lack of standardization across jurisdictions is one major barrier; it makes discovering, accessing, using, and integrating data cumbersome and expensive, above the expected return,” they write. “A lack of knowledge about existing standards and a lack of guidance for their adoption and implementation contribute to this situation.”

The majority of the report then seeks to address these problems by outlining baseline standards and best practices for open data catalogs, while taking into account the differences between jurisdictions that make the global adoption and implementation of standards challenging. In particular, the report concludes with 33 recommendations for member countries, including that governments should provide their agencies with a list of acceptable data formats and that they should avoid file compression without good support for it.

To find more of our previous coverage about Open North’s work on open data, check out our previous Geothink.ca story here.

If you have thoughts or questions about this article, get in touch with Drew Bush, Geothink’s digital journalist, at drew.bush@mail.mcgill.ca.

Part 2: Our Project Head on North American Civic Participation and Geothink’s Projects

By Drew Bush

Renee Sieber, associate professor in McGill University’s Department of Geography and School of Environment.

Part 2 (of 2). This is the second in a two-part series with the head of Geothink.ca, Renee Sieber, an associate professor in the Department of Geography and School of Environment at McGill University. In this second part, we pick up the story of how Sieber sees civic participation in North America during an age of technological change. Catch Part 1 here if you missed our coverage of Geothink itself: its vision, goal and design.

Talking with Renee Sieber means finding exuberance and excitement for each of Geothink.ca’s projects and the work of all the team members, collaborators and partners. One place to start such a conversation is with how many cities make information available to the public.

“Cities are also publishing enormous amounts of data—it’s called open data,” Sieber, Geothink’s head and an associate professor at McGill University, said. “And this data can be turned into applications that for example can allow citizens to more easily know when they should put their recycling out and what types of recycling [exist], where there is going to be traffic congestion or traffic construction, when the next city council meeting will be held and what will be on the city council agenda.”

This open data forms the basis for how many modern technologies use programs to simplify and facilitate citizen interactions with city garbage services, transportation networks or city policies and processes. In particular, one Geothink project aims to interrogate how standards are created for open data—no easy thing, according to Sieber, when you’re talking not just about abstract data but even more abstract metadata.

“So why should one care about that?” Sieber asked. “Well, we should care about that first of all because the reason that people can now get up-to-date transit information in cities all over North America and, indeed, cities all over the world is because of a very small open data standard called GTFS, the General Transit Feed Specification.”

This prototype of a successful standard (or way of structuring public transportation data) resulted from a partnership between Google and Portland, Oregon. And, according to Sieber, it’s not about visualizing the data but standardizing its structure so that it can be used in equations that allow cities to show when the next bus will arrive, the best ways to get from point x to point y, and to put all this information on a map. In fact, Open511, a standard for traffic and road construction, explicitly styles itself after this prototype.
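To show why a shared structure matters for a question like "when is the next bus?", here is a small Python sketch. It assumes a few sample rows in the layout of a GTFS `stop_times.txt` file, using the specification's `trip_id`, `arrival_time`, `departure_time`, `stop_id` and `stop_sequence` fields. The trips, stops and times are invented, and the sketch deliberately ignores service calendars and the other files a real feed would include.

```python
import csv
import io

# Invented sample rows laid out like a GTFS stop_times.txt file.
STOP_TIMES = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
T1,08:00:00,08:00:00,S1,1
T1,08:10:00,08:10:00,S2,2
T2,08:30:00,08:30:00,S1,1
T2,08:40:00,08:40:00,S2,2
"""


def to_seconds(hhmmss):
    """Convert an HH:MM:SS string to seconds since the start of the service day."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def next_arrival(stop_times_text, stop_id, now):
    """Return the earliest arrival time at stop_id at or after now, or None."""
    now_seconds = to_seconds(now)
    upcoming = [
        row["arrival_time"]
        for row in csv.DictReader(io.StringIO(stop_times_text))
        if row["stop_id"] == stop_id
        and to_seconds(row["arrival_time"]) >= now_seconds
    ]
    return min(upcoming, key=to_seconds) if upcoming else None


print(next_arrival(STOP_TIMES, "S2", "08:15:00"))
```

Because every agency that publishes GTFS uses the same column names, the same few lines would work on another city's feed without changes, which is the practical payoff of standardizing structure rather than presentation.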

“It’s really interesting for us to figure out what new data standards will emerge,” added Sieber. “For example, will there be one to show traffic construction all over the country or all over North America?”

Yet it marks only one way Geothink is examining citizen interactions with cities. At Ryerson University, Associate Professor Pamela Robinson is working on examining civic hackathons where cities bring together techies and interested citizens to find innovative ways to design and build applications for city data and improve city services. The problem, according to Sieber, is that after the hackathons many such applications or proofs of concept disappear. For example, some recent winners of a hackathon in the United Kingdom felt that too many applications end up in the back alleys of BitBucket or GitHub.

“So it can be a quite frustrating experience,” Sieber said. “And cities and the participants alike look towards ways to try to retain that enthusiasm over time and to build on the proofs of concept to actually deploy the apps. So Pamela is conducting research on how to create that technological sustainability.”

In yet another project, Geothink has partnered with the Nova Scotia Government’s Community Counts program located in Halifax, Nova Scotia, to study the preferences of end-users from community-based management organizations or non-profits who utilize the open data from the province. Community Counts’s mission is to make it easier for such organizations to use information such as socio-demographic data, although the organization itself just lost funding in the province’s most recent budget.

“This is very different from working with apps from open data because with apps you generally know who the developers are but you don’t know who the end-users are,” Sieber said. “So we are conducting a project with them to ask questions of the end-users to find out what they find valuable or challenging in using data. And we’ll then infer that to the challenges and opportunities of working with open data that cities produce.”

So how does all this reflect on what civic participation means today in North America? Governments can now know if you visit certain parks, go to certain places for coffee, and meet certain friends while doing either. So, theoretically at least, they can now design urban spaces and cities themselves to be safer, more vibrant, and better suited to the range of activities taking place in these places.

“That seems both incredibly convenient and incredibly Orwellian at the same time,” Sieber said. To find out more about her views of civic participation, stay tuned for our next Geothoughts Podcast by signing up to receive it on iTunes.

If you have thoughts or questions about this article, get in touch with Drew Bush, Geothink’s digital journalist, at drew.bush@mail.mcgill.ca.

Open North’s Inventory: Coming Up With Standards for Open Data

Open data standards are the subject of a new OGP report (Photo courtesy of opensource.com).

By Drew Bush

Open North’s James McKinney, Stéphane Guidoin and Paulina Marczak completed an inventory of global open data standards last week that seeks to establish a global viewpoint on the subject and identify any missing pieces. Their work was completed as part of the Open Government Partnership (OGP) Working Group, a group that aims to support governments seeking transparency through open data.

“The objective…is to promote the use of open data standards to improve transparency, create social and economic value, and increase the interoperability of open data activities across multiple jurisdictions,” the authors write in their report. “Its first deliverable is to complete an inventory of open data standards by type to develop a global view and to identify gaps and overlaps. Its final deliverable is an OGP document outlining baseline standards and best practices for open data, along with guidance for adoption and implementation.”

In their report, the authors used scripts to automatically collect, normalize and analyze data from 40 OGP members’ catalogs. Their goal was to determine how to standardize the ways such data is licensed, how metadata is used, what types of file formats catalogs make use of, and the overall structure of each catalog. As they wrote, they did not seek to “pursue a comprehensive inventory of data standards” but rather to focus on those “most relevant to OGP members.”

Their analysis produced a number of findings. In particular, they found that OGP members have no common structure to their catalogs, that a common vocabulary for metadata (or data about data) is needed, and that there are significant problems with the metadata used to specify licensing in some countries (in “8 out of 24 catalogs, the licenses of over 10 percent of datasets are either not specified or underspecified”).
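A check of the kind behind the licensing finding can be sketched in a few lines of Python. This is an assumption about how such a measure might be computed and is not the authors' actual script: the catalog entries, the `license` key and the license label are invented, and the sketch only detects missing or empty license fields, not the broader "underspecified" cases the report describes.

```python
def unspecified_license_share(datasets):
    """Return the fraction of datasets whose license field is missing or empty."""
    if not datasets:
        return 0.0
    missing = sum(
        1 for dataset in datasets if not (dataset.get("license") or "").strip()
    )
    return missing / len(datasets)


# Invented catalog metadata; titles and the license label are placeholders.
catalog = [
    {"title": "Annual budget", "license": "example-open-licence"},
    {"title": "Transit stops", "license": "example-open-licence"},
    {"title": "Crime statistics", "license": ""},
    {"title": "Service requests", "license": "example-open-licence"},
]

share = unspecified_license_share(catalog)

# The 10 percent threshold comes from the finding quoted in the text.
print(share > 0.10)
```

Running the same function over every catalog and counting how many exceed the threshold would give a figure in the form of the report's "8 out of 24" statement.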

OGP’s Working Group consists of four streams that include Principles, Measurement, Standards and Capacity Building. Each consists of leads from the government, private and nonprofit world who work to identify and share practices that help OGP governments implement commitments and develop more ambitious and innovative open data plans.

McKinney serves as the lead for the Standards theme which promotes the use of open data standards to improve transparency and to increase the interoperability of open data activities across multiple jurisdictions. His organization, the Canadian non-profit Open North, creates online tools for civil society and government to educate and empower citizens to participate actively in Canadian democracy. Open North is also a Geothink partner.

Find the report here.

If you have thoughts or questions about this article, get in touch with Drew Bush, Geothink’s digital journalist, at drew.bush@mail.mcgill.ca.