Tag Archives: open data standards

Geothink Newsletter Issue 12

Issue 12 of the Geothink Newsletter has been released!

Download Geothink Newsletter Issue 12.

Inside this fall’s edition we celebrate the transition to the final year of the Geothink partnership research grant.

We also bring updates on recent Geothink research, including the announcement of Geothink Student Shelley Cook as the awardee of the Dr. Alexander Aylett Scholarship in Urban Sustainability and Innovation.

If you have feedback or content for the newsletter, please contact the Editor, Sam Lumley.

 

New International Open Data Standards Directory Launched by GovEx and Geothink Partnership

By Sam Lumley

Geothink and the Center for Government Excellence (GovEx) at Johns Hopkins University launched a first-of-its-kind Open Data Standards Directory today that identifies and assembles standards for open data shared by governments. The new directory provides guidance on the best format for sharing specific types of data to ensure its interoperability across local, regional and national jurisdictions.

The site began as a Geothink project led by McGill University student Rachel Bloom and supervised by Geothink Head Renee Sieber, an associate professor in McGill University’s Department of Geography and School of Environment. For her undergraduate honors research in the Department of Geography, Bloom developed a tool for searching and querying relevant open data standards for a diverse range of municipal open data. In partnership with GovEx, McGill undergraduate students Julia Conzon and Nicolas Levy contributed to the project by researching and visualizing the directory.

Former McGill University student Rachel Bloom initiated the Open Data Standards Directory as her undergraduate honors project.

“I think one of the biggest challenges was providing this information in a way that was easily accessible in a dashboard format,” Bloom said. “It was difficult because the standards are complex and it’s hard to capture all of the desired information about them in an easy visual style based around our users.”

“The standards directory helps people not only know what’s out there,” she added. “But based on a systematic approach, it allows people to also evaluate the standard and help them on their decision of which one to adopt. So I think that’s really valuable.”

This initiative has been further developed by the Center for Government Excellence (GovEx) at Johns Hopkins University in partnership with Geothink and members of the open data community. It now represents the first-ever international data standards directory. It helps governments provide data in formats that will most effectively support informed decision-making and the provision of services.

“There’s a serious need for coordination on how governments at all levels classify different types of open data,” Sieber said. “A collaboration with McGill University, this directory provides a comprehensive inventory of how data on transit, road construction, public facilities and more has been classified. It also allows evaluation of different standards to help guide governments in choosing the most useful ones.”

The project emphasizes a collaborative approach that opens a two-way dialogue with municipalities. This allows its creators to better understand what is valued within the decision-making process and to encourage the adoption of specific standards for how open data is released. Users around the world are able and encouraged to contribute additional information and update existing standards.

“Open data improves the lives of hundreds of millions of people, many incrementally and some dramatically,” Andrew Nicklin, GovEx Director of Data Practices, said. “Our new directory will encourage global standards for how data is organized for more effective production and consumption at scale. This will ensure an even greater impact on the local government services level.”

Historically, city governments and others have faced several challenges in dealing with open data sets. Among these challenges is a lack of agreement and coordination on how data sets should be structured to best serve the public intended to access them. The establishment and organisation of common standards can address this problem by encouraging practices that ensure data is accessible and usable by citizens. Common standards can also ensure that datasets released by different municipalities will be interoperable.

“The directory’s inventory helps simplify and demystify choices for governments and citizens by answering the question ‘what’s out there?’ but also takes it a step further by assessing the value of these standards to a city’s data provision,” said Jean-Noé Landry, Executive Director of OpenNorth, a Geothink partner in this work. “The directory allows us to align data practices, join up data, and enable emergent data uses. Data interoperability is one key to unlocking open data’s innovation potential and we believe this inventory is a very important step towards it.”

Currently there are over 60 standards in the directory from around the world and in multiple languages. GovEx hopes to continue these efforts and broaden the directory’s range of standards, languages and user bases.

To find out more about the open data standards directory project, you can listen to Geothink’s podcast on the initial project, catch an update on GovEx’s latest Datapoints podcast or visit the GovEx Beta Data Standards Directory website.

###

If you have thoughts or questions about the article, get in touch with Sam Lumley, Geothink’s newsletter editor, at sam.lumley@mail.mcgill.ca.

Inside Geothink’s Open Data Standards Project: Standards For Improving City Governance

By Rachel Bloom

Rachel Bloom is a McGill University undergraduate student and project lead for Geothink’s Open Data Standards Project.

In February, I led a Geothink seminar with city officials to introduce the results of the open data standards project we began approximately one year earlier. The project was started with the objective of assisting municipal publishers of open data in standardizing their datasets. We presented two spreadsheets: the first was dedicated to evaluating ‘high-value’ open datasets published by Canadian municipalities and the second consisted of an inventory of open data standards applicable to these types of datasets.

Both spreadsheets enable our partners who publish open data to know what standards exist and who uses them for which datasets. The project I lead is motivated by the idea that well-developed data standards for city governance can grant us the luxury of not having to think about the compatibility of technological components. When we screw in a new light bulb or open a web document we assume that it will work with the components we have (Guidoin and McKinney 2012). Technology, whether it refers to information systems or manufactured goods, relies on standards to ensure its usability and dissemination.

Municipal governments that publish open data look to the importance of standards for improving the usability of their data. Unfortunately, even though ‘high-value’ datasets have increasingly become available to the public, these datasets currently lack a consensus about how they should be structured and specified. Such datasets include crime statistics and annual budget data that can provide new services to citizens when municipalities open such datasets by publishing them to their open data catalogues online. Anyone can access such datasets and use the data however they wish without restriction.

Civic data standards provide agreements about semantic and schematic guidelines for structuring and encoding the data. Data standards specify technical data elements such as file formats, data schemas, and unique identifiers to make civic data interoperable. For example, most datasets are published in CSV or XML formats. CSV structures the data in columns and rows, while XML encapsulates the data in a hierarchical tree of <tags>.

They also specify common vocabularies in order to clarify interpretation of the data’s meanings. Such vocabularies could include, for example, definitions for categories of expenditure in annual budget data. Geothink’s Open Data Standards Project offers publishers of open data an opportunity to improve the usability and efficiency of their data for consumers. This makes it easier to share data across municipalities because the technological components and their meanings within systems will be compatible.
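To make the difference between the CSV and XML formats mentioned above concrete, here is a minimal sketch that writes the same two records both ways. The budget records, field names, and the `budget`/`item` element names are invented for illustration and do not come from any published standard.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical budget records; the field names are illustrative only.
records = [
    {"department": "Parks", "category": "Operating", "amount": "125000"},
    {"department": "Transit", "category": "Capital", "amount": "980000"},
]


def to_csv(rows):
    """Serialize rows as CSV: one header row, then one line per record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_xml(rows):
    """Serialize rows as XML: a root element with one child element per record."""
    root = ET.Element("budget")
    for row in rows:
        item = ET.SubElement(root, "item")
        for field, value in row.items():
            ET.SubElement(item, field).text = value
    return ET.tostring(root, encoding="unicode")


csv_text = to_csv(records)
xml_text = to_xml(records)
print(csv_text)
print(xml_text)
```

Both outputs carry the same information. A data standard goes one step further by fixing which column or tag names are allowed and what each one means, so that two cities publishing budget data end up with files that can be read by the same program.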

Introducing Geothink’s Open Data Standards Project
No single, clear definition of an open data standard exists. In fact, most definitions of an ‘open data standard’ follow one of two prevailing ideas: 1) standards for open data; and 2) open standards for data. Geothink’s project examines and relates both of these ideas (Table 1). The first spreadsheet, the ‘Adoption of Open Data Standards By Cities’, considers open data and its associated data standards. The second spreadsheet, the ‘Inventory of Open Data Standards,’ considers the process of open standardization. In other words, we were curious about what standards are currently being applied to open municipal data, and how to break down and document open standards for data in a way that is useful to municipalities looking to standardize their open data.

Table 1: Differences between ‘open data’ standards and open ‘data standards’

Spreadsheet                            Requires open data    Requires open standard process
Evaluation of ‘High-Value’ Datasets    Yes                   No
Inventory of Open Data Standards       No                    Yes

The project’s evaluation of datasets relates to standards for open data. Standards for open data refer to standards that, regardless of how they are developed and maintained, can be applied to open data. Open data, according to the Open Knowledge Foundation (2014), consists of raw digital data that should be freely available to anyone to use, repurposable and re-publishable as users wish, and absent mechanisms of control like restrictive licenses. However, the process of developing and maintaining standards for open data may not require transparency nor include public appeals for its development.

To discover what civic data standards are currently being used, the first spreadsheet, Adoption of Open Data Standards By Cities, evaluates ‘high value’ datasets specific to 10 domains (categories of datasets such as crime, transportation or service requests) in the open data catalogues for the cities of Vancouver, Toronto, Surrey, Edmonton and Ottawa. The types of data were chosen based on the Open Knowledge Foundation’s choice of datasets considered to provide the greatest utility for the public. The project’s spreadsheet notes the salient structure and vocabulary of each dataset, such as the name, file format, schema, and available metadata. It especially notes which data standards these five municipalities are using for their open data (if any at all).

In consultation with municipal bodies and organizations dedicated to publishing open data, we developed a second spreadsheet, Inventory and Evaluation of Open Data Standards, that catalogues and evaluates 22 open data standards that are available for domain-specific data. Each row of this spreadsheet represents an individual data standard. The columns record background information for each listed standard and evaluate how well it achieves optimal interoperability. Evaluating the quality of the standard’s performance, such as whether the standard is transferable to multiple jurisdictions, is an important consideration for municipalities looking to optimally standardize their data. Examples of open data standards in this inventory are BLDS for building permit data and the Budget Data Package for annual budget data.

The project’s second spreadsheet is concerned with open standards for data. Open standards, as opposed to closed standards, require a collaborative, transparent, and consensus-driven process to maintain their development (Palfrey and Gasser, 2012). Therefore, open standards honor a commitment to processes of transparency, due process, and rights of appeal. Similarly to open data, open standards resist processes of unchecked, centralized control (Russell, 2014). Open data standards make sure that end users do not get locked into a specific technology. In addition, because open standards are driven by consensus, they are developed according to the needs and interests of participating stakeholders. While we provide spreadsheets on both, our project advocates implementing open standards for open data.

In light of the benefits of open standardization, the metrics of the second spreadsheet note the degree of openness for each standard. Such indicators of openness include multi-stakeholder participation and a consensus-driven process. Openness may be observed through the presence of online forums to discuss suggestions or concerns regarding the standard’s development and background information about each standard’s publishers. In addition, open standards use open licenses stating that the standards may be used without restriction and repurposed for any use. Providing this information not only allows potential implementers to be aware of what domain-specific standards exist, but also allows them to gauge how well the standard performs in terms of optimal interoperability and openness.
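As a rough illustration of how such an inventory might be represented in code, the sketch below models one spreadsheet row as a record with a handful of the indicators described above. The field names, the simple count used as a score, and the two placeholder entries are assumptions made for this example; they are not the project’s actual columns, scoring, or results.

```python
from dataclasses import dataclass


@dataclass
class StandardEntry:
    """One row of a hypothetical standards inventory."""
    name: str
    domain: str
    multi_stakeholder: bool    # developed with multi-stakeholder participation
    consensus_driven: bool     # changes adopted through a consensus process
    public_forum: bool         # online forum for suggestions or concerns
    open_license: bool         # may be used and repurposed without restriction
    multi_jurisdiction: bool   # interoperability: transferable across jurisdictions

    def openness_score(self) -> int:
        """Count how many of the four openness indicators are met."""
        return sum([self.multi_stakeholder, self.consensus_driven,
                    self.public_forum, self.open_license])


# Placeholder entries with invented values, for illustration only.
inventory = [
    StandardEntry("Example Permit Standard", "building permits",
                  True, True, True, True, True),
    StandardEntry("Example Budget Standard", "annual budget",
                  True, False, True, True, False),
]


def by_domain(entries, domain):
    """Return the entries for one domain, most open first."""
    matches = [entry for entry in entries if entry.domain == domain]
    return sorted(matches, key=lambda entry: entry.openness_score(), reverse=True)


for entry in by_domain(inventory, "annual budget"):
    print(entry.name, entry.openness_score(), entry.multi_jurisdiction)
```

Keeping the interoperability flag out of the openness count lets a reader weigh the two considerations independently.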

Finally, an accompanying white paper explains the two spreadsheets and the primary objective of my project for both publishers and consumers of open data. In particular, it explains the methodology, justifies the chosen evaluations, and notes the project’s results. The paper also helps readers navigate and understand both of the project’s spreadsheets.

Findings from this Project
My work on this project has led me to conclude that the majority of municipally published open datasets surveyed do not use civic data standards. The most common standards used by municipalities in our survey were the General Transit Feed Specification (GTFS) for transit data and the Open311 API for service request data. Because datasets across cities and sectors vary in format and structure, these differences, coupled with a lack of cohesive definitions for labeling, indicate that standardization across cities will be a challenging undertaking. Publishers aiming to extend data shared among municipalities would benefit from collaborating and agreeing on standards for domain-specific data (as is the case with GTFS).

Our evaluation of 22 domain-specific data standards also shows that standards do exist across a variety of domains. However, some domains, such as budget data, contain more open data standards than others. Therefore, potential implementers of standards must determine which domain-specific standard best fits their objectives in publishing the data and provides the most benefit for the public good.

Many of the standards also contain information for contacting the standard’s publishers along with online forums for concerns or suggestions. However, many still lack full documentation or are simply in early draft stages. This means that although standards exist, some of them may not be ready for implementation.

Future Research Pathways
This project has room for growth so that we can better help our partners who publish and use open data decide how to go about adopting standards. To accomplish this goal, we could add more cities, domains, and open standards to the spreadsheets. In addition, the spreadsheets must be updated to reflect any future changes made to standards or datasets.

In terms of the inventory of open data standards, it might be beneficial to separate metrics that evaluate the openness of a standard from metrics that evaluate its interoperability. Although we have emphasized the benefits of open standardization in this project, it is evident that some publishers of data do not perceive openness as crucial for the success of a data standard in achieving optimal interoperability.

As a result, my project does not aim to dictate how governments implement data standards. Instead, we would like to work with municipalities to understand what is valued within the decision-making process to encourage adoption of specific standards. We hope this will allow us to provide guidance on such policy decisions. Most importantly, to complete such work, we ask Geothink’s municipal partners for input on factors that influence the adoption of a data standard in their own catalogues.

Contact Rachel Bloom at rachel.bloom@mail.mcgill.ca with comments on this article or to provide input on Geothink’s Open Data Standards Project.

References
Guidoin, Stéphane and James McKinney. 2012. Open Data, Standards and Socrata. Available at http://www.opennorth.ca/2012/11/22/open-data-standards.html. November 22, 2012.
Open Knowledge. Open Definition 2.0. Opendefinition.org. Retrieved 23 October 2015, from http://opendefinition.org/od/2.0/en/
Palfrey, John Gorham and Urs Gasser. 2012. Interop: The Promise and Perils of Highly Interconnected Systems. Basic Books.
Russell, Andrew L. 2014. Open Standards and the Digital Age. Cambridge University Press.

Bridging Differences in Open Data: Coming up with standards at Open North

Open North has quietly released two reports on open data over the past year.

By Drew Bush

In case you missed either report, over the last year Open North has quietly put out an inventory of open data globally and, in a separate report, recommended baseline international standards for open data catalogs. The first report is entitled Gaps and opportunities for standardization in OGP members’ open data catalogs while the second is entitled Identifying recommended standards and best practices for open data.

Their work was completed as part of the Open Government Partnership (OGP) Working Group, a group that aims to support governments seeking transparency through open data. Both reports aim to help the 69 countries in the partnership to improve their ability to share open data by standardizing how it’s made available.

The first report, which inventories open data in OGP’s member countries, notes that most members’ open data initiatives consist largely of open data catalogues. To assess each of these different catalogues, the authors wrote automated scripts to collect, normalize, and analyze them. This process allowed them to set a baseline across countries and identify gaps and opportunities for standardization.

“The analysis simply states the choices that OGP members have made with respect to each area for standardization; it makes no judgment as to whether these choices are best practices,” they write in laying out the objectives for the report.
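To give a sense of what a collect, normalize, and analyze step could look like, here is a minimal sketch that maps inconsistent file format labels from two made-up catalogues onto common names and then counts them. It is not the report authors’ code; the alias table, the `format` key, and the catalogue contents are assumptions made purely for illustration.

```python
from collections import Counter

# Hypothetical alias table mapping the labels publishers use to one common name.
FORMAT_ALIASES = {
    "csv": "CSV", "text/csv": "CSV", ".csv": "CSV",
    "xml": "XML", "application/xml": "XML", "text/xml": "XML",
    "json": "JSON", "application/json": "JSON",
}


def normalize_format(label):
    """Map a raw format label to a common name, or OTHER if it is unknown."""
    return FORMAT_ALIASES.get(label.strip().lower(), "OTHER")


def format_counts(catalog):
    """Count normalized formats across one catalogue's dataset entries."""
    return Counter(normalize_format(entry["format"]) for entry in catalog)


# Invented catalogue entries standing in for collected metadata.
catalogs = {
    "Country A": [{"format": "CSV"}, {"format": "text/csv"},
                  {"format": "application/xml"}],
    "Country B": [{"format": " json "}, {"format": "XLS"}],
}

summary = {name: format_counts(entries) for name, entries in catalogs.items()}
for name, counts in summary.items():
    print(name, dict(counts))
```

Like the analysis described in the report, a tally of this kind only states which choices publishers have made; it says nothing about whether those choices are best practices.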

In the second report, the authors address a specific research question: “What baseline standards and best practices for open data should OGP members adopt?” But first they diagnose the problem open data faces globally without any standards.

“The lack of standardization across jurisdictions is one major barrier; it makes discovering, accessing, using, and integrating data cumbersome and expensive, above the expected return,” they write. “A lack of knowledge about existing standards and a lack of guidance for their adoption and implementation contribute to this situation.”

Most of the report then seeks to address these problems by outlining baseline standards and best practices for open data catalogs, while taking into account the differences between jurisdictions that make the global adoption and implementation of standards challenging. The report concludes with 33 recommendations for member countries, including that governments should provide their agencies a list of acceptable data formats and that they should avoid file compression without good support for it.

To find more of our previous coverage about Open North’s work on open data, check out our previous Geothink.ca story here.

If you have thoughts or questions about this article, get in touch with Drew Bush, Geothink’s digital journalist, at drew.bush@mail.mcgill.ca.

Part 2: Our Project Head on North American Civic Participation and Geothink’s Projects

By Drew Bush

Renee Sieber, associate professor in McGill University’s Department of Geography and School of Environment.

Part 2 (of 2). This is the second in a two-part series with the head of Geothink.ca, Renee Sieber, an associate professor in the Department of Geography and School of Environment at McGill University. In this second part, we pick up the story of how Sieber sees civic participation in North America during an age of technological change. Catch Part 1 here if you missed our coverage of Geothink itself: its vision, goal and design.

Talking with Renee Sieber means finding exuberance and excitement for each of Geothink.ca’s projects and the work of all the team members, collaborators and partners. One place to start such a conversation is with how many cities make information available to the public.

“Cities are also publishing enormous amounts of data—it’s called open data,” Sieber, Geothink’s head and an associate professor at McGill University, said. “And this data can be turned into applications that for example can allow citizens to more easily know when they should put their recycling out and what types of recycling [exist], where there is going to be traffic congestion or traffic construction, when the next city council meeting will be held and what will be on the city council agenda.”

This open data forms the basis for how many modern technologies use programs to simplify and facilitate citizen interactions with city garbage services, transportation networks or city policies and processes. In particular, one Geothink project aims to interrogate how standards are created for open data—no easy thing, according to Sieber, when you’re talking not just about abstract data but even more abstract metadata.

“So why should one care about that?” Sieber asked. “Well, we should care about that first of all because the reason that people can now get up-to-date transit information in cities all over North America and, indeed, cities all over the world is because of a very small open data standard called GTFS, the General Transit Feed Specification.”

This prototypical, successful standard (or way of structuring public transportation data) resulted from a partnership between Google and Portland, Oregon. And, according to Sieber, it’s not about visualizing the data but standardizing its structure so that it can be used in equations that allow cities to show when the next bus will arrive, the best ways to get from point x to point y, and to put all this information on a map. In fact, Open511, a standard for traffic and road construction, explicitly styles itself after this prototype.

“It’s really interesting for us to figure out what new data standards will emerge,” added Sieber. “For example, will there be one to show traffic construction all over the country or all over North America?”
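The next-bus calculation Sieber describes can be sketched in a few lines once transit data shares a common structure. The example below reads a tiny, invented excerpt in the shape of a GTFS `stop_times.txt` file, using that file’s `trip_id`, `arrival_time` and `stop_id` columns, and finds the next arrival at a stop. The trips, stops and times are made up, and the sketch ignores service calendars and real-time delays, which a working application would need to handle.

```python
import csv
import io

# A tiny, made-up excerpt in the shape of a GTFS stop_times.txt file.
STOP_TIMES = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
T1,08:00:00,08:00:30,S1,1
T1,08:10:00,08:10:30,S2,2
T2,08:20:00,08:20:30,S1,1
T3,25:05:00,25:05:30,S1,1
"""


def to_seconds(hhmmss):
    """Convert HH:MM:SS to seconds. GTFS allows hours of 24 or more for
    trips that continue past midnight of the service day."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def next_arrival(stop_times_text, stop_id, now):
    """Return (trip_id, arrival_time) for the first arrival at stop_id
    at or after `now`, or None if no later arrival is listed."""
    now_s = to_seconds(now)
    candidates = [
        (to_seconds(row["arrival_time"]), row["trip_id"], row["arrival_time"])
        for row in csv.DictReader(io.StringIO(stop_times_text))
        if row["stop_id"] == stop_id and to_seconds(row["arrival_time"]) >= now_s
    ]
    if not candidates:
        return None
    _, trip_id, arrival = min(candidates)
    return trip_id, arrival


print(next_arrival(STOP_TIMES, "S1", "08:05:00"))
```

Because every agency that publishes GTFS uses the same file and column names, the same function would work on another city’s feed without modification, which is the point of standardizing the structure.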

Yet it marks only one way Geothink is examining citizen interactions with cities. At Ryerson University, Associate Professor Pamela Robinson is examining civic hackathons, where cities bring together techies and interested citizens to find innovative ways to design and build applications for city data and improve city services. The problem, according to Sieber, is that after the hackathons many such applications or proofs of concept disappear. For example, some recent winners of a hackathon in the United Kingdom felt that too many applications end up in the back alleys of BitBucket or GitHub.

“So it can be a quite frustrating experience,” Sieber said. “And cities and the participants alike look towards ways to try to retain that enthusiasm over time and to build on the proofs of concept to actually deploy the apps. So Pamela is conducting research on how to create that technological sustainability.”

In yet another project, Geothink has partnered with the Nova Scotia Government’s Community Counts program, located in Halifax, Nova Scotia, to study the preferences of end-users from community-based management organizations or non-profits who utilize the open data from the province. Community Counts’ mission is to make it easier for such organizations to use information such as socio-demographic data, although the organization itself just lost funding in the province’s most recent budget.

“This is very different from working with apps from open data because with apps you generally know who the developers are but you don’t know who the end-users are,” Sieber said. “So we are conducting a project with them to ask questions of the end-users to find out what they find valuable or challenging in using data. And we’ll then infer that to the challenges and opportunities of working with open data that cities produce.”

So how does all this reflect on what civic participation means today in North America? Governments can now know if you visit certain parks, go to certain places for coffee, and meet certain friends while doing either. So, theoretically at least, they can now design urban spaces and cities themselves to be safer, more vibrant, and better suited to the range of activities taking place in these places.

“That seems both incredibly convenient and incredibly Orwellian at the same time,” Sieber said. To find out more about her views of civic participation, stay tuned for our next Geothoughts Podcast by signing up to receive it on iTunes.

If you have thoughts or questions about this article, get in touch with Drew Bush, Geothink’s digital journalist, at drew.bush@mail.mcgill.ca.