Open data experts stress the importance of developing and applying standards for data and metadata as early in the process as possible. Ideally, standards should be applied before data are collected, because converting data to a standard after the fact can degrade its quality. Better standardization can improve data quality and, in particular, can make different datasets interoperable. The recommended practice is to apply data standards early and to keep applying them throughout the data lifecycle. Suggested best practices:
Use common data standards and taxonomies for U.S. federal data. Standards are needed to provide a basis for assessing data quality, for comparing datasets with each other to cross-validate them, and for making datasets interoperable. Different sectors of the federal government should approach standardization in the ways most appropriate to their data, in collaboration with academia, industry, and other outside experts. For example, the U.S. Departments of Justice, Homeland Security, and Health and Human Services collaboratively created the National Information Exchange Model (NIEM) in April 2005 as a standard for data exchange. More recently, in accordance with the Digital Accountability and Transparency Act of 2014, the Office of Management and Budget (OMB) and the U.S. Department of the Treasury developed the DATA Act Information Model Schema (DAIMS) as a government-wide standard for reporting federal spending data.
In the absence of uniform standards, develop an additional “data layer” to enhance interoperability. In areas where uniform data standards have not yet been established, it is possible to create a “data transformation layer” that gives developers the information they need to find and connect data within and across federal and private-sector data producers and consumers. The CitySDK project, developed by the Census Bureau, is one example: a simple open source toolkit that makes Census data available to developers so they can build solutions for cities and communities. CitySDK acts as a data transformation layer, delivered through an Application Programming Interface (API), that helps standardize how Census data is mashed up and made interoperable with a host of other federal and private-sector data. This kind of open source solution could be applied in many areas where different kinds of users need to discover, access, and connect disparate datasets.
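One way to picture a data transformation layer is as a set of small adapters, each mapping one producer's native format onto a single shared schema, so that records from different sources can be joined on a common key. The following is a minimal sketch under stated assumptions, not CitySDK's actual API or implementation: the `CountyRecord` schema, the two source formats, their field names (`STATE`, `COUNTY`, `POP`, `county_code`, `income_usd`), and the sample values are all hypothetical. It assumes both sources identify counties by a 5-digit FIPS code (2-digit state plus 3-digit county) but encode it differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CountyRecord:
    """Hypothetical common schema exposed by the transformation layer."""
    fips: str     # 5-digit county FIPS code (2-digit state + 3-digit county)
    measure: str  # name of the variable, e.g. "population"
    value: float


def from_source_a(row: dict) -> CountyRecord:
    # Hypothetical source A: state and county codes stored as separate integers.
    fips = f"{int(row['STATE']):02d}{int(row['COUNTY']):03d}"
    return CountyRecord(fips, "population", float(row["POP"]))


def from_source_b(row: dict) -> CountyRecord:
    # Hypothetical source B: one combined code that may have lost its
    # leading zero (e.g. "6001" instead of "06001"); zfill restores it.
    fips = str(row["county_code"]).zfill(5)
    return CountyRecord(fips, "median_income", float(row["income_usd"]))


def connect(records: list[CountyRecord]) -> dict[str, dict[str, float]]:
    """Group normalized records by FIPS code so the sources line up."""
    merged: dict[str, dict[str, float]] = {}
    for rec in records:
        merged.setdefault(rec.fips, {})[rec.measure] = rec.value
    return merged


# Illustrative sample rows (made-up values, not real Census figures).
source_a = [{"STATE": 6, "COUNTY": 1, "POP": 1000}]
source_b = [{"county_code": "6001", "income_usd": 50000}]

normalized = [from_source_a(r) for r in source_a] + [from_source_b(r) for r in source_b]
merged = connect(normalized)
print(merged)  # {'06001': {'population': 1000.0, 'median_income': 50000.0}}
```

Keeping each adapter small and source-specific means that adding a new data producer requires only one more mapping function; developers who consume the common schema do not need to change their code.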