The way data is initially collected can determine the quality and value of data throughout the lifecycle. Whether data is collected by satellites, sensors, or surveys, or through other means, quality control at the very beginning can prevent problems down the line. Suggested best practices:
Focus on quality at the time of data collection, which is more efficient and effective than quality improvement at later stages. Organizations that want to use open government data face a number of obstacles in the quality of the data. Government agencies and their data users now see the need to address timeliness, accuracy, precision, interoperability, and other factors in open data. That effort will be directed most efficiently if it focuses on data at the time of collection. Quality safeguards can include standardizing data collection to avoid inconsistencies in data fields, building systems to cross-check and validate data against existing datasets, and adopting formal quality control processes at the time of data collection.
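As a rough illustration of what such safeguards might look like in practice, the Python sketch below checks a single submitted record at the moment of collection. It is only a sketch under stated assumptions: the field names (`facility_id`, `reading`, `collected_on`), the reference set `KNOWN_FACILITY_IDS`, and the 0 to 500 plausibility range are all hypothetical and do not come from any particular agency's system.

```python
from datetime import date

# Hypothetical reference dataset: identifiers already on file with the agency.
KNOWN_FACILITY_IDS = {"F-001", "F-002", "F-003"}

# Hypothetical standardized field list that every submission must follow.
REQUIRED_FIELDS = ("facility_id", "reading", "collected_on")


def validate_record(record, known_ids=KNOWN_FACILITY_IDS):
    """Return a list of problems found in one record; an empty list means it passes."""
    problems = []

    # Standardization: every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing field: {field}")
    if problems:
        return problems

    # Cross-check: the identifier must match an existing dataset.
    if record["facility_id"] not in known_ids:
        problems.append(f"unknown facility_id: {record['facility_id']}")

    # Quality control: the reading must be numeric and within an assumed range.
    try:
        reading = float(record["reading"])
    except (TypeError, ValueError):
        problems.append("reading is not a number")
    else:
        if not 0.0 <= reading <= 500.0:
            problems.append("reading outside expected range 0-500")

    # Quality control: the date must be in ISO format and not in the future.
    try:
        collected = date.fromisoformat(record["collected_on"])
    except (TypeError, ValueError):
        problems.append("collected_on is not an ISO date (YYYY-MM-DD)")
    else:
        if collected > date.today():
            problems.append("collected_on is in the future")

    return problems
```

A collection system built along these lines could reject or flag a record as soon as `validate_record` returns a non-empty list, so that the submitter corrects it before the data enters the dataset.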
Eliminate manual data entry as much as possible. Human error is a major cause of poor data quality. The move towards e-filing of government forms can help ensure better data. Although agencies may not be able to require e-filing, they are generally able to offer incentives, such as ease and speed of filing, that encourage the individuals and organizations that send them information to use it.
Use consumers and volunteers as data sources. Individual citizens can help agencies collect data in different ways. Crowdsourcing programs for citizen science, which invite volunteers to collect data on the natural world or other phenomena, have been endorsed and encouraged by the White House Office of Science and Technology Policy. In a very different way, individuals can contribute to valuable datasets when they notify federal agencies about unfair business practices, unsafe products, or other consumer concerns.