By Kim Hart - 11/12/09 02:00 PM EST
During yesterday's Kojo Nnamdi show on NPR, Johnson said he prefers open standards like HTML and XML so data can be published in its raw form "to give the public the opportunity to mash it up or interact with the data."
Government transparency has been a popular topic of conversation in the year since President Obama took office. It's a cause the White House says it backs, and the administration has rolled out projects like Recovery.gov to display data about stimulus projects. Federal Chief Information Officer Vivek Kundra has made public government data available on Data.gov so third-party developers can use it to create new applications for citizens.
Rob Pinkerton, Adobe's director of government solutions, was also on the radio show and said formats like PDF make it easy for non-programmers to publish data to the Web, which is the ultimate goal. Dave Watts, chief technology officer of Fig Leaf Software, which helps agencies with data-sharing projects, said on the show that agencies have very small budgets and limited expertise in data collection and distribution.
To be sure, digitizing public data and putting information in readable forms on the Web are still relatively new notions for government agencies. They have decades' worth of information that is time-consuming and expensive to transfer to digital form. And making sure it is machine-readable is not the highest priority.
There are other options for publishing data online. Scribd, a relatively young San Francisco company, is available on Apps.gov, which means any government agency can use it. The tool turns PDF and Microsoft Word, Excel and PowerPoint files into online documents that can be embedded into other Web sites. Readers can cut, paste and leave comments.
The Federal Communications Commission, for example, uses Scribd to post documents online, said Tammy Nam, Scribd's vice president of content and marketing. Sen. Mark Warner (D-Va.) and the European Commission are also users.
Then there's Google, which launched a tool in April to make statistics from the Bureau of Labor Statistics and Census Bureau available to searchers. For example, if you searched for unemployment rates in Virginia, a graph would appear at the top of the results. Here's a story I wrote about this for the Washington Post.
Yesterday, Google expanded the amount of data available to include data from the World Bank.
Ola Rosling, a product manager on Google's search team, said he shares Johnson's frustrations with PDF files and laments the lack of raw data on the Web.
He said it's best if agencies do not publish data in PDF or Word files, which require proprietary software. Instead, a simple text file is easier to search, parse and visualize.
"Any third-party would benefit" from the data, he said.
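To make Rosling's point about plain text concrete, here is a minimal sketch in Python of how little code a third party would need to search, parse and chart data published as a simple comma-separated text file. The column names, the function names and the figures themselves are hypothetical placeholders for illustration, not real statistics or any agency's actual format.

```python
import csv
import io

# Hypothetical sample data: the column names and figures below are
# placeholders for illustration only, not real statistics.
RAW_TEXT = """state,month,unemployment_rate
Virginia,2009-08,6.0
Virginia,2009-09,6.5
Maryland,2009-08,7.0
Maryland,2009-09,7.5
"""


def rates_for_state(raw_text, state):
    """Search the text for one state and return (month, rate) pairs."""
    reader = csv.DictReader(io.StringIO(raw_text))
    return [
        (row["month"], float(row["unemployment_rate"]))
        for row in reader
        if row["state"] == state
    ]


def text_bar_chart(pairs, width_per_point=4):
    """Draw a crude bar chart, one line per month, using '#' characters."""
    lines = []
    for month, rate in pairs:
        bar = "#" * int(round(rate * width_per_point))
        lines.append(f"{month} {bar} {rate}")
    return "\n".join(lines)


virginia = rates_for_state(RAW_TEXT, "Virginia")
print(text_bar_chart(virginia))
```

The same figures locked inside a PDF or Word file would first have to be extracted with extra tools before any of these steps could run.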