Opening up government data with Adobe, Google, Scribd

During yesterday's Kojo Nnamdi show on NPR, Johnson said he prefers open-source standards like HTML and XML so data can be published in its raw form "to give the public the opportunity to mash it up or interact with the data."

Government transparency has been a popular topic of conversation in the year since President Obama took office. It's a cause the White House says it backs, and the administration has rolled out efforts to display data about stimulus projects. Federal Chief Information Officer Vivek Kundra has made public government data available online so third-party developers can use it to create new applications for citizens.

Rob Pinkerton, Adobe's director of government solutions, was also on the radio show and said formats like PDF make it easy for non-programmers to publish data to the Web, which is the ultimate goal. Dave Watts, chief technology officer of Fig Leaf Software, which helps agencies with data-sharing projects, said on the show that agencies have very small budgets and limited expertise in data collection and distribution.


"I don't think it's the government's job to make a marketplace for programmers to write solutions with that data," Watts said. "It's their job to make sure data is available to consumers directly," and that could be in the form of PDFs.

To be sure, digitizing public data and putting information in readable forms on the Web are still relatively new notions for government agencies. They have decades' worth of information that is time-consuming and expensive to transfer to digital form. And making sure it is machine-readable is not the highest priority.

There are other options for publishing data online. Scribd, a relatively young San Francisco company, is available to any government agency. The tool turns PDF and Microsoft Word, Excel and PowerPoint files into online documents that can be embedded into other Web sites. Readers can cut, paste and leave comments.

The Federal Communications Commission, for example, uses Scribd to post documents online, said Tammy Nam, Scribd's vice president of content and marketing. Sen. Mark Warner (D-Va.) and the European Commission are also users.

Then there's Google, which launched a tool in April to make statistics from the Bureau of Labor Statistics and Census Bureau available to searchers. For example, if you searched for unemployment rates in Virginia, a graph would appear at the top of the results. Here's a story I wrote about this for the Washington Post.

Yesterday, Google expanded the amount of data available to include data from the World Bank.

Ola Rosling, a product manager on Google's search team, said he shares Johnson's frustrations with PDF files and laments the lack of raw data on the Web.

He said it's best if agencies do not publish data in PDF or Word files, which require proprietary software. Instead, a simple text file is easier to search, parse and visualize.
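To make Rosling's point concrete, here is a minimal sketch in Python of what a third party could do with a plain-text file of comma-separated values. The column names, the `rates_for_state` helper and the figures are all hypothetical placeholders, not real statistics or any agency's actual format.

```python
import csv
import io

# Hypothetical sample file contents: placeholder figures, not real statistics.
RAW_TEXT = """state,month,unemployment_rate
Virginia,2009-01,5.0
Virginia,2009-02,5.5
Maryland,2009-01,6.0
"""


def rates_for_state(raw_text, state):
    """Return (month, rate) pairs for one state from CSV-formatted text."""
    reader = csv.DictReader(io.StringIO(raw_text))
    return [
        (row["month"], float(row["unemployment_rate"]))
        for row in reader
        if row["state"] == state
    ]


# Search the data for one state, then draw a crude text chart:
# one '#' per half percentage point.
for month, rate in rates_for_state(RAW_TEXT, "Virginia"):
    print(f"{month} {rate:4.1f} {'#' * round(rate * 2)}")
```

Searching, parsing and charting take only a few lines here because the file is already structured text. Getting the same rows out of a PDF would typically require an extra extraction step first.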

"Any third-party would benefit" from the data, he said.