Government must lead the way on structured data

The Senate Homeland Security and Governmental Affairs Committee will vote Wednesday on a bill to open up the government's spending data.

The bipartisan Digital Accountability and Transparency Act, or DATA Act, directs the Treasury Department to transform the government's tangled web of financial, grant, and procurement reports from disconnected documents into structured data, then publish the whole corpus online.

Structured data will reveal federal spending in checkbook detail, searchable by recipient, program, and appropriation.

The DATA Act stands to deliver transparency to citizens, empower inspectors general to enlist Big Data analytics in their fight against waste and fraud, and slash compliance costs for grantees and contractors by automating the reports they must submit.

But Congress needs to avoid a mistake that the Securities and Exchange Commission made when it attempted a similar transformation for corporate financial disclosures nearly five years ago.

In 2009, the SEC began requiring public companies to submit copies of their financial statements in a structured data format, alongside the existing paper and PDF versions.

This machine-readable data was expected to make corporate disclosures more transparent for investors and allow the SEC to fight accounting fraud with analytics. Automatic compliance tools would ease the reporting burden for companies in much the same way that individual taxpayers now use software to file their returns.

Yet the SEC's good intentions remain unrealized.

A review by the Columbia Business School found that most investors don't use the structured-data financial statements and still rely on the document-based versions. Many companies are drafting the plain-text financial statements first, then paying vendors to translate them into structured data. For these companies, compliance has become more expensive, not less.

So what was the SEC's mistake?

After announcing its transformation from documents into structured data, the SEC didn't actually start using that data internally.

Agency staff are still using pencils and calculators to check the math of financial statements manually, rather than using structured data to check it automatically. The agency does intend to analyze the data for indicators of accounting fraud, but after nearly five years is still building the necessary data platform.
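
To make concrete what checking the math automatically could look like, here is a minimal sketch in Python. It does not represent the SEC's systems or any real filing format: the field names, the check_total helper, and the figures are all hypothetical, invented only for illustration.

```python
from decimal import Decimal


def check_total(line_items, reported_total):
    """Return the reported total minus the sum of the line items.

    A result of zero means the statement's arithmetic is consistent.
    """
    computed = sum((Decimal(v) for v in line_items.values()), Decimal("0"))
    return Decimal(reported_total) - computed


# Hypothetical filing: field names and amounts are made up for this example.
filing = {
    "line_items": {
        "cash": "120.50",
        "receivables": "79.50",
        "inventory": "300.00",
    },
    "reported_total_current_assets": "510.00",
}

discrepancy = check_total(
    filing["line_items"], filing["reported_total_current_assets"]
)
if discrepancy != 0:
    print(f"Reported total is off by {discrepancy}")
```

Once the figures arrive as data rather than as a document, a check like this can run across every filing at once instead of one statement at a time.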

Since the SEC hasn't started using the structured data internally, there has been no incentive to make sure it's accurate. Errors in the structured-data financial statements filed by companies, and discrepancies between the data versions and the plain-text versions, are rampant.

Investors aren't finding the data useful, the SEC's internal workflow isn't any better, and compliance hasn't gotten cheaper.

When I drafted the original DATA Act as a House Oversight Committee staffer in 2011, I hoped to avoid the SEC's mistake. Fortunately, the 2009 federal stimulus showed how the government should use structured spending data internally.

The Recovery Accountability and Transparency Board's data analytics platform delivers structured stimulus spending data to inspectors general, who use it to hunt for waste and fraud. Over four and a half years, the Recovery Board's platform has saved taxpayers $100 million in funds recovered or withheld from questionable grantees and contractors. Because inspectors general are scrutinizing stimulus data, there's an internal incentive to correct errors.

To avoid the SEC's mistake, the DATA Act expands this existing data analytics platform to cover the entire government. If inspectors general are actively using federal spending data, they'll know when it's not accurate. They'll push agencies to get it right.

Some senators are considering removing these provisions from the DATA Act, for a short-sighted reason. They're citing the Recovery Board's $20 million annual cost while ignoring its considerable returns in fraud illuminated and eliminated.

Open data is no good unless it's accurate. The SEC's experience shows that the only way to generate internal pressure for accurate spending data is for the federal government to actively use that data.

The DATA Act has champions in both chambers and both parties. Lead authors Rep. Darrell Issa (R-Calif.) and Sen. Mark Warner (D-Va.) are joined by Rep. Elijah Cummings (D-Md.), Sen. Rob Portman (R-Ohio), and many others.

Our tech industry coalition represents the companies whose software will use the DATA Act's standardized data to deliver transparency, fight waste and fraud, and automate compliance.

We hope the Senate will learn from the SEC's mistake and preserve the Recovery Board's role in the DATA Act.

Hollister is executive director of the Data Transparency Coalition.