Every day we turn to the Internet for the seemingly endless supply of information and entertainment it provides. We learn to build houses from online tutorials; we seek advice from other parents; we share our creations with customers throughout the world; sometimes we can even turn a road trip from Texas to DC into a nationwide conversation. We tweet, snap, post, listen, watch, and share on platforms all over the web.
But most people don’t realize that we can only do these things because of a law passed in 1998—the Digital Millennium Copyright Act (DMCA). The DMCA ensures that anyone can use Internet platforms to create content, post comments, and share ideas online as long as the platforms they use act responsibly. And the law has worked, facilitating the growth of a vibrant ecosystem of online portals that have spurred an unprecedented boom in creative production. Without the DMCA, online platforms could be sued out of existence if a single user posted copyrighted content among the millions of how-to videos and remixes.
Despite the creative and economic value the DMCA has helped create, the copyright industry is currently seeking to radically change the law to force online platforms to police for copyright infringements on their behalf by using content filtering technology.
There has been a lot of discussion about whether such a proposal is appropriate. But before lawmakers can properly determine whether requiring online portals to use filtering technology is good for Internet users, startups, and content creators, it’s important to take a closer look at how these technologies actually work and what they’re capable of.
So, Engine teamed up with Princeton University professor Nick Feamster to analyze the technical functioning of filtering tools and to look at how mandating their use would impact the Internet ecosystem. In our new study being released today, The Limits of Filtering, we find that content filtering is functionally limited in how effectively it can be used to minimize copyright infringement, and further, that the costs of implementing filtering tools could be high enough to prevent new startups from entering the market.
While there have been significant advances in content filtering technologies since the adoption of the DMCA, the tools currently available are subject to a number of limitations with respect to their accuracy and adaptability. Critically, filtering tools can’t be used to filter encrypted files, torrents, or many of the different media types available online today, like handmade goods sold on Etsy.
In our study, we break down the functioning of a variety of content identification tools, focusing in particular on Echoprint, an open source audio fingerprinting tool that is used by Spotify, among others. While Echoprint represents the state of the art in fingerprinting technology, it is subject to a 1-2 percent false positive rate when identifying audio content. As a point of comparison, email service providers generally consider any false positive rate higher than 0.1 percent unacceptable for a spam filter, as it would hinder free expression by misidentifying legitimate email messages as spam.
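To see what that gap in false positive rates means at scale, here is a back-of-envelope sketch. The rates (1-2 percent for Echoprint, 0.1 percent for the spam-filter benchmark) come from the study; the daily upload volume is a purely hypothetical illustration, not a statistic from any real platform.

```python
# Back-of-envelope comparison of false positive rates at scale.
# Rates from the text: Echoprint at 1-2%, vs. the ~0.1% threshold
# email providers consider acceptable for spam filters.
# The upload volume is hypothetical, chosen only for illustration.

def expected_false_positives(uploads: int, fp_rate: float) -> int:
    """Expected number of legitimate uploads wrongly flagged."""
    return round(uploads * fp_rate)

daily_uploads = 1_000_000  # hypothetical platform volume

echoprint_low = expected_false_positives(daily_uploads, 0.01)    # 1% rate
echoprint_high = expected_false_positives(daily_uploads, 0.02)   # 2% rate
spam_benchmark = expected_false_positives(daily_uploads, 0.001)  # 0.1% rate

print(echoprint_low, echoprint_high, spam_benchmark)
# prints: 10000 20000 1000
```

At that hypothetical volume, a 1-2 percent error rate wrongly flags 10,000 to 20,000 legitimate uploads every day, versus 1,000 at the rate email providers already consider the outer limit of acceptable.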
Echoprint, like other fingerprinting tools, faces further limitations, such as working on only one media type, in this case audio. And, like all filtering technologies, it only works if it has access to raw, unaltered files, so it can’t be used on encrypted content or on search engines that do not actually host content. More broadly, even when a filtering tool has access to unencrypted media of the type it is designed for, it can only identify content; it cannot tell whether a particular use of a file is infringing. In many cases, the principle of “fair use” allows the unlicensed use of otherwise copyrighted content, as in news reporting or for educational purposes. No filtering technology can make these thorny legal judgments.
Beyond the technical limitations, the sheer cost of a filtering obligation would distort the market. For most online service providers that do not host large volumes of infringing content, the cost of filtering technologies far exceeds their effectiveness in limiting infringement. Many larger online platforms have, at great expense, voluntarily built tools to automatically identify content. YouTube spent $60 million developing its well-known ContentID tool. SoundCloud spent more than €5 million building its own filtering technology and still must dedicate seven full-time employees to maintaining it. A recent survey of online portals found that medium-sized file hosting services paid between $10,000 and $25,000 a month in licensing fees alone for Audible Magic’s filtering tool.
This is only a tiny portion of the costs of filtering, as those companies would still have to invest in staff to apply policies and monitor compliance.
The cost of mandatory filtering would poke a giant hole in the business plan of startups, which have historically driven the growth of the Internet sector, making it harder for new startups to attract investors or compete with incumbents. In a survey of investors Engine helped conduct in the U.S. and EU, a majority of respondents said they would be “uncomfortable investing in businesses that would be required by law to run a technological filter on user-uploaded content.”
Investors would never fund a startup if its business plan required a significant upfront investment in filtering simply so the company could exist. And, because technology changes so rapidly, it would be difficult for a startup to know in advance whether any particular filtering technology would be legally satisfactory in a world with mandatory filtering obligations, casting a cloud of uncertainty over any new platform startup.
Without the balance enshrined in the DMCA, the Internet today would be a different place. Etsy, Medium, Tumblr, Twitter, Facebook, YouTube, Flickr, and Instagram would likely not exist. So before considering mandatory content filtering rules, policymakers should understand the inherent limitations of filtering technologies. Reversing two decades of sensible copyright policy to require platforms to deploy tools that are costly, easily circumvented, and limited in scope would deeply harm startups, users, and creators alike.
Evan Engstrom is the executive director of Engine, a policy, advocacy, and research organization that works to support tech startups.
The views expressed by contributors are their own and are not the views of The Hill.