Big Data signaled winner days before Election Day


At every moment of our lives, we are contributing to a never-ending trail of data, our digital footprint. Large amounts of data are generated at high velocity from a vast variety of sources and, when analyzed in aggregate, can reveal things that matter. This is called Big Data.

Real-time tweets, Google searches and Facebook posts are all part of the Big Data phenomenon. Twitter users upload nearly 500 million tweets per day that reflect our deepest emotions and thoughts. Google processes roughly 3.5 billion searches daily. In fact, “let’s google it” has become a universal refrain, creating a window into our thoughts and translating private lives into quantitative insights. Big Data is the engine propelling artificial intelligence (AI) to become an extension of human intelligence. AI’s ability to predict election outcomes is promising, and in the near future it will be able to yield predictions with greater accuracy in the realm of politics.

I lead the predictive analytics and AI research lab at New York University’s Courant Institute of Mathematical Sciences, where we apply data science to design AI tools. Months before Election Day, we mined the wisdom of online crowds by connecting the dots across multiple data sources to detect early signals and forward-looking insights into the election. Since early 2019, our team has collected nearly 94 million tweets related to the 2020 presidential race. About four million of them were collected specifically around the two presidential debates, the vice-presidential debate and the town halls. To examine Twitter users’ attitudes toward the nominees, we applied opinion-mining algorithms, a commonly used natural language processing technique that deploys artificial intelligence to interpret and categorize emotions. Additionally, we analyzed a sample of Google searches drawn from Google Trends between March 2020 and October 2020, which yielded a dataset representative of all Google searches during that period.
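To make the idea of opinion mining concrete, here is a minimal sketch of how tweets might be sorted into positive, negative and neutral moods. It is only an illustration: the word lists and the function names `classify` and `mood_shares` are hypothetical, and a simple word-count rule stands in for the far more capable models used in practice. It is not the lab's actual method.

```python
import re
from collections import Counter

# Hypothetical toy lexicons, for illustration only (not the lab's model).
POSITIVE = {"great", "love", "win", "strong", "support"}
NEGATIVE = {"bad", "hate", "lose", "weak", "lie"}


def classify(tweet: str) -> str:
    """Label a tweet positive, negative or neutral by counting lexicon words."""
    words = re.findall(r"[a-z']+", tweet.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def mood_shares(tweets: list[str]) -> dict[str, float]:
    """Return the percentage of tweets falling into each mood category."""
    labels = ("positive", "negative", "neutral")
    if not tweets:
        return {label: 0.0 for label in labels}
    counts = Counter(classify(t) for t in tweets)
    return {label: 100.0 * counts[label] / len(tweets) for label in labels}
```

Running `mood_shares` on the tweets collected before, during and after an event would give three sets of percentages that can be compared side by side.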

In October, during the presidential town halls, we discovered a major turning point in the Twitter race between Biden and Trump. Our data indicated that Biden led his opponent in Twitter popularity for the first time since announcing his 2020 presidential run: Biden was mentioned 6 percent more than President Trump. This stands in stark contrast to our analysis of the first presidential debate, when Trump dominated this metric with 43 percent more mentions than Biden. Biden’s social media popularity continued on an upward trajectory, notably when he started reading random tweets from supporters and responded to them in a friendly video.
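For readers curious about the arithmetic behind a statement such as "mentioned 6 percent more," the sketch below shows one plausible reading: the gap between two mention counts, divided by the trailing candidate's count. The function name `relative_lead` and the counts in the usage note are hypothetical, and the formula is an assumption about how such a figure could be derived, not a description of our pipeline.

```python
def relative_lead(leader_mentions: int, trailer_mentions: int) -> float:
    """Percent by which the leader's mention count exceeds the trailer's.

    Assumes "X percent more mentions" means (leader - trailer) / trailer.
    """
    if trailer_mentions <= 0:
        raise ValueError("trailer_mentions must be positive")
    return 100.0 * (leader_mentions - trailer_mentions) / trailer_mentions
```

With made-up counts of 106 and 100 mentions, `relative_lead(106, 100)` gives a lead of 6 percent; with 143 and 100, it gives 43 percent.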

During the town halls, both candidates received almost the same number of tweets expressing negative sentiment (approximately 30 percent of the study’s tweets), approximately 4 percent less than during the first debate. After the town halls, however, the number of tweets expressing positive sentiment rose by approximately 54 percent for Trump and by approximately 37 percent for Biden, both compared with the first debate.

Around the first debate, the Twitter mood for both Trump and Biden remained nearly constant before, during and after the event: approximately 5 percent positive, 31 percent negative and 64 percent neutral. This indicates that the debate did not have much impact on public opinion among Twitter users.

By contrast, around the second debate, the number of positive tweets for Trump increased by 22 percent over the period from before the debate to two hours after it. This might have been due to changes in his performance relative to the first debate. Biden’s percentage of positive tweets remained the same across the two hours leading up to the debate, the debate itself and the two hours after it.

In the race for Google searches around the second debate, “Joe Biden” was searched twice as much as “Donald Trump” in each of the 50 states, a result consistent with the first presidential debate and the town hall events. These searches centered on Biden’s stance on fracking, the president’s criminal justice policies and the development of a vaccine for COVID-19. We also noticed that Biden received more searches from overseas users, perhaps because he was less familiar abroad, whereas in 2016, Trump received more overseas searches than Hillary Clinton. To sum up, Biden was the subject of a greater number of Google searches during both presidential debates and the town halls, which indicates growing public interest in him.

On November 2, we reported that Google searches related to yard signs showing support for Joe Biden outpaced those for Donald Trump yard signs by 28 percent between March 2020 and October 2020. This stands in contrast to 2016, when Trump’s yard sign search interest topped that of Hillary Clinton, an advantage that stretched across the country. We noticed a trend: Biden led Trump in Michigan (54 percent to 46 percent), Minnesota (60 percent to 40 percent), Nevada (68 percent to 32 percent) and Wisconsin (57 percent to 43 percent). In Pennsylvania, Biden led Trump (56 percent to 44 percent), with the president leading in the Harrisburg-Lancaster-Lebanon-York area by 4 percent and with Biden holding the advantage in Pittsburgh and Philadelphia by 23 percent.
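State-level splits like these can be read as two-way shares of search interest. The sketch below shows one way such shares and the resulting margin could be computed from two raw interest values. The names `two_way_share` and `margin_points` are hypothetical, as are the inputs in the usage note, and the approach is an assumption for illustration rather than the exact procedure behind the figures above.

```python
def two_way_share(interest_a: float, interest_b: float) -> tuple[float, float]:
    """Split search interest between two candidates into percentage shares."""
    total = interest_a + interest_b
    if total <= 0:
        raise ValueError("combined search interest must be positive")
    return 100.0 * interest_a / total, 100.0 * interest_b / total


def margin_points(interest_a: float, interest_b: float) -> float:
    """Lead of candidate A over candidate B, in percentage points."""
    share_a, share_b = two_way_share(interest_a, interest_b)
    return share_a - share_b
```

For example, hypothetical interest values of 27 and 23 produce a 54 percent to 46 percent split, which `margin_points` reports as an 8-point lead.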

The outcomes we reported a day before the election were relatively consistent with the most recent vote count. Analyzing these Google searches is a novel approach to looking at presidential candidates’ support in ways that go beyond traditional measurements, such as polls or campaign contributions.

In the information age, Big Data and AI are here to stay. Tweets, Google searches and Facebook posts should not be ignored and may prove crucial in predicting a presidential election, despite the noise and Twitter bots, which can be algorithmically filtered out to a certain extent. Once again, the Twitterverse and Googlers provided a bright light for foreseeing the future, this time in politics.

Anasse Bari, Ph.D., is a professor of computer science at New York University. He leads the predictive analytics and AI research lab and is the co-author of the book “Predictive Analytics for Dummies.” Follow him on Twitter @BariAnasse.

The research team led by Prof. Bari included computer scientists: Matthias Heymann, Aashish Khubchandani, Alankrith Krishnan, Julia Damaris Yang, Shailesh Apas Vasandani, Vikas Nair, Daniel Rivera Ruiz and Laurence Bugasu Ininda. The AI research lab is funded, in part, by Amazon AI and Amazon Web Services.

