Humans failed, not the data.
Hillary Clinton’s campaign for the presidency was famously, proudly, data-driven. For months, a trail of reporters chronicled the magic of the Clinton team’s “digital strategy” with dizzied wonderment. A data chief who scribbles on walls in erasable marker like Russell Crowe in A Beautiful Mind! Subtle but telling changes to landing page design! Something called “cost per flippable delegate!”
Now that Clinton has failed, the revenge against data has been swift: Since Tuesday’s election, we’ve been told that Trump’s surprise victory undercuts the belief that analyzing reams of data can accurately predict events; that it explodes the received wisdom about the value of data-driven campaigning; that data doesn’t matter.
But Tuesday was not a failure of data; it was a failure of forecasting and analysis — by humans. The data was as good as it could be, but the analysis of it lacked depth. If anything, the forecasters’ spectacular and almost unanimous collective failure to see Trump’s win coming provides an opening for a more productive conversation between numbers and words, statisticians and analysts, data and message.
The Great Data Debate
Much of the Great Data Debate has focused on two things: the polls “got it wrong;” and polling data, no matter its quality, was powerless to grasp the hidden electoral momentum generated by Trump’s populist appeal to the bruised pride of working-class whites.
Yes, many polls underestimated the strength of Trump’s support. Yes, Tuesday was another blow for a polling industry already winded by several recent big misses and facing numerous structural obstacles. But polls were never designed to be forecasts. They are simply one basket of data points among many others.
The real problem is that we haven’t done enough work to look beyond the polls and find new data sets that can improve political analysis — an especially urgent task in an age of volatile electoral moods.
The data is out there. We just need to get more creative in looking for it.
The firm I work for, Predata, is engaged in this very search for alternative ways of understanding politics. For the election, working off the theory that political campaigning increasingly takes place online and voters are increasingly inaccessible to polling firms, we developed signals to capture shifts in the digital conversation around the race. To produce these signals, we gathered and analyzed hundreds of thousands of data points every day.
Humans failed, not Big Data
Having had some success with our Brexit forecast earlier in the year, on this occasion Predata, like practically everyone else, got the call wrong and predicted that Clinton would win. There was nothing fundamentally wrong with the data; the data was good. It’s just that the humans (well, human: me) curating and analyzing the data underperformed.
Influenced by the percussion of polls and punditry heavily suggestive of a Clinton win, I allowed myself to ignore signs in the data that Trump was ahead in both the battleground states overall and Florida. That was a mistake. But it was a fundamentally human mistake. The data was blameless.
All data sets and data-driven forecasting models — even those that claim to run off artificial intelligence — are, to some extent, a reflection of their creator’s own biases. There is a subjectivity embedded in every curatorial choice that goes into the creation of a poll, or a set of signals to monitor debate online, or a prediction model. The interpretation of data, too, is necessarily subjective. But one mistake does not mean we should forfeit the game. Gather data, crunch data, interpret data: there is nothing fundamentally unsound or stupid about this basic exercise. It’s still worth doing. But we need to get better at understanding what the data can tell us — its potential and limitations — and how it fits into a broader analytical picture.
Need to bridge the geek divide
There’s still a cultural divide that separates the geeks (the data scientists and statisticians) from the poets (the reporters, the color writers) in coverage of political campaigns. Neither has a monopoly on the truth, as Tuesday showed. And each can offer useful information in our ongoing quest to make sense of messy reality.
To get better at forecasting big political events, we need both better data and sharper reporting, a clearer read on the numbers and a more penetrating portrait of on-the-ground realities — and a more active exploration of the intersection between the two. That means more words informed by data, and more data worked on by words: the marriage of techies and fuzzies to which good technology always tends.
In our exploration of this blossoming new age of data, we’re still no better than Monsieur Hulot in his new kitchen. The epistemological blunders of the last few weeks shouldn’t impel us to give up on data. They’re an invitation to keep blundering on, keep making mistakes, and hopefully — with flexible minds and a better sense of the limits of what is possible — make data great again.
This article was originally published on www.fortune.com.