Problems with Opinion/Exit Polls in India

After the recent failure of exit polls in Bihar, pollsters from Delhi have decided to form a self-governing body of sorts. This is unlikely to be of much use.

The problems with opinion and exit polling in India aren’t unique. But they are uniquely complicated by the fact that it’s the world’s second largest population, with a first-past-the-post (FPTP) system, in a multi-party electoral democracy, in a vastly unequal society. The basic ways in which a survey (and a poll is just a survey that asks questions about voting) can go wrong are finite: there can be sampling errors, response bias, or a combination of the two.

The simplest sampling scheme is one in which a pollster takes every nth voter from a voting list. Except, in real life we run into problems like the ones that caused an upheaval in America’s nascent polling industry in the 1940s, when George Gallup had to testify at a Congressional hearing about the response rates of his polling. That was the America of the 1940s, where only about 10% of the population was black and most of them were disenfranchised anyway owing to Jim Crow laws. Yet even for that relatively homogeneous white voting population, the response rate had fallen to single digits. That is, if every nth voter is called, fewer than 10% of those called tell the pollster their opinion or whom they voted for. This is for polling via telephone.
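To make the arithmetic concrete, here is a minimal sketch of that scheme in Python. The list size, sampling interval and response rate are made-up numbers, purely for illustration; the point is how small and self-selected the achieved sample becomes.

```python
import random

# Hypothetical illustration: every-nth-voter sampling from a voter list,
# followed by a low response rate. All numbers are made up.
random.seed(0)

voter_list = list(range(1_000_000))   # stand-in for a constituency's electoral roll
n = 500                               # take every 500th voter
contacted = voter_list[::n]           # sample of ~2,000 voters

response_rate = 0.08                  # assumed single-digit response rate
responded = [v for v in contacted if random.random() < response_rate]

print(f"contacted: {len(contacted)}, responded: {len(responded)}")
# The achieved sample is a self-selected sliver of the contacted sample,
# which is where response bias creeps in before anyone has even lied.
```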

In India, the problem is complicated manifold. Firstly, telephone surveys don’t work at all: if only people with phones were polled, the selection bias would be too large to account for. There are cultural and linguistic reasons as well: it’s very difficult for someone who speaks in a certain urban way to get a Dalit in a rural area to answer questions on political opinion, even if one assumes the said Dalit possesses a phone and speaks the same language. And even if the pollster actually goes to where the voter is, it gets extremely complicated given how disparate the voting population and their living conditions are.

Even in person, a lot of people are going to refuse to answer. That’s a fundamental problem for pollsters everywhere. This problem, when overlaid with the caste and religious fault lines of a largely feudal society, becomes impossible to account for. The only kind of Dalit, for instance, who’s likely to answer a survey is the kind of Dalit who’s politically active. This is true, to varying degrees, of most non-dominant castes and other social groups. And it’s over and above the problem of basic sampling, which in itself is difficult to get right given that Census data does not provide caste information; at least it didn’t until the last set.

And finally, even if they do answer, they can and often do lie. As has been established, socially disadvantaged groups in feudal societies tell interviewers what they think the interviewer wants to hear. So the problem thus far: we can’t get a random sample; we can’t adjust for selection bias because we can’t get the social break-up of the voting population right; we can’t account for response rates and the resultant sampling bias; and we can’t account for response biases given we don’t know the break-up in the first place.

The only way in which any of the above problems can be addressed is if the data is made public and there are more and more such polls, so that some kind of adjustment factor for each of the above problems, for each of the above groups, can be found. These are academic endeavors that for-profit companies with short-term goals don’t solve well.
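For a sense of what such an adjustment factor looks like in practice, here is a minimal post-stratification sketch in Python. The group names and every number are made up for illustration; the real difficulty, as argued above, is that the population shares used as weights are exactly what we don’t reliably know.

```python
# Hypothetical post-stratification sketch: reweight responses by assumed
# population shares of social groups. All names and numbers are made up.

# Assumed share of each group in the voting population (the hard part:
# without reliable caste-wise data, these denominators are guesses).
population_share = {"group_A": 0.20, "group_B": 0.50, "group_C": 0.30}

# Share of each group in the achieved (self-selected) sample.
sample_share = {"group_A": 0.35, "group_B": 0.45, "group_C": 0.20}

# Raw poll result within each group: fraction saying they will vote for party X.
support_in_sample = {"group_A": 0.60, "group_B": 0.40, "group_C": 0.30}

# The unweighted estimate simply mirrors the skewed sample.
unweighted = sum(sample_share[g] * support_in_sample[g] for g in sample_share)

# The post-stratified estimate reweights each group to its assumed population share.
weighted = sum(population_share[g] * support_in_sample[g] for g in population_share)

print(f"unweighted: {unweighted:.3f}, post-stratified: {weighted:.3f}")
```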

The other possibility, less likely but not implausible, is that one of the polling organizations makes data up or fudges it. Many of us, when faced with such a difficult task, might resort to that. Added reasons, such as political bias and the ability of polls to influence people, make it even more tempting. So such a regulatory body may be useful on that count.

Fudging can happen in many ways. The simplest form of it is the kind that was found in the Research 2000/Daily Kos scandal. It was simple manual fudging, exposed because of the inability of humans to be random even when they want to. The data was found to have even-numbered percentage responses for both genders. The probability of that in one poll, which can happen, is 0.5*0.5 = 0.25. But they had even-numbered response percentages for both genders for 24 successive polls. That works out to 0.25^24, roughly 3.5 x 10^-15, or about one chance in 280 trillion. Most reasonable people will conclude that’s fudged. But that’s just one way to fudge it. No one is going to do this again given people will now look for it.
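The arithmetic is easy to check; here is the one-off computation in Python, using the simplified per-poll probability of 0.25 assumed above:

```python
# Back-of-the-envelope check of the Research 2000-style parity argument.
# If each gender's percentage is even with probability 0.5 by chance,
# then both being even in one poll has probability 0.25.
p_one_poll = 0.5 * 0.5

# Probability of that happening in 24 successive polls purely by chance.
p_24_polls = p_one_poll ** 24
print(f"{p_24_polls:.2e}")  # ~3.55e-15, i.e. about one chance in 280 trillion
```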

The other method of fudging data is to take a base data set and add noise to it to make it seem like a different data set. A famous example of this kind of data falsification was the gay marriage study that everyone was talking about in 2014. It’s relatively easy, again, to detect this kind of falsification. The Kolmogorov-Smirnov test, which checks whether two samples plausibly come from the same distribution, does precisely this job. It’s something physicists use for far more important things, because of which it’s a well-developed test. Even for those of us who did not take courses in statistical physics, the test is easy to administer these days given that it takes one line of code. It’s tempting to think that those who have political biases will indulge in precisely this kind of falsification: taking a previous election you won as the base and passing it off as a new opinion or exit poll is relatively easy and serves the purpose really well.
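Here is a minimal sketch of that one-line check, using simulated data and SciPy’s two-sample KS test (all the datasets and numbers below are invented for illustration). When the fraud is base-plus-noise, the tell is that the “new” data tracks the base far more closely than a genuinely independent sample would.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

base = rng.normal(loc=50, scale=10, size=5000)       # the old, public dataset
fabricated = base + rng.normal(0, 0.5, size=5000)    # base plus a little noise
genuine = rng.normal(loc=50, scale=10, size=5000)    # an independent fresh sample

# The promised one line: ks_2samp returns the KS statistic and a p-value.
print(ks_2samp(base, fabricated))  # statistic implausibly tiny: red flag
print(ks_2samp(base, genuine))     # statistic of the size independent data produces
```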

A self-regulating body, one fears, will hardly point fingers at itself. This is why opening up the data is important: the scientific method will address these aspects much better. The problem pollsters have, of course, is twofold: their secret sauce gets revealed, and the secret sauce in this case often tends to be a series of assumptions which may be wrong.

Finally, a far better summing up of the American situation can be found here.