Monthly Archives: March 2015

Silver Bullet: Fighting Crime against Women

The rate of crimes against women varies widely across the states of India. There may be multiple reasons for this, and the picture is further complicated by the fact that poor law enforcement in poorly governed states may result in under-reporting of crime. But if we assume the NCRB data to be true, the distribution across the various states of India is,


The states’ ranking here looks odd when viewed through the prism of conventional wisdom about advanced and backward states.

A natural place to look next is police presence. Some Indian states have truly bad police-to-population ratios; perhaps that is a contributing factor. Another candidate is the number of police stations: perhaps how widely police stations are distributed matters more than the actual number of police personnel. An analysis of their scatter plots suggests that the relationship of these factors with crime against women isn’t as strong as one would expect. Their respective scatter plots (with a regression line in red) are,

[Scatter plot: Police Station]







Overall police presence as a ratio of population does seem to have some negative effect on crimes against women, more so than the spread of police stations. But the effect is not as strong as one would expect.
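As a rough illustration of what the regression line in such a scatter plot measures, here is a minimal sketch of a least-squares fit with its correlation coefficient. The function name `fit_line` and the numbers are made up for illustration; they are not the NCRB or BPRD figures, and this is not necessarily how the plots above were produced.

```python
from math import sqrt

def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares line y = slope*x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # Pearson correlation: strength of the linear relation
    return slope, intercept, r

# Hypothetical values for five imaginary states (NOT real data).
police_per_lakh = [60.0, 90.0, 120.0, 150.0, 180.0]
crime_rate = [55.0, 50.0, 52.0, 44.0, 41.0]

slope, intercept, r = fit_line(police_per_lakh, crime_rate)
print(f"slope={slope:.3f} intercept={intercept:.1f} r={r:.2f}")
```

A negative slope means the crime rate falls as the factor rises, and the closer r is to -1 or 1, the tighter the points sit around the line. A weak relationship shows up as an r close to zero.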

Now consider two other factors: the number of women per police station in each of these states and the female workforce participation in each state. Their respective scatter plots with a regression line are,







These two factors, namely the presence of women in police stations and women’s participation rate in the workforce, seem to have a significant negative impact on crime against women. The first possibly reflects a female perspective in policing improving security. The second suggests that having more women working outside their homes actually reduces the likelihood of crimes against women.

A simple way to compare all of these factors is to fit a regression model that includes all of them and to measure each factor’s relative weight in the model. When one does that, the relative weights show up as,



Thus, the dominant factor, as we see, is the presence of women in police stations. States that have more women in each station do better at preventing crime against women even if they have a worse overall police-to-population ratio. Consider two relatively advanced states as an example: Kerala and Maharashtra. While Maharashtra has fewer police personnel per lakh of population than Kerala, a far larger proportion of them are women. Maharashtra also records fewer crimes against women.
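One common way to get comparable relative weights from a single regression is to standardize every variable first, so that each coefficient is expressed in standard deviations. The sketch below does that in plain Python. It is only an assumption about how such weights might be computed, not the method actually used for the chart above, and the function names and all numbers are hypothetical.

```python
from math import sqrt

def zscores(values):
    """Rescale a column to mean 0 and standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("column has no variation")
    return [(v - mean) / sd for v in values]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("factors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            f = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def standardized_weights(factors, outcome):
    """Map each factor name to its standardized regression coefficient."""
    names = list(factors)
    z = [zscores(factors[name]) for name in names]
    zy = zscores(outcome)
    n = len(zy)
    # With z-scored columns, the normal equations reduce to (correlation matrix) * beta = r_xy.
    corr = [[sum(p * q for p, q in zip(zi, zj)) / n for zj in z] for zi in z]
    r_xy = [sum(p * q for p, q in zip(zi, zy)) / n for zi in z]
    return dict(zip(names, solve(corr, r_xy)))

# Hypothetical values for eight imaginary states (NOT NCRB or BPRD data).
factors = {
    "police_per_lakh": [60, 90, 120, 150, 180, 110, 75, 140],
    "stations_per_lakh": [1.1, 1.6, 1.2, 2.0, 1.4, 1.9, 1.3, 1.7],
    "women_per_station": [2, 5, 9, 12, 15, 4, 7, 10],
    "female_workforce_pct": [12, 25, 18, 30, 35, 22, 15, 28],
}
crime_rate = [58, 50, 45, 38, 30, 52, 49, 41]

weights = standardized_weights(factors, crime_rate)
for name, w in sorted(weights.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:22s} {w:+.3f}")
```

The sign of each weight gives the direction of the relationship and its absolute size gives the relative importance within this model. When the factors are strongly correlated with one another, such weights become unstable and should be read with caution.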

Of course, none of this can be considered causal, and under-reporting in Bihar and Uttar Pradesh is likely to undermine the above analysis. Still, a simple but effective strategy for states fighting crime against women appears to be recruiting more women into their police forces, and having more financially empowered women in the population.

[1] – Data on crimes against women was from NCRB

[2] – Data on Police Organizations was from BPRD

[3] – I have collated the data in a spreadsheet. Should you want it, please find it safety

Life in the time of data

With very few exceptions, Data Science and Machine Learning as they exist today are about optimizing processes. We mostly predict likelihoods, target resources better or categorize things, sometimes using fancy algorithms and at other times using basic generalized linear models. But in general we expect the cartoon of our past to predict our future.

This data-driven culture is subjected to plenty of criticism. Much of it comes from uninformed people in the social sciences, and their objections are often trivial. A far more important and troubling objection is that it reduces civilization to optimizing the status quo rather than advancing. The most famous counterexample is Kepler and his laws of planetary motion. That was a uniquely human insight that no algorithm can arrive at, even today. And it was essentially an insight from data. Almost no one will argue they can design a system, with however many clever backpropagation networks, to arrive at what Kepler did from the data he had. No one would even claim we will reach a stage where algorithms can do that any time soon.

The point of that example is: do we no longer contemplate data sets as deeply as Kepler might have, simply because our belief in algorithmic processing of data is so strong that we see no merit in holding the entire data set in our heads? We possibly see that as the data equivalent of rote learning: a waste of critical brain resources that could be freed for higher things.

A related aspect of our modern lives is the absence of intellectual silence. We’re often surrounded by things and people far more impressive than ourselves. It almost seems rude to think for oneself when the finest mind in the world on that exact topic is ready to speak to you, albeit in a lecture or podcast, from the device you hold in your pocket. It’s unlikely that one is going to out-think what’s available in one’s pocket. But what seems the honest thing to do in that instance certainly makes our overall epistemology suffer from a lack of originality. It’s easy to imagine a world where originality of thought is rare, and we’d all be poorer for it.

A useful technique, then, is possibly to alternate between these two states by consciously allocating specific times to each. But any such a priori allocation of time for categorizing thought sounds awfully arrogant in itself. We could possibly model ourselves on, say, Kepler, and apportion our consumption of thought and our own thinking accordingly, which defeats the very idea. We certainly live in difficult times.