Studying the Test & Living the Predicted Life

There is a problem in our culture at this moment in mankind’s technological evolution. Data mining has permeated most avenues of human activity, particularly those of readers of this post. The instances we most commonly encounter are when we buy things online or watch something on Netflix. It also shapes what we are likely to read, be it on Zite/Pulse or in the venerated New York Times, and what we see in the aisles of organised retail.

Of course, it makes sense, and profit, for the companies that have mined us, at least in the short term. The basic algorithm, now about 20 years old, is mind-numbingly simple at its core: it merely discovers association rules that hold with a certain level of confidence. That is, given that you have bought X and Y, you are likely to buy Z. Extend this to movies watched or articles read. The algorithms in those cases can be tweaked into something fancier and tailored to one’s profile, but the recommendation problem is still Chapter 9 of this now classic textbook.
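To make the rule concrete, here is a minimal Python sketch of how the confidence of a rule such as {X, Y} -> Z is usually computed: the fraction of baskets containing X and Y that also contain Z. The baskets and the helper names `support` and `confidence` are hypothetical, chosen only for illustration; real systems also apply a minimum-support threshold and work on far larger data.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical toy transactions: each frozenset is the contents of one basket.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"X", "Y", "Z"}),
    frozenset({"X", "Y", "Z"}),
    frozenset({"X", "Y"}),
    frozenset({"X", "Z"}),
    frozenset({"Y", "Z"}),
]


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in itemset."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Estimate of P(consequent | antecedent) = support(A | C) / support(A)."""
    a = frozenset(antecedent)
    base = support(a, transactions)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule says nothing
    return support(a | frozenset(consequent), transactions) / base


rule_conf = confidence({"X", "Y"}, {"Z"}, TRANSACTIONS)
print(f"confidence(X, Y -> Z) = {rule_conf:.2f}")  # prints 0.67
```

Here three of the five baskets contain X and Y, and two of those also contain Z, so the rule holds with confidence 2/3.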

The problem, as you can see, is that this kind of recommendation relies on a quasi-closed set, namely the downward-closure (or Apriori) property: every subset of a frequent itemset must itself be frequent. In other words, Z must already have been bought, watched or read often enough for it to be recommended to you. In the larger scheme of things, it means that, as a culture, we are now settling for exploitation in the explore-versus-exploit trade-off that life presents us. Directors and producers who let this shape their work, as many invariably will, reduce the chance of art transcending the status quo. It may also explain clickbait headlines and our reaction to them.
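The downward-closure property is what lets the classic Apriori algorithm prune its search, and it is also why a rarely bought item can never surface. The following is a minimal, unoptimised Python sketch of the level-wise procedure, assuming support is measured as a raw count of baskets; the function name `frequent_itemsets`, the threshold of 2 and the basket contents are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def frequent_itemsets(transactions: List[FrozenSet[str]],
                      min_support: int) -> Dict[FrozenSet[str], int]:
    """Level-wise Apriori sketch: every itemset found in at least min_support baskets."""
    # Level 1: count single items.
    counts: Dict[FrozenSet[str], int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (downward closure): drop any candidate with an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in prev for sub in combinations(c, k - 1))}
        # Count the surviving candidates against the data.
        current = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_support:
                current[c] = n
        result.update(current)
        k += 1
    return result


# Hypothetical baskets: "niche_film" was chosen only once.
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "milk", "eggs"}),
    frozenset({"bread", "eggs"}),
    frozenset({"milk", "eggs", "niche_film"}),
]
freq = frequent_itemsets(baskets, min_support=2)
print(any("niche_film" in s for s in freq))  # False
```

Because `niche_film` appears in only one basket, it falls below the threshold at the first level, so no itemset containing it is ever counted and no rule built from these itemsets could recommend it.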

The other problem is that this incentivises ghettoisation. For instance, one could argue that some 20 years ago the average person was much more likely to consume news, commentary and general information from a wide variety of sources. Ever since the slide towards targeted content for specific groups began, with FOX News and Sun TV being good examples, the distinct constituencies have moved farther away from the original centre of public opinion. After all, once a bubble has been created, it only makes sense for that bubble to feed itself. From within, its mean appears to be the true mean. And purists within will want to push the bubble ever farther out, and therefore ever farther from the centre of public opinion. This process, as anyone with a Twitter account can vouch, has accelerated exponentially in the part of the media whose content is user-generated.

Settling for exploitation over exploration is efficient and makes sense for those selling. It makes no sense whatsoever for those buying. What we get is bad art, bad journalism, poor imagination in the design of the products we are sold, and a far more intolerant society. A statistical understanding of the world is useful only when the world itself does not react to the statistical description of it. Since we’ve crossed that Rubicon, it’s reasonable to ask whether data science, as applied to recommendation systems, serves any positive purpose.