Editor’s note: Did you know that your activity on the internet is constantly being monitored to construct online personas of you? Did you know that your digital persona is being used by a host of online agents to influence your online activity, showing you stories that you like, or that camera you have wanted to buy, or that vacation you would like to take? This is the first of a two-part series that examines how the innocuous use of personas can be subverted for more insidious purposes.
Following the US Senate hearings, Facebook CEO Mark Zuckerberg declared that he would “ensure fair elections in India”, and instantly had the entire Indian population wondering whether our Election Commission’s job had finally been outsourced to Facebook! Was this an empty assurance, or does it have some basis in reality?
Way back in more innocent times (or so it seemed in 2006), I took part in the Netflix Prize Challenge and became deeply aware of the massive potential to learn the persona of just about anybody from seemingly random data, using powerful collaborative filtering algorithms and the power of Big Data.
In 2006, when movies were distributed on DVDs rather than streamed, Netflix’s business interest was to improve the accuracy of its movie recommendation algorithm, Cinematch, and it brilliantly turned to crowd-sourced research. It offered a million dollars to the team that could come up with the best recommendation algorithm beating Cinematch by at least 10 percent, as measured by root-mean-square error (RMSE), in predicting how a set of users would rate a set of movies whose ratings had been withheld from the contestants.
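The contest’s yardstick was RMSE on held-out ratings, so “10 percent better” meant an RMSE 10 percent lower than Cinematch’s. A minimal sketch of that arithmetic follows; the ratings are made-up toy numbers, not contest data, and the function names are illustrative.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual ratings."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def improvement_percent(baseline_rmse, new_rmse):
    """Relative RMSE reduction versus a baseline, in percent."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse

# Hypothetical toy data on a 1-5 star scale (not Netflix Prize data).
actual     = [4, 3, 5, 2, 1]
baseline   = [3, 3, 4, 3, 2]  # stand-in for the incumbent recommender
challenger = [4, 3, 4, 2, 2]  # stand-in for a contestant's algorithm

b = rmse(baseline, actual)
c = rmse(challenger, actual)
gain = improvement_percent(b, c)
print(f"baseline RMSE {b:.3f}, challenger RMSE {c:.3f}, improvement {gain:.1f}%")
print("clears a 10% bar:", gain >= 10.0)
```

Because the improvement is relative, the same 10 percent bar gets harder to clear in absolute terms as the baseline itself gets better.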
That singular event sparked enormous research interest in peer recommendation algorithms and drove a decade of new development in collaborative filtering and peer recommendation systems, with startling consequences today.
With the convergence of technologies such as deep-learning face recognition and quantitative personas of individuals, it was clear even a few years ago that only privacy laws prevented retailers from video-tagging you as you entered a store and luring you into every department and towards every goodie they know you want (and, in many cases, goodies you don’t even know you want!).
The Big Data behemoths who have for years been trawling your digital trail know all there is to know about you. With Google, Facebook, Yahoo, internet service providers, phone-service providers and many social media companies all collecting data, there is no place left to hide. Activate the Lightbeam add-on for the Firefox browser, for example, and see who is gathering the debris of your digital trail whenever you visit a website.

Visitors are seen outside the venue of the China International Big Data Industry Expo. Image: Reuters.
If you think you are being smart and hiding your digital trail by activating privacy guards, blocking advertisements and unwanted cookies, and browsing anonymously, there is still no place to hide, because data generation and aggregation are so pervasive.
If you use a smartphone, or have done a Google search, or have any social media account, or a free email account, or use a credit card, or borrow books from a library, or shop online, or subscribe to magazines, or just walk around with a switched-off phone in your pocket, that is enough to build a digital dossier on you. You can bet it exists in digital warehouses from Palo Alto to Shanghai. We are prolific data generators, in most instances without our knowledge, and that data is valuable to companies around the world.
Even if you think you are safe because you are NOT a prolific data generator and avoid social media and the like, be utterly terrified: large-scale collaborative filtering can, with utmost ease, “fill in the blanks” about you, given billions of bits of data from the millions of others with whom you share the world!
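To make “filling in the blanks” concrete, here is a minimal sketch of classic user-based collaborative filtering: a missing rating is estimated from the ratings of other users whose tastes correlate (by Pearson correlation) with yours. The users, films and ratings are hypothetical, and real systems work with vastly more data and more elaborate models.

```python
import math

# Hypothetical toy ratings on a 1-5 scale; None marks a "blank" to fill in.
ratings = {
    "alice": {"film_a": 5, "film_b": 4, "film_c": 1, "film_d": None},
    "bob":   {"film_a": 5, "film_b": 5, "film_c": 2, "film_d": 4},
    "carol": {"film_a": 1, "film_b": 2, "film_c": 5, "film_d": 2},
    "dave":  {"film_a": 4, "film_b": 4, "film_c": 1, "film_d": 5},
}

def rated(user_ratings):
    """Only the items this user has actually rated."""
    return {k: v for k, v in user_ratings.items() if v is not None}

def mean_rating(user_ratings):
    values = list(rated(user_ratings).values())
    return sum(values) / len(values)

def pearson(u, v):
    """Pearson correlation over the items both users have rated."""
    ru, rv = rated(u), rated(v)
    common = [k for k in ru if k in rv]
    if len(common) < 2:
        return 0.0
    mu = sum(ru[k] for k in common) / len(common)
    mv = sum(rv[k] for k in common) / len(common)
    dot = sum((ru[k] - mu) * (rv[k] - mv) for k in common)
    norm_u = math.sqrt(sum((ru[k] - mu) ** 2 for k in common))
    norm_v = math.sqrt(sum((rv[k] - mv) ** 2 for k in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # no variation, so no usable signal
    return dot / (norm_u * norm_v)

def predict(user, item):
    """User's mean rating plus similarity-weighted deviations of neighbours."""
    target = ratings[user]
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or theirs.get(item) is None:
            continue
        sim = pearson(target, theirs)
        num += sim * (theirs[item] - mean_rating(theirs))
        den += abs(sim)
    if den == 0:
        return mean_rating(target)
    return mean_rating(target) + num / den

print(round(predict("alice", "film_d"), 2))
```

Carol’s tastes run exactly opposite to Alice’s (a correlation of −1), so her below-average rating of film_d nudges Alice’s estimate up rather than down. Alice never rated that film, yet her neighbours’ data yields a confident guess of about 4 stars.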
If you slept through probability and statistics class and missed the lecture on the central limit theorem (averages of independent samples from almost any distribution with finite variance tend towards the normal distribution as the number of samples grows), this is a good time to brush up and appreciate, at a high level, why this math works.
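A quick simulation illustrates the theorem. Averages of n draws from a skewed exponential distribution (mean 1, standard deviation 1) have a spread close to 1/√n, and the share of averages within one standard deviation of their mean moves from about 86.5% for single draws towards the roughly 68.3% expected of a normal distribution. The distribution, sample sizes and seed are arbitrary choices for illustration.

```python
import math
import random
import statistics

def summarize(n, trials=10_000, seed=42):
    """Mean, spread and 'within one sd' share of averages of n exponential draws."""
    rng = random.Random(seed)
    means = [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(trials)]
    mu = statistics.mean(means)
    sd = statistics.stdev(means)
    within_one_sd = sum(abs(m - mu) <= sd for m in means) / trials
    return mu, sd, within_one_sd

for n in (1, 5, 50):
    mu, sd, frac = summarize(n)
    print(f"n={n:3d}  mean={mu:.3f}  sd={sd:.3f} "
          f"(theory {1 / math.sqrt(n):.3f})  within 1 sd={frac:.3f}")
```

As n grows, the averages cluster ever more tightly and symmetrically around the true mean, which is why pooling enormous numbers of individually unreliable data points can still produce dependable estimates.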

Representational image. Image: Reuters.
Estimating information from “incomplete” but massive datasets occupies an esoteric niche where diverse fields, such as information theory, the much-vaunted “P = NP?” question, data compressibility, filtering, systems theory, statistics, stochastic systems, AI and algorithms, all come together to build a deeply invasive digital avatar of you and of millions of other data generators. Such digital avatars are frighteningly accurate in predicting your response to a variety of stimuli, which makes them of great value to online retailers and, of course, to people interested in controlling you.
Recently, Americans were shocked to learn that their elections could have been influenced by foreign powers, which spurred scrutiny of the entire social media ecosystem and of how influencers using fake profiles could have mounted such a deep attack on their democratic institutions. Attention also focused on a scandal in which the data analytics company Cambridge Analytica was accused of having accessed the data of 87 million Facebook users without their permission, using greyware apps.
The New York Times reports that researchers at Stanford University and Cambridge University were able to develop models of people based on their “likes” on Facebook. Such models were able to score a person’s “openness, conscientiousness, extraversion, agreeableness and neuroticism”, and were adept at “predicting life outcomes such as substance use, political attitudes and physical health.” The methods reportedly achieve acceptable accuracy with as few as 70 “likes”, and with 300 “likes” they are quite accurate.
The newspaper further reports that Cambridge Analytica approached the research centre at Cambridge University for access to this model but was refused, after which it turned to a professor at that university, who developed an app, “thisisyourdigitallife”, capable of harvesting such information from users and their unsuspecting friends.
Some people think that by entering incorrect data (or noise) into online forms, they become immune to such machine learning. However, these new algorithms are built to address precisely that problem: given large but noisy and incomplete datasets, how do we deduce, construct or estimate their information content?
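One reason random junk on forms offers less cover than people hope: if the rate of noisy answers is known or can be estimated, the noise can be subtracted out at the population level. The sketch below assumes a hypothetical population in which 30 percent of people answer a yes/no question with a coin flip. It is the same arithmetic used in “randomized response” surveys, offered as an illustration rather than as how any particular company processes data.

```python
import random

def noisy_answers(true_rate, truthful_prob, population, rng):
    """Simulate yes/no form answers where some people answer at random."""
    answers = []
    for _ in range(population):
        has_trait = rng.random() < true_rate
        if rng.random() < truthful_prob:
            answers.append(has_trait)           # honest answer
        else:
            answers.append(rng.random() < 0.5)  # deliberate noise: a coin flip
    return answers

def debiased_rate(answers, truthful_prob):
    """Invert observed = p * true + (1 - p) * 0.5 to recover the true rate."""
    observed = sum(answers) / len(answers)
    return (observed - (1 - truthful_prob) * 0.5) / truthful_prob

# Hypothetical parameters: 30% true yes-rate, 70% of people answer honestly.
rng = random.Random(7)
answers = noisy_answers(true_rate=0.30, truthful_prob=0.70, population=100_000, rng=rng)
raw = sum(answers) / len(answers)
print(f"raw yes-rate: {raw:.3f}, debiased estimate: {debiased_rate(answers, 0.70):.3f}")
```

The raw answers overstate the trait (about 36 percent instead of 30), but a single line of algebra removes the distortion; the larger the population, the more precise the corrected estimate becomes.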
So when the Facebook CEO states he will “ensure fair elections in India”, it is NOT a statement of arrogance; it is an honest admission of the POWER they have to sway elections, and of the propensity of malicious parties to hijack that knowledge to further their agendas. What happened in the US elections shows that a disruptor can gather the data and has the means to control the outcome. It boils down to knowing what triggers a person and how to exploit that vulnerability. A digital persona tells them exactly that.
The issue at stake goes beyond Facebook users who can be swayed by planted stories. Social media companies such as Facebook, Instagram, WhatsApp, Twitter and others are in the business of consolidating and aggregating data, drawing inferences from it and retailing it to buyers. One can imagine how deeply invasive knowledge about a large section of the population could swing electoral outcomes with great efficiency.
Some claim that the digital divide in India makes us immune to such manipulation, and that, in any case, “gift-pushing” politicians know exactly how to sway the vote in villages. It does not matter that the vast majority of villagers don’t use social media: they have mobile phones and leave behind digital signatures that can be used to build big-data-based personas. Today the biggest information vendors don’t work in isolation; they trade data among themselves, allowing large-scale consolidation of personas. On top of all that, social media companies sit on the biggest collection of data of all: photos, messages, “likes”, “emoticons”, stories read, and much more.
We will examine this in greater detail in the second part tomorrow.
The author is an electrical engineer who works with modelling, simulation and optimisation technologies. He holds a master’s degree in Electrical Engineering from the Indian Institute of Science in Bengaluru, and a doctoral degree in Electrical Engineering from Oregon State University.
Source: firstpost.com