For some time now, I have seen a growing number of articles praising algorithms and computer programs. They will apparently solve all of our problems, including the ones we were not aware we had, faster and cheaper than any human could. As a computer scientist, I am fully aware that computer science offers us stunning possibilities these days. But I sometimes wish we would all take the time to remind ourselves of a few simple facts.
Algorithms are conceived by humans
Yes, shocking, I know, but algorithms and computer programs do not appear out of thin air overnight. They are the result of men's and women's work, and as such, they can reflect human bias.
A good example of this is Google's Word2vec. Word2vec is a family of neural-network models, a machine-learning method that creates representations of words in a vector space based on their linguistic context. We can then see which words are used in similar contexts (cat and dog, for example). It also captures relations between words in the form of vectors. The best-known example is the relation "man : king :: woman : queen", meaning that the vector from man to king is the same as the one from woman to queen.
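To make the vector arithmetic concrete, here is a minimal sketch of how such an analogy can be solved by vector offsets. The embeddings below are tiny hand-made toy vectors for illustration only, not real Word2vec embeddings (which are trained and typically have hundreds of dimensions); only the mechanism is faithful.

```python
import numpy as np

# Toy 4-dimensional "embeddings", hand-crafted so the analogy works.
# Real Word2vec vectors are learned from a corpus, not written by hand.
vectors = {
    "man":    np.array([1.0, 0.0, 0.2, 0.1]),
    "woman":  np.array([0.0, 1.0, 0.2, 0.1]),
    "king":   np.array([1.0, 0.0, 0.9, 0.8]),
    "queen":  np.array([0.0, 1.0, 0.9, 0.8]),
    "prince": np.array([1.0, 0.0, 0.5, 0.4]),
}

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def analogy(a, b, c):
    """Solve "a : b :: c : x" via the offset b - a + c, then return
    the nearest remaining word by cosine similarity."""
    target = vectors[b] - vectors[a] + vectors[c]
    candidates = {w: v for w, v in vectors.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(candidates[w], target))

print(analogy("man", "king", "woman"))  # queen
```

With real embeddings (e.g. via gensim's pretrained models), the same offset trick is what produces both the impressive analogies and the biased ones discussed below.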
Anyway, “why this digression on Word2vec?” you may ask. Well, it turns out that some researchers recently found that this tool is biased. It is trained on a corpus of three million words extracted from Google News, and while factual relations like “Paris : France :: Tokyo : Japan” work well, those concerning women reflect society’s sexism. If you ask Word2vec to complete “father : doctor :: mother : x”, it will tell you that “x = nurse”. And it is not an isolated error: the researchers found quite a lot of stereotyped she : he analogies (homemaker : computer programmer, sewing : carpentry, registered_nurse : physician, nanny : chauffeur, housewife : shopkeeper, cosmetics : pharmaceuticals, …).
My aim here is not to criticize Word2vec. It is rather to emphasize that algorithms and computer programs can, and often will, reflect human flaws. It is therefore important not to blindly trust results “because the computer says so”.
(One way to resolve this trust problem is, of course, to use open-source software.)
Algorithms need data
This might of course seem like a stupid thing to say. But I do wonder if people are really aware of what “more intelligent and personalized algorithms” imply.
Does everyone realize that if today’s computers and smartphones feel so “intelligent”, it is because we basically allow them to know everything about us, from our taste in music to our detailed agenda? I think a lot of people don’t realize the extent of what private companies can know about their lives, be it Google, Facebook or other big digital companies.
This is particularly worrying for services where the majority of users are teenagers, since they are usually even less aware of this issue than adults. I can only applaud initiatives like the rewriting of Instagram’s terms of service for children by the UK Children’s Commissioner (and let’s be honest: if all terms of service were written like that, a whole lot more of us would read them (which reminds me of the study in which people agreed to give up their first-born child to use a social website (and no more parentheses inside parentheses for today, I promise))).
I’m not saying that agreeing to give up some data in exchange for a service is inherently a bad thing (that would be wildly hypocritical of me, seeing that I’m lost without my digital tools). But I wish we would sometimes really ask ourselves: is this worth it? Is this service worth the data I’m giving up for it?
At a time when China wishes to automatically grade its citizens according to, among other things, their social-media activity, it is important to ask ourselves how much trust we place in computer programs and which data we wish to give them.
ESOF 2018 in Toulouse will of course tackle this question. The website is now up and running. Don’t hesitate to browse it and to contact the delivery team if you have any questions or if you wish to participate!