A team of researchers from the USC Information Sciences Institute studied two AI databases to see if their data was fair. They found that it wasn’t.
Water is wet. Dogs bark. There are 24 hours in a day. The Earth is round. (We checked.)
Those facts are what we call commonsense knowledge: statements about the world that are considered true, scientifically established and known by virtually everyone. Not stereotypes or biases toward any group or individual.
For those building artificial intelligence algorithms meant to think like a human, commonsense knowledge databases are the starting point. Developers feed machines this data so they can reason on their own, the way a person would. The resulting systems power auto-generated content in the media, copywriting in marketing, chatbots and virtual assistants such as Google Assistant, Siri and Alexa. The most popular and widely used database is ConceptNet, which is crowdsourced: people contribute those “facts” to it, much as they would to Wikipedia.
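To get a feel for what those crowdsourced “facts” look like, here is a small illustrative sketch (not code from the USC study) that pulls a handful of statements about the concept “dog” from ConceptNet’s public web API. The field names follow the API’s documented JSON format, but treat the exact structure as an assumption and check the response if it has changed.

```python
# Fetch a few crowdsourced ConceptNet edges ("facts") about "dog".
import requests

response = requests.get("https://api.conceptnet.io/c/en/dog", params={"limit": 5})
response.raise_for_status()

for edge in response.json().get("edges", []):
    # Each edge is one contributed statement, e.g. "A dog can bark."
    surface = edge.get("surfaceText")
    if surface:
        print(surface)
    else:
        print(f'{edge["start"]["label"]} -> {edge["rel"]["label"]} -> {edge["end"]["label"]}')
```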
For an AI system to produce fair results, ones that treat people of all races, sexual orientations, genders and nationalities equally, the data behind it has to be fair too.
But what if this data were biased from the start, leading to unfair treatment of different groups of people? A team of researchers from the USC Information Sciences Institute (ISI) studied ConceptNet and GenericsKB (a smaller player in the AI game) to see if their data was fair.
They found that it wasn’t.
More than a third of those “facts” are biased
“People have curated these large commonsense resources, and we tend to download those and include them in our AI systems. What we wanted to do is look at this data that is being edited by humans and see if it is going to reflect human biases. What biases are there? To what extent? And how do we characterize them?” explained Fred Morstatter, an ISI research team lead and USC Viterbi research assistant professor.
The USC team used a program called COMeT, a commonly used knowledge graph completion algorithm that is trained on this kind of data and then generates new commonsense statements when prompted. The algorithm is designed to mimic human reasoning: it analyzes the information it is given and produces answers.
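For readers curious about what prompting such a model looks like, here is a minimal sketch, using the Hugging Face transformers library, of asking a publicly available COMeT-style model to complete a commonsense statement. This is not the USC team’s code; the checkpoint name and the prompt format are assumptions and would need to match whichever pretrained model is actually downloaded.

```python
# Minimal sketch: query a COMeT-style knowledge graph completion model.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_NAME = "mismayil/comet-bart-ai2"  # assumed checkpoint name; replace as needed

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

# COMeT-style models complete (subject, relation, ?) triples: given a concept
# or event plus a relation, they generate the missing piece of the statement.
prompt = "a doctor xAttr [GEN]"  # prompt format is checkpoint-specific (assumption)
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs, num_beams=5, num_return_sequences=5, max_new_tokens=16
)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```

Inspecting what a model like this generates for prompts about different groups of people is one way researchers can surface the biases it has absorbed from its training data.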
Read the full story on USC Viterbi School of Engineering’s website.