Bias in Computer Modelling
A computer doesn’t have prejudice; it merely carries out operations automatically, churning out calculations unspoiled by the human errors that characterise our work.
And yet, some computer algorithms and models used worldwide have been exposed for their negative bias and their problematic consequences. Can a computer model be racist? Can it be sexist? These questions may sound abstract, but even subtle biases hidden in data can be warped and magnified to disastrous effect by the very computers we trust to be impartial. With the presence of computer models in our everyday lives growing, recognising their potential faults is more critical than ever.
What is a computer model, and how can it be biased?
A computer model is a tool used to make predictions. You feed the computer model data, it learns from it, and then makes predictions based on what it has learned from the information you provided. This process isn’t vastly different from how humans learn.
We absorb and remember information, and then when presented with a new situation, we make a decision based on the ‘data’ we’ve seen before (our memories). Imagine learning to catch a ball: with each attempt, we adjust how much to move our feet, where to put our hands and how far to bend our elbows to give us the best chance of success.
The key differences:
- Computers can look at much more data.
- They can spot patterns that we can’t.
- They don’t make random mistakes like our human error.
To call them merely ‘superior’ at this kind of learning would be doing them a disservice.
How can a computer model be biased?
Simply put, bias in a computer model is when it makes wrong predictions, usually more often for some kinds of input than for others. There isn’t anything inherently sinister about bias in models. Using the catching example, you might be better at catching high balls than low balls. Curveballs might be a breeze, but fastballs cause you problems. You can’t be perfect every time, because you can’t practise forever. What if someone throws you a ball that you didn’t know was wet? How about if a gust of wind blew that you couldn’t feel?
Let’s say you make a computer model to classify pictures of fruit, and it’s brilliant at spotting bananas in pictures you give it, but rubbish at apples. This bias against apples isn’t the model’s fault. In most cases, the model knows nothing about fruit; it’s just looking for patterns in the colours of the pixels. The bias in this model would be the fault of the human building it, for not designing a good enough model to perform the task needed.
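To make the fruit example concrete, here is a minimal sketch of how this kind of bias hides behind a single headline score. The labels and predictions are made up for illustration, and `accuracy_by_class` is a hypothetical helper rather than part of any library.

```python
from collections import defaultdict


def accuracy_by_class(true_labels, predicted_labels):
    """Return overall accuracy and a dict of accuracy for each true class."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, guess in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1
    overall = sum(correct.values()) / len(true_labels)
    per_class = {label: correct[label] / total[label] for label in total}
    return overall, per_class


# Made-up results: 10 banana pictures followed by 10 apple pictures.
true_labels = ["banana"] * 10 + ["apple"] * 10
# The imaginary model gets every banana right but calls 6 apples "banana".
predicted_labels = ["banana"] * 10 + ["apple"] * 4 + ["banana"] * 6

overall, per_class = accuracy_by_class(true_labels, predicted_labels)
print(f"overall accuracy: {overall:.0%}")
for label, score in per_class.items():
    print(f"{label}: {score:.0%}")
```

With these invented numbers the model is right 70% of the time overall, yet only 40% of the time on apples. The single overall figure says nothing about where the mistakes fall.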
Bias is an inevitable consequence of making models because we never have unlimited data for them to learn from (amongst other reasons). The real question is: given that our model will have some bias (make some mistakes), where are we willing to let these mistakes happen?
When does bias become a significant problem?
The stakes are low when our job is to classify pictures of fruit, but when human lives are involved, the potential for damage grows. The cost is often worth the potential payoff. Consider a computer model which we build to predict how aggressive a patient’s disease is at a hospital, and therefore how much treatment they need. The Director of Medicine would be thrilled to implement this model if it saved their doctors’ valuable time and improved the overall patient survival rate. What if this model only helped female patients, and male patients’ recovery was unimproved?
We might argue that the model that was biased towards women was worth using because it is helpful on average. What if the model vastly improved female recovery, but slightly lowered male rates? Average recovery rates are up; who could blame the director for being tempted to use the model? Male patients might not see it that way.
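The arithmetic behind that temptation can be sketched in a few lines. All of the patient counts below are hypothetical, chosen only to show how an improved average can sit on top of a group that is worse off; the function names are illustrative too.

```python
# Hypothetical counts of (recovered, total patients) for each group.
before = {"female": (70, 100), "male": (70, 100)}  # without the model
after = {"female": (90, 100), "male": (65, 100)}   # with the model


def overall_rate(groups):
    """Recovery rate across all groups combined."""
    recovered = sum(r for r, _ in groups.values())
    patients = sum(n for _, n in groups.values())
    return recovered / patients


def change_by_group(before, after):
    """Change in recovery rate for each group once the model is used."""
    return {
        group: after[group][0] / after[group][1]
        - before[group][0] / before[group][1]
        for group in before
    }


changes = change_by_group(before, after)
harmed = [group for group, change in changes.items() if change < 0]

print(f"overall: {overall_rate(before):.1%} -> {overall_rate(after):.1%}")
for group, change in changes.items():
    print(f"{group}: {change:+.0%}")
print("groups worse off:", harmed)
```

In this made-up case the overall recovery rate rises from 70% to 77.5%, while the male rate falls by five percentage points. Checking each group separately, not just the average, is what reveals the harm.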
Bias in a computer model built to help humans becomes unacceptable when it disproportionately and negatively affects a group of people while claiming to help all people.
In the real world, bias isn’t easy to spot, but that shouldn’t stop us from being ambitious in our aims. We should take care to check for negative bias hidden under the carpet of an overall positive effect on a population. Defining a ‘group of people’ is challenging; clear candidates would be gender, race, disability and more. These groups don’t always have clear-cut boundaries, which makes official protocols for Data Scientists tough to write, but attempts to mitigate the damage caused by poorly built models would not be in vain.
How long has this problem been brewing?
In the early 1970s, the boom in car manufacturing led to standardised crash-safety testing using crash dummies, a great leap forward in some ways. The idea was simple: collect damage reports on the dummies in severe crashes and, with all that data, make informed decisions about building safer cars. The only problem was with the crash dummies themselves; they were built using the proportions and biology of the average man. The researchers never thought to widen their definition to the average ‘human’. The result is that today, 40 years later, “women are 73% more likely to be seriously injured if they’re in a car crash and 17% more likely to die”, even after ‘controlling for occupant age, height, and body mass index’. Women’s bodies aren’t inherently more fragile; the gap exists because the dummies used in testing were not representative of the people the tests claimed to help.
It is no coincidence that the scientists involved in this mistake were all men — humans, in general, aren’t brilliant at spotting injustice that isn’t happening to them.
Now, in the boom of computer modelling, we need to be aware of these faults, so that we don’t tarnish our new method of scientific advancement with the same dirty brush.
Why is troubling bias in modelling such a common occurrence?
Bias in the data itself
If we don’t have correct, unbiased data for the models to learn from, we can’t expect the models to perform well when put into production. Flawed data is hard to spot; Daphne Koller gave a brilliant example in an article for the NYT. Imagine we are building a computer model to predict whether an X-ray scan of an arm shows a fracture. Suppose one of the three hospitals submitting fracture scans only scanned people if they were sure of a fracture. Also, suppose that the X-ray technician at this hospital took all their scans from the same angle. Then the computer model might learn from this that arm scans at this angle are more likely to be of fractures.
The model hasn’t learned what fractures look like at all; it has merely spotted a pattern amongst the positive data points (the fractures).
What happens when new hospitals try to use this model? The model will predict fractures at a higher rate if the new hospital takes its scans at the same angle — our model would be useless!
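Here is a toy version of that failure. It is a deliberately crude stand-in for a real image model: the ‘model’ below only ever sees the scan angle, and every hospital, angle name and count is invented for illustration.

```python
from collections import defaultdict


def learn_fracture_rate_by_angle(scans):
    """'Train' by recording how often each scan angle showed a fracture."""
    counts = defaultdict(lambda: [0, 0])  # angle -> [fractures, total scans]
    for angle, has_fracture in scans:
        counts[angle][0] += int(has_fracture)
        counts[angle][1] += 1
    return {angle: f / n for angle, (f, n) in counts.items()}


def predict(rates, angle):
    """Predict a fracture if most training scans at this angle had one."""
    return rates.get(angle, 0.0) > 0.5


def accuracy(rates, scans):
    hits = sum(predict(rates, angle) == has_fracture
               for angle, has_fracture in scans)
    return hits / len(scans)


# Invented training data from three hospitals.
training = (
    [("side", True)] * 100                               # only sure fractures
    + [("front", True)] * 20 + [("front", False)] * 80   # scans everyone
    + [("top", True)] * 20 + [("top", False)] * 80       # scans everyone
)
rates = learn_fracture_rate_by_angle(training)

# An invented new hospital that scans everyone, always from the side.
new_hospital = [("side", True)] * 20 + [("side", False)] * 80

training_accuracy = accuracy(rates, training)
new_accuracy = accuracy(rates, new_hospital)
print(f"accuracy on training data: {training_accuracy:.0%}")
print(f"accuracy at new hospital:  {new_accuracy:.0%}")
```

The shortcut looks respectable on the data it was trained on, then falls apart at the new hospital, where it calls every side-angle scan a fracture. A real model would be working from pixels rather than a labelled angle, but it can fall for the same pattern.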
This problem is not usually this easy to spot until after we use the model in public. Why not only train models on ‘good’ datasets then? The correct data is hard to find. It takes a sensitive eye to spot problems before they become problems, and that is assuming that people are actively looking.
Bias in humans
A computer model widely used in US hospitals predicts a ‘risk’ score for each patient: a rating of how poor their health is. Patients past a certain threshold are seen by specialists, in an attempt to standardise the process of referral and help the patients who truly need it. On paper, it is a brilliant tool, allowing the patients most in need to be seen by the correct doctors.
On closer inspection by a team of scientists at Berkeley, this model turned out to use a patient’s total health costs each year as part of its prediction. If one patient’s health care is more expensive, it seems fair to assume they are sicker than an otherwise equal patient. The only problem? This assumption is wrong.
While the average African-American patient and the average white patient spent similar amounts on their healthcare, the average African-American patient was much sicker. The result? The model required African-American patients to be sicker than otherwise equal white patients to get the same risk score, and therefore, they needed to be sicker to get a referral.
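The mechanism can be sketched with a toy risk score that is nothing more than yearly cost. This is not the real hospital model: the threshold, the spending figures and the group names are all assumptions made up to show how a cost-based proxy behaves when one group has less spent on it for the same illness.

```python
# All numbers are invented for illustration.
REFERRAL_THRESHOLD = 5000  # yearly cost at which a patient is referred
SPEND_PER_CONDITION = {    # assumed spend per chronic condition, per group
    "group_a": 1000,
    "group_b": 700,        # same illness, less money spent on care
}


def yearly_cost(group, conditions):
    """Toy stand-in for the cost figure the risk score relies on."""
    return conditions * SPEND_PER_CONDITION[group]


def is_referred(group, conditions):
    """Refer the patient once their cost-based score passes the threshold."""
    return yearly_cost(group, conditions) >= REFERRAL_THRESHOLD


def sickness_needed_for_referral(group, max_conditions=20):
    """Smallest number of chronic conditions that triggers a referral."""
    for conditions in range(max_conditions + 1):
        if is_referred(group, conditions):
            return conditions
    return None


for group in SPEND_PER_CONDITION:
    needed = sickness_needed_for_referral(group)
    print(f"{group}: referred at {needed} chronic conditions")
```

Under these assumptions a patient in `group_a` is referred with 5 chronic conditions, while a patient in `group_b` needs 8. The score never looks at group membership, yet the outcome differs by group because cost is standing in for sickness.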
Why do equally sick African-American people have less money spent on their healthcare than white people? There is no answer to that question that makes for easy listening — it is some combination of lower incomes, systemic racism, and the mistreatment of patients by medical professionals.
It’s important to note that even in this computer model, built to make referrals more standardised (and possibly eliminate the effects of bias in doctors), bias plagued the resulting predictions.
Why is it so important to be aware of this?
Machine learning isn’t the ‘future’. It is here. Now. It is already dictating how we live our lives. It matters that we are aware of these problems, so we can spot them accurately, and hold those using models with dangerous biases to account.
How can we stop it from happening?
The first step in solving a problem is acknowledging there is a problem.
Rather than mandating what other people must do, it is better to educate ourselves on the potential biases that affect our work, especially those which we are less sensitive to because they don’t affect us personally.
For Data Scientists, while it may be impossible to write a step-by-step plan for avoiding these mistakes, having impartial and experienced second parties checking our work for bias would be worth the investment.
Doctors have the Hippocratic Oath; perhaps it is time Data Scientists wrote their own.