The world of biased algorithms

The world is changing. More and more people are openly against racism and gender discrimination, and very much pro equality. People and companies are being called out when they do discriminate, and there is a big movement against bias in general.

One problematic thing though, for me as a programmer, is the backlash against ‘biased algorithms’.

Algorithms

What are algorithms?

If you look up the definition on Wikipedia, you’ll find:

A sequence of instructions, typically to solve a class of problems or perform a computation. Algorithms are unambiguous specifications for performing calculation, data processing, automated reasoning, and other tasks.

In our case today, let’s define it as a set of instructions that a computer follows to do a calculation and produce a result. People do the same thing: they use sets of rules to come to certain conclusions.

Bias in algorithms

The problem here is bias. What is the problem with bias in algorithms?

Suppose we have an algorithm that determines whether or not you get a loan, and also determines the interest rate. This is great: before this system, the bank had a human who was responsible for approving or declining loans; now the computer can do this.

Now of course, it would be very bad if the programmer and product owners came up with the following rule:

// Hard-coded rule that penalizes an applicant purely based on gender
if (applicant.isFemale()) {
    loan.addInterest(10);
}

It is easy to see that this rule discriminates against a particular gender, and we don’t want this discrimination.

But what if a certain car insurer charges 5% extra for male drivers under 25 with a DUI charge? That kind of makes sense. It is still discrimination, but we do allow it. It is a slippery slope.

The good thing though is: the rules are clear. Before, there might have been a racist human declining loans without anyone being able to tell; now the rule is written down for everyone to read.
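
To illustrate why a hand-written rule is at least auditable, here is a minimal Java sketch of the car insurance example. The `PremiumRules` class, the `Quote` type and the base premium of 1000 are hypothetical and only there for illustration; the one thing taken from the text is the 5% surcharge for male drivers under 25 with a DUI charge.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical rule engine for the car insurance example (not a real insurer's logic).
final class PremiumRules {

    // Outcome of a quote: the price plus the human-readable rules that fired.
    static final class Quote {
        final double premium;
        final List<String> reasons;

        Quote(double premium, List<String> reasons) {
            this.premium = premium;
            this.reasons = reasons;
        }
    }

    // Applies the single rule from the text and records why the price changed.
    static Quote quote(boolean male, int age, boolean hasDuiCharge, double basePremium) {
        List<String> reasons = new ArrayList<>();
        double premium = basePremium;
        if (male && age < 25 && hasDuiCharge) {
            premium = basePremium * 1.05; // 5% extra, as in the example above
            reasons.add("5% surcharge: male, under 25, DUI charge");
        }
        return new Quote(premium, reasons);
    }

    public static void main(String[] args) {
        // Assumed base premium of 1000, purely for illustration.
        Quote q = quote(true, 22, true, 1000.0);
        System.out.println(q.premium + " " + q.reasons);
    }
}
```

Whether or not we agree with the rule, anyone reading this code (or the `reasons` list on a quote) can see exactly which traits changed the price.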

Bias in A.I./ML

The real problem though starts when you enter the world of AI, big data and machine learning. Now a programmer is no longer writing the readable rules. A machine is using a lot of data to make the best possible decision and find (invisible) correlations.

What is the ‘best decision’? Well, most of the time it is looking at the past, minimizing risk or cost, mimicking human behaviour. This is where a lot of bias comes from and it makes sense.

If you want everyone to be treated equally, there is no need for a complex algorithm in the first place. The sole purpose of these algorithms is to treat every individual differently based on certain traits. Sometimes we think these traits are obvious and valid, sometimes we think these traits are racist and idiotic, but a computer/A.I. system doesn’t know the difference.

Reasons an A.I. might decline a loan:

  • People with irregular income and a poor credit history > we’re okay with this.
  • Female from East Kentucky above 40 with a daughter > wait, what?

The algorithm is probably right though: it is minimizing risk, and while analyzing the given data it found some correlation. This might be because the humans who historically did the job had a certain bias and the algorithm was trained on their decisions, or because there actually is a bigger risk for females from East Kentucky based on historic data.

Minimizing ‘bad’ bias

So there is good and bad bias. How can we use this? Well, if you are training the loan application model, give it the ‘right’ data. Train the model with historic data that has income and credit history, but not the fields for ‘race’ and ‘gender’.

There, problem solved.

Or not? It turns out those A.I. systems are very stubborn in finding correlations. If we don’t give the system any race information, but we do give it an address, and there is a current (or historic) correlation between race and loan declines… the A.I. might secretly group people together based on where they live. There are examples of this, where certain poorer black neighbourhoods are discriminated against without the system knowing anything about these neighbourhoods or the race of the people living there.
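
The proxy effect described above can be sketched in a few lines of Java. Everything here is made up for illustration: the `ProxyBiasDemo` class, the neighbourhoods "North" and "South", the groups "A" and "B", and the synthetic history itself. The "model" is deliberately trivial (a decline rate per neighbourhood with a 50% threshold) and is not how a real credit model works; the point is only that it never sees the `group` field, yet its decisions still split along group lines.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration of proxy bias; all data and names are hypothetical.
final class ProxyBiasDemo {

    // One historic loan decision. `group` stands in for a sensitive attribute
    // such as race; the model below never reads it.
    static final class Application {
        final String neighbourhood;
        final String group;
        final boolean declined;

        Application(String neighbourhood, String group, boolean declined) {
            this.neighbourhood = neighbourhood;
            this.group = group;
            this.declined = declined;
        }
    }

    // "Training": learn the historic decline rate per neighbourhood only.
    static Map<String, Double> train(List<Application> history) {
        Map<String, int[]> counts = new HashMap<>(); // value = {declined, total}
        for (Application a : history) {
            int[] c = counts.computeIfAbsent(a.neighbourhood, k -> new int[2]);
            if (a.declined) {
                c[0]++;
            }
            c[1]++;
        }
        Map<String, Double> rates = new HashMap<>();
        for (Map.Entry<String, int[]> e : counts.entrySet()) {
            rates.put(e.getKey(), (double) e.getValue()[0] / e.getValue()[1]);
        }
        return rates;
    }

    // "Prediction": decline when the neighbourhood's historic decline rate exceeds 50%.
    static boolean predictDecline(Map<String, Double> model, String neighbourhood) {
        return model.getOrDefault(neighbourhood, 0.0) > 0.5;
    }

    // Share of applicants in the given group that the model would decline.
    // The group is used only to measure the outcome, never to make the decision.
    static double declineRate(Map<String, Double> model, List<Application> apps, String group) {
        int total = 0;
        int declined = 0;
        for (Application a : apps) {
            if (!a.group.equals(group)) {
                continue;
            }
            total++;
            if (predictDecline(model, a.neighbourhood)) {
                declined++;
            }
        }
        return total == 0 ? 0.0 : (double) declined / total;
    }

    // Synthetic history: group A lives mostly in North, group B mostly in South,
    // and past (human) decisions declined group B far more often.
    static List<Application> syntheticHistory() {
        List<Application> history = new ArrayList<>();
        for (int i = 0; i < 9; i++) {
            history.add(new Application("North", "A", false));
        }
        history.add(new Application("North", "B", true));
        history.add(new Application("South", "A", false));
        for (int i = 0; i < 9; i++) {
            history.add(new Application("South", "B", i < 8)); // 8 of 9 declined
        }
        return history;
    }

    public static void main(String[] args) {
        List<Application> history = syntheticHistory();
        Map<String, Double> model = train(history);
        System.out.println("Group A decline rate: " + declineRate(model, history, "A"));
        System.out.println("Group B decline rate: " + declineRate(model, history, "B"));
    }
}
```

With this synthetic data the model declines everyone in "South", which works out to 10% of group A and 90% of group B, even though the group was removed from the training input.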

Algorithms are bias

The algorithmic bias problem isn’t going away; it will probably become much worse. Systems are getting more and more data, and they will find many more correlations using this data.

It is impossible to have algorithms without bias because these decision-making algorithms are bias. The whole reason they exist is to add some bias into a neutral decision.

What’s your thought on this? Send me a tweet: @royvanrijn