"A very serious example was revealed in an investigation published last month by ProPublica. It found that widely used software that assessed the risk of recidivism in criminals was twice as likely to mistakenly flag black defendants as being at a higher risk of committing future crimes. It was also twice as likely to incorrectly flag white defendants as low risk."
A recurring theme within AI circles is the need to get "better prediction, faster" by feeding in more data. The assumption always seems to be that more is better and that, given comprehensive data sets, targets can be correlated and identified faster. Yet again and again we see critical failures in AI that can be laid at the feet of the data itself: bad, or too disparate to be brought to bear on the problem it's meant to solve.
Take, for example, FBI claims that both the San Bernardino and Orlando shooters were flagged inside its own predictive systems, but were lost within a sea of similar identifications. The statement "the suspect[s] were known to law enforcement" ceases to have any meaning in a world where such aggregation and prediction engines render everyone known to law enforcement, in some capacity. The problem becomes such a farce that it can no longer be described as "needle in a haystack", but rather "needle in a needlestack".
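To make the arithmetic of that needlestack concrete, here is a minimal sketch. The numbers are hypothetical, not actual FBI figures; the point is only that a tiny base rate swamps even a seemingly accurate flagging system with false positives.

```python
# Illustrative sketch (hypothetical numbers, not FBI figures): even a
# predictive flagging system with seemingly strong error rates buries
# real threats under false positives when the base rate is tiny.

population = 300_000_000      # people scored by the system
true_threats = 3_000          # actual base rate: ~1 in 100,000
sensitivity = 0.99            # fraction of real threats correctly flagged
false_positive_rate = 0.01    # fraction of innocents incorrectly flagged

true_flags = true_threats * sensitivity
false_flags = (population - true_threats) * false_positive_rate

precision = true_flags / (true_flags + false_flags)

print(f"People flagged: {true_flags + false_flags:,.0f}")
print(f"Flags that are real threats: {precision:.4%}")
# Roughly 3 million people get flagged, and fewer than 0.1% of those
# flags point at a real threat -- "known to law enforcement" ends up
# describing nearly everyone the system touches.
```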
This does not even begin to touch on the traditional issues in data processing (garbage in, garbage out) that afflict predictive analysis efforts. When historical socio-economic issues interface with this problem, we get enormous tech failures, such as criminal risk prediction algorithms trained on data completely skewed by centuries of racialized policy.
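A minimal, synthetic sketch of that skew (hypothetical parameters, not any real system's data): two groups with identical underlying behavior, where the historical labels record enforcement intensity rather than behavior itself, and anything fit to those labels learns the enforcement bias.

```python
# Synthetic illustration of "garbage in, garbage out" in risk prediction.
# Both groups reoffend at the same true rate, but group 1 is policed more
# heavily, so its reoffenses are recorded (arrested/charged) more often.
# A model trained on the recorded labels inherits that skew.

import numpy as np

rng = np.random.default_rng(0)
n = 100_000
group = rng.integers(0, 2, n)              # group 0 and group 1
true_reoffense = rng.random(n) < 0.20      # identical 20% true rate in both

# Recorded outcomes reflect policing intensity, not just behavior:
# group 1's reoffenses are twice as likely to end up in the data.
recorded = true_reoffense & (rng.random(n) < np.where(group == 1, 0.8, 0.4))

# The "model" here is just the recorded rate per group, which is what any
# classifier fit on these labels will converge toward.
for g in (0, 1):
    true_rate = true_reoffense[group == g].mean()
    learned_risk = recorded[group == g].mean()
    print(f"group {g}: true rate {true_rate:.1%}, learned risk {learned_risk:.1%}")
# Both groups truly reoffend at ~20%, yet group 1 is scored at roughly
# twice the risk of group 0 -- the disparity lives in the data, not the math.
```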
The devil isn't in the machine, it's in us. We need to do better.