Garbage in, Garbage Out

"A very serious example was revealed in an investigation published last month by ProPublica. It found that widely used software that assessed the risk of recidivism in criminals was twice as likely to mistakenly flag black defendants as being at a higher risk of committing future crimes. It was also twice as likely to incorrectly flag white defendants as low risk."

A recurring theme within AI circles is the need to get "better prediction, faster" by feeding in more data. The assumption always seems to be that more is better and that, by providing comprehensive data sets, targets can be correlated and identified faster. Yet again and again we see examples of critical failures in AI that can be laid at the feet of the data itself being bad or too disparate to be leveraged against the problem it's meant to solve.

Take, for example, FBI claims that both the San Bernardino and Orlando shooters were flagged inside their own predictive systems, but were lost within a sea of similar identifications. The statement "the suspect[s] were known to law enforcement" ceases to have any meaning in a world where such aggregation and prediction engines render everyone known to law enforcement, in some capacity. The problem becomes such a farce that it can no longer be described as "needle in a haystack", but rather "needle in a needlestack".

This does not even begin to touch on the traditional issues in data processing (garbage in, garbage out) which afflict predictive analysis efforts. When historical socio-economic issues interface with this problem, we get enormous tech failures, such as criminal predictive algorithms training on data completely skewed by centuries of racialized policy.

The devil isn't in the machine, it's in us. We need to do better.

D-Wave is real! Maybe.

via Kotaku & arXiv.

Quantum annealing (QA) has been proposed as a quantum enhanced optimization heuristic exploiting tunneling. Here, we demonstrate how finite range tunneling can provide considerable computational advantage. For a crafted problem designed to have tall and narrow energy barriers separating local minima, the D-Wave 2X quantum annealer achieves significant runtime advantages relative to Simulated Annealing (SA). For instances with 945 variables this results in a time-to-99%-success-probability that is ∼10^8 times faster than SA running on a single processor core. We also compared physical QA with Quantum Monte Carlo (QMC), an algorithm that emulates quantum tunneling on classical processors. We observe a substantial constant overhead against physical QA: D-Wave 2X runs up to ∼10^8 times faster than an optimized implementation of QMC on a single core. To investigate whether finite range tunneling will also confer an advantage for problems of practical interest, we conduct numerical studies on binary optimization problems that cannot yet be represented on quantum hardware. For random instances of the number partitioning problem, we find numerically that QMC, as well as other algorithms designed to simulate QA, scale better than SA and better than the best known classical algorithms for this problem. We discuss the implications of these findings for the design of next generation quantum annealers.
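For readers unfamiliar with the classical baseline the abstract keeps referencing: Simulated Annealing just walks the solution space, always accepting improvements and occasionally accepting worse moves with a probability that shrinks as a "temperature" cools. Below is a toy sketch of SA applied to the number partitioning problem the paper benchmarks against — this is purely illustrative on my part, not the authors' benchmark code, and the cooling schedule and step counts are arbitrary choices:

```python
import math
import random

def partition_cost(numbers, signs):
    """Number partitioning: assign each number to one of two subsets
    (sign +1 or -1); cost is |difference of the two subset sums|."""
    return abs(sum(n * s for n, s in zip(numbers, signs)))

def simulated_annealing(numbers, steps=20000, t_start=10.0, t_end=0.01, seed=0):
    rng = random.Random(seed)
    signs = [rng.choice([1, -1]) for _ in numbers]
    cost = partition_cost(numbers, signs)
    best = cost
    for step in range(steps):
        # Geometric cooling schedule from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / steps)
        i = rng.randrange(len(numbers))
        signs[i] = -signs[i]                     # propose: flip one assignment
        new_cost = partition_cost(numbers, signs)
        # Metropolis rule: accept downhill always, uphill with prob e^(-delta/T).
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / t):
            cost = new_cost
            best = min(best, cost)
        else:
            signs[i] = -signs[i]                 # reject: undo the flip
    return best

numbers = [random.Random(42).randrange(1, 1000) for _ in range(40)]
print(simulated_annealing(numbers))
```

The "tall and narrow energy barriers" in the crafted D-Wave problem are exactly the landscape feature that hurts this kind of thermal hill-climbing: the uphill acceptance probability decays exponentially in barrier height, whereas quantum tunneling probability decays with barrier width.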


I really want this to be a thing, but there is some funky irony in the basic problem. Quantum effects are hard to observe because observation may change the state of the system under observation.

It's uncertainty about uncertainty.

Here be dragons...

A few notes

It didn't rain today, but it probably should have. Managed to pull a manuscript out of my hat over the weekend and felt none the more capable for the feat. Just another reminder that scheduling remains a craft enemy. Prepping for some heavy editing of the IEEE S&P submission, though I likely won't see the editor's notes till after the holiday break. This marks the first time I'm actually unhappy about a deadline getting moved past a major vacation (would much rather knock it all out now than be dreading coming back to a pile of work post-Christmas). Still... pub count +1