That example of the radiologist reviewing cases touches on one worry I have about automation with a human in the loop for safety: specifically, that a human in the loop won't work as a safeguard unless they are meaningfully engaged beyond being a simple reviewer.
How do you sustain attention and thoughtfully review radiological scans when 99% of the time you agree with the automated assessment? I'm pretty sure that no matter how well trained the doctor is, they will end up just spamming "LGTM" after a while.
The likelihood is that models will "box" questionable stuff for radiologist review, and the boxing threshold will probably be set low enough that radiologists stay sharp (though we probably won't do this at first and skills may atrophy for a bit).
This is also a free source of training data over time, so the market incentives are there.
Far more likely to be the reverse: people care about this right now, but once you're sitting at a 99% model-agreement rate the obvious thing to do will be to save money and raise the threshold so fewer scans get boxed for review.
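For concreteness, here's a minimal sketch of the threshold being argued over, assuming the model scores its own uncertainty and routes scans against a single configurable "boxing" cutoff (the names and the 0.10 value are made up for illustration):

```python
from dataclasses import dataclass

# Hypothetical "boxing" threshold on the model's uncertainty about its own
# read. Setting it LOW boxes more scans for human review (keeps radiologists
# engaged, costs more); raising it later is the cheap way to shrink the
# human-in-the-loop role.
BOX_THRESHOLD = 0.10

@dataclass
class ScanRead:
    scan_id: str
    finding: str          # model's proposed read, e.g. "no acute findings"
    uncertainty: float    # 0.0 = model is certain, 1.0 = model has no idea

def route(read: ScanRead) -> str:
    """Box uncertain reads for a radiologist; auto-accept the rest."""
    if read.uncertainty >= BOX_THRESHOLD:
        return "boxed-for-radiologist-review"
    return "auto-accepted"

if __name__ == "__main__":
    reads = [
        ScanRead("A-001", "no acute findings", uncertainty=0.02),
        ScanRead("A-002", "possible nodule, right lower lobe", uncertainty=0.35),
    ]
    for r in reads:
        print(r.scan_id, "->", route(r))
```

The whole disagreement above is really about who gets to move that one constant, and in which direction, once the agreement rate looks good.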
I have the same question about the minor legislative amendments a certain agency keeps requesting in relation to its own statutory instrument. Obviously they are going to be passed without much scrutiny: they all seem small, and the agency is pretty trustworthy.
(this is an unsolved problem that has existed in many domains since long before AI)
How do you sustain attention in the other big X-ray use, security scanning? Most screeners will never see a bomb, so how do you ensure that they'll actually spot one when it does come through?
The answer they've come up with is periodic tests and audits: fake threat images get injected into the screening stream, so the operator never knows whether the bag on the monitor is a planted test.
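A rough sketch of what that looks like if you port it to the radiology case, assuming you can plant known-positive cases in the review queue and track the catch rate (the rate, names, and scoring here are illustrative, not any real system's implementation):

```python
import random

# Hypothetical audit mechanism, analogous to threat-image projection in
# airport screening: plant known-positive test cases in the reviewer's
# queue and measure how many of them get caught.
TEST_RATE = 0.02  # assumed: roughly 2% of queued items are planted tests

def build_queue(real_cases, planted_positives, rng=random.Random(0)):
    """Return a shuffled queue of (case, is_planted_test) pairs."""
    n_tests = max(1, int(len(real_cases) * TEST_RATE))
    tests = rng.sample(planted_positives, k=min(n_tests, len(planted_positives)))
    queue = [(c, False) for c in real_cases] + [(t, True) for t in tests]
    rng.shuffle(queue)
    return queue

def catch_rate(decisions):
    """decisions: iterable of (is_planted_test, reviewer_flagged_it) pairs."""
    flags = [flagged for is_test, flagged in decisions if is_test]
    return sum(flags) / len(flags) if flags else None

if __name__ == "__main__":
    queue = build_queue([f"case-{i}" for i in range(100)], ["test-A", "test-B", "test-C"])
    # Pretend the reviewer misses one specific planted test:
    decisions = [(is_test, is_test and case != "test-B") for case, is_test in queue]
    print("catch rate on planted tests:", catch_rate(decisions))
```

The metric that matters here is the catch rate on the planted positives, not the agreement rate on routine cases, which is exactly what a 99%-agreement world stops telling you.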