
While an artificial intelligence (AI) tool correctly interpreted about half of the medical scans that human radiologists misread, it needs further training before it can handle scans it was not designed to interpret, according to a recent study published in the British Medical Journal (BMJ).
Researchers in the United Kingdom (UK) performed a study to determine whether an artificial intelligence candidate could pass the rapid (radiographic) reporting component of the Fellowship of the Royal College of Radiologists (FRCR) examination.
The researchers used 10 FRCR mock examinations for the analysis. Each radiologist had one month to record their interpretations of the 10 mock exams on an online answer sheet.
Each mock exam consisted of 30 radiographs covering all parts of the body in adults and children, with roughly half showing an abnormality and the other half normal. Radiologists who had passed the FRCR exam within the previous year were recruited through social media, word of mouth, and email.
The rapid reporting component of the FRCR exam tests a candidate’s ability to analyze and interpret 30 radiographs quickly and accurately within 35 minutes, with a pass mark of at least 90% correct (27 of the 30 radiographs).
The radiologists were asked to rate the following:
- How representative the mock exams were relative to the actual FRCR exam.
- Their performance.
- How well they thought AI would have performed.
The researchers also provided 300 anonymized radiographs to the AI candidate, Smarturgences, developed by Milvue, a French AI company.
The AI tool was not approved to analyze radiographs of certain body parts, such as the skull, spine, teeth, and abdomen. However, the researchers still supplied radiographs from these areas to the tool to ensure a fair evaluation across all participants.
The performance of the tool was measured using four scoring methods. In the first method, only the AI-interpretable radiographs were scored and the non-interpretable radiographs were excluded. In the second, third, and fourth methods, the non-interpretable radiographs were scored as if the AI had called them normal, called them abnormal, or got them wrong, respectively.
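As a rough illustration of how these four scoring methods differ (a minimal sketch, not the study's actual code; the data layout and function name are assumptions):

```python
def score_exam(cases, method):
    """Score one 30-image mock exam under one of the four scoring methods.

    Each case is a dict with:
      'interpretable' - whether the AI tool supports this body part
      'truth'         - ground truth, 'normal' or 'abnormal'
      'ai_call'       - the AI's output, 'normal' or 'abnormal'
    """
    correct, total = 0, 0
    for case in cases:
        if case["interpretable"]:
            answer = case["ai_call"]
        elif method == 1:            # method 1: exclude non-interpretable images
            continue
        elif method == 2:            # method 2: treat them as called 'normal'
            answer = "normal"
        elif method == 3:            # method 3: treat them as called 'abnormal'
            answer = "abnormal"
        else:                        # method 4: count them as automatically wrong
            answer = None
        total += 1
        correct += (answer == case["truth"])
    score = correct / total if total else 0.0
    return score, score >= 0.90      # FRCR rapid reporting pass mark: 90%
```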
The researchers found that most of the participating radiologists had cleared the real FRCR exam on their first attempt. The AI tool would have passed two mock exams under the first scoring method and one mock exam under the second.
They found, however, that under the third and fourth methods the AI candidate would have failed the examination.
When the researchers compared the ability of the AI and the radiologists to interpret the images, they found that the radiologists demonstrated higher sensitivity, specificity, and accuracy than the AI.
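For readers less familiar with these metrics, here is a minimal sketch of how they are conventionally computed from reader calls (standard definitions, not the study's code; the counts are placeholders):

```python
def reader_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and accuracy from a 2x2 confusion matrix.

    tp / fn: abnormal radiographs called abnormal / normal
    tn / fp: normal radiographs called normal / abnormal
    """
    sensitivity = tp / (tp + fn)                # share of abnormal images caught
    specificity = tn / (tn + fp)                # share of normal images correctly cleared
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # overall share of correct calls
    return sensitivity, specificity, accuracy
```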
In addition, the researchers found that the AI was the highest-performing candidate in one examination but ranked second to last overall.
The study found that no radiologist passed all 10 mock exams. The best-performing radiologist passed nine, while the least successful passed only one; on average, radiologists passed four. Across the mock exams, the radiologists rated their own performance between 5.8 and 7.0 on a 10-point scale and the AI’s likely performance between 6.0 and 6.6.
“On this occasion, the artificial intelligence candidate was unable to pass any of the 10 mock examinations when marked against similarly strict criteria to its human counterparts, but it could pass two of the mock examinations if special dispensation was made by the RCR to exclude images that it had not been trained on,” the researchers wrote.
The researchers concluded that the AI tool correctly diagnosed half of the medical scans that human radiologists had interpreted incorrectly. However, according to the authors, the AI still needs more training to perform at the level of a radiologist, especially on scans it was not designed to interpret.
Source: Medical News