Researchers carried out the first systematic review of existing research into AI in the health sector and published their findings in The Lancet Digital Health journal.
The review focused on deep learning, an AI technique that uses algorithms, big data, and computing power to emulate human intelligence.
This allows computers to identify patterns of disease by examining thousands of images, before applying what they learn to new individual cases to provide a diagnosis. Excitement is building around the technology, and the US Food and Drug Administration has already approved a number of AI algorithms for use in healthcare.
AI has been hailed as a way to reduce the workload for overstretched medical professionals and revolutionize healthcare, but so far scientific research has failed to live up to the hype.
Of the 20,500 articles reviewed, fewer than 1% were found to be sufficiently robust, said Professor Alastair Denniston from University Hospitals Birmingham NHS Foundation Trust, UK, which led the research, in a statement.
"Within those handful of high-quality studies, we found that deep learning could indeed detect diseases ranging from cancers to eye diseases as accurately as health professionals," said Denniston.
"But it's important to note that AI did not substantially out-perform human diagnosis."
Using data from 14 studies, researchers found that deep learning algorithms correctly detected disease in 87% of cases, compared to 86% for healthcare professionals.
AI was also able to correctly identify those patients free from disease in 93% of cases, compared to 91% for healthcare professionals.
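The two measures quoted above are commonly called sensitivity (the share of diseased patients correctly flagged) and specificity (the share of disease-free patients correctly cleared). As a rough illustration of how such percentages are derived, the sketch below computes both from a table of counts. The counts are hypothetical, chosen only so that they reproduce the reported percentages; they are not the pooled data from the 14 studies, and the names `ConfusionCounts`, `sensitivity`, and `specificity` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Outcome counts for one diagnostic method (hypothetical values)."""
    true_positives: int   # diseased patients correctly flagged
    false_negatives: int  # diseased patients missed
    true_negatives: int   # disease-free patients correctly cleared
    false_positives: int  # disease-free patients wrongly flagged


def sensitivity(c: ConfusionCounts) -> float:
    """Share of diseased patients in whom disease was detected."""
    return c.true_positives / (c.true_positives + c.false_negatives)


def specificity(c: ConfusionCounts) -> float:
    """Share of disease-free patients correctly identified as such."""
    return c.true_negatives / (c.true_negatives + c.false_positives)


# Assumption: 100 diseased and 100 disease-free patients per method,
# with counts picked to match the percentages quoted in the article.
deep_learning = ConfusionCounts(true_positives=87, false_negatives=13,
                                true_negatives=93, false_positives=7)
professionals = ConfusionCounts(true_positives=86, false_negatives=14,
                                true_negatives=91, false_positives=9)

for name, counts in [("deep learning", deep_learning),
                     ("health professionals", professionals)]:
    print(f"{name}: sensitivity {sensitivity(counts):.0%}, "
          f"specificity {specificity(counts):.0%}")
```

Seen this way, the gap between the two groups is one or two patients per hundred, which is in line with Denniston's remark that AI did not substantially outperform human diagnosis.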
While these results are promising, the researchers say better research and reporting are needed to improve our knowledge of the true power of deep learning in healthcare settings.
This will require better study design, including testing AI under the same conditions in which healthcare professionals work.
"Evidence on how AI algorithms will change patient outcomes needs to come from comparisons with alternative diagnostic tests in randomized controlled trials," said Livia Faes, from Moorfields Eye Hospital, London, in a statement.
"So far, there are hardly any such trials where diagnostic decisions made by an AI algorithm are acted upon to see what then happens to outcomes which really matter to patients, like timely treatment, time to discharge from hospital, or even survival rates."
Experts hailed the review while emphasizing the need for further research.
"The big caveat is, in my opinion, that the story is not 'AI may be as good as health professionals', but that 'the general standard of evaluating performance of AI is shoddy,'" said Franz Kiraly of University College London.
Nils Hammerla of Babylon Healthcare, a company that says it uses AI technology to improve the affordability and accessibility of healthcare, believes more work is needed before AI can reach its full potential.
"Machine learning can have a massive impact on problems in healthcare, big and small, but unless we can convince clinicians and the public of its safety and ability then it won't be much use to anybody," he said.
The global market for AI in healthcare is surging and is expected to rise from $1.3 billion in 2019 to $10 billion by 2024, according to investment bank Morgan Stanley.
Hospitals around the world are already making use of the technology, including Moorfields Eye Hospital in London.
Doctors are able to use an algorithm developed by DeepMind, a UK-based AI research center owned by Google, to return a detailed diagnosis in around 30 seconds using Optical Coherence Tomography (OCT) scans.
And AI technology can accurately identify some rare genetic disorders using a photograph of a patient's face, according to a study published in January.
The AI technology, called DeepGestalt, outperformed clinicians in identifying a range of syndromes in three trials and could add significant value in personalized care.