A new study of Microsoft’s Bing AI-powered Copilot shows why caution is warranted when using the tool for medical information.
The findings, reported via Scimex, show that many of the chatbot’s responses require advanced education to comprehend fully, and nearly 40% of its recommendations conflict with scientific consensus. Alarmingly, more than 1 in 5 answers (22%) were deemed potentially harmful, risking severe harm or even death if followed.
Questions on the 50 most prescribed drugs in the US
Researchers queried Microsoft Copilot with 10 frequently asked patient questions about the 50 most prescribed drugs in the 2020 U.S. outpatient market. These questions covered topics such as the drugs’ indications, mechanisms of action, usage instructions, potential adverse reactions, and contraindications.
They used the Flesch Reading Ease Score to estimate the educational level required to understand a given text. A score between 0 and 30 indicates a very difficult text that requires a university-level education to read. Conversely, a score between 91 and 100 means the text is very easy to read and appropriate for 11-year-olds.
The overall average score reported in the study was 37, meaning most of the chatbot’s answers are difficult to read. Even the most readable answers still required a high school (secondary) education.
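The Flesch Reading Ease formula behind these scores penalizes long sentences and polysyllabic words. As a rough illustration (the function name and the sample word/sentence/syllable counts below are hypothetical, not taken from the study):

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease score: higher values mean easier text.

    Standard formula:
        206.835 - 1.015 * (words per sentence) - 84.6 * (syllables per word)
    """
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

# Dense prose (few, long sentences; many multi-syllable words) lands in
# the "very difficult" 0-30 band the study describes:
dense = flesch_reading_ease(words=120, sentences=4, syllables=220)   # ~21.3

# The same word count split into short, simple sentences scores far higher:
plain = flesch_reading_ease(words=120, sentences=10, syllables=150)  # ~88.9
```

In practice, analyses rarely count syllables by hand; libraries such as the `textstat` Python package compute these counts directly from raw text.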
Additionally, experts determined that:
- 54% of the chatbot responses aligned with scientific consensus, while 39% of the responses contradicted scientific consensus.
- 42% of the responses were considered to lead to moderate or mild harm.
- 36% of the answers were considered to lead to no harm.
- 22% were considered to lead to severe harm or death.
AI use in the health industry
Artificial intelligence has been part of the healthcare industry for some time, offering various applications to improve patient outcomes and optimize healthcare operations.
AI has played a crucial role in medical image analysis, aiding in the early detection of diseases or accelerating the interpretation of complex images. It also helps identify new drug candidates by processing vast datasets. Additionally, AI supports health professionals by easing workloads in hospitals.
At home, AI-powered virtual assistants can assist patients with daily tasks, such as medication reminders, appointment scheduling, and symptom tracking.
The use of search engines to obtain health information, particularly about medications, is widespread. However, the growing integration of AI-powered chatbots in this area remains largely unexplored.
The study, conducted by Belgian and German researchers, was published in the BMJ Quality &amp; Safety journal and examined the use of AI-powered chatbots for health-related inquiries. The researchers ran their queries through Microsoft’s Bing AI Copilot, noting that “AI-powered chatbots are capable of providing overall complete and accurate patient drug information. Yet, experts deemed a considerable number of answers incorrect or potentially harmful.”
Consult with a healthcare professional for medical advice
The researchers noted that their assessment did not involve real patients and that prompts in other languages or from other countries could affect the quality of the chatbot’s answers.
They also stated that their study demonstrates how search engines with AI-powered chatbots can provide accurate answers to patients’ frequently asked questions about drug treatments. However, the answers were often complex, and the chatbot “repeatedly provided potentially harmful information [that] could jeopardise patient and medication safety.” They emphasized the importance of patients consulting healthcare professionals, as chatbot answers are not always error-free.
Furthermore, a more appropriate use of chatbots for health-related information might be to seek explanations of medical terms or to gain a better understanding of the context and proper use of medications prescribed by a healthcare professional.
Disclosure: I work for Trend Micro, but the views expressed in this article are mine.