Google’s recent integration of Gemini, its state-of-the-art large language model, into its search engine marks a significant step in AI-driven responses to digital health queries. Worldwide, users search Google with health-related queries 70,000 times per minute, so the company has years of experience communicating health information and a solid framework for producing high-quality, authoritatively sourced responses for users.
Except when it doesn’t.
As a physician (R.S.H.) and public health communicator (A.B-G.) who work together on technology solutions to improve digital health communication, we understand the importance of digital products like Google serving accurate health information. As we say in our new paper in the Journal of Health Communication, we believe that generative AI tools have the potential to be trusted and credible sources of health information. But they haven’t yet reached this pinnacle.
Gemini, for example, can produce inaccurate and potentially harmful information related to health. Users quickly took to social media to discuss some of the wackier outputs, such as “add Elmer’s glue to pizza sauce to prevent the cheese from sliding off” and “geologists recommend eating at least one small rock each day.”
Not all of the misinformation is harmless or humorous. Another widely shared result from Gemini was “Doctors recommend smoking 2-3 cigarettes per day while pregnant.”
Innovation in artificial intelligence is progressing at a blistering pace. Foundation large language models and other generative AI technologies have the potential to scale solutions to problems such as access to health care, medical misinformation, and burnout among physicians and other health care and public health professionals.
As AI products are deployed by both large commercial developers and smaller-scale ones, however, they also have the potential to worsen the problem of medical misinformation and to cause real-world harm to individuals. While the examples we cite appear to be isolated answers among millions of high-quality ones, they underscore the importance of considering the health impacts of AI products that are not specifically designed to provide health information, because the consequences for a user who trusts this advice could be profound.
When providing health information or medical advice, clinicians, researchers, and public health communicators are guided by four fundamental principles of medical ethics: non-maleficence, beneficence, autonomy, and justice. These principles help them make morally sound judgments and protect individuals from harm. AI products that are intended to produce health information or medical advice, or that may do so incidentally, should not be exempt from these ethical principles, which should apply to both their development and deployment.
Non-maleficence, otherwise known as “do no harm,” is the bedrock of clinical and research decision-making. In medicine, it requires a complicated estimation of the risks and benefits of an intervention, reminding providers not to underestimate their ability to cause harm even as they are trying to help. Taken too literally, the principle risks stagnation and can itself create harm through inaction. A parallel can be drawn to the current spectrum of philosophies about the pace of AI development, where effective altruism and effective accelerationism sit at opposite ends.
While the benefits of AI products may outweigh the risks, the intentional avoidance of harm should be at the core of AI product development and deployment, especially when health content is being generated. Practically speaking, technology developers can follow this principle by prioritizing safety through red-teaming efforts, ensuring that high-quality, authoritative sources are used and weighted heavily in training data, and conducting research on how users interact with their product before it ships.
Beneficence, the principle of doing good, balances non-maleficence and drives innovation, proactiveness, and preventive decision-making. AI development for products that will communicate health information must embody this principle and place the user’s best interests at the center of each step in the development process. A preventive approach can use prompt engineering to detect when queries are related to health and, in those cases, prioritize retrieval-augmented generation (RAG) to reduce the chance of inaccuracies and hallucinations. RAG references a knowledge base outside of a large language model’s training data before generating a response, improving the accuracy of its output.
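To make that routing idea concrete, here is a minimal sketch in Python. The keyword-based health classifier, the tiny in-memory knowledge base, and the generate_answer placeholder are illustrative assumptions, not components of any real product; a production system would use a learned classifier, a vetted medical corpus, and an actual language model call.

```python
# Minimal sketch: route health-related queries through retrieval-augmented generation (RAG).
# The keyword list, the toy knowledge base, and generate_answer() are hypothetical stand-ins.

HEALTH_KEYWORDS = {"pregnant", "smoking", "dose", "symptom", "vaccine", "medication"}

TRUSTED_KNOWLEDGE_BASE = {
    # In practice this would be an index over vetted, authoritative medical sources.
    "smoking pregnancy": "No amount of smoking is safe during pregnancy.",
    "vaccine schedule": "Routine immunization schedules are published by public health agencies.",
}

def is_health_query(query: str) -> bool:
    """Crude stand-in for a learned classifier that flags health-related queries."""
    return any(word in query.lower() for word in HEALTH_KEYWORDS)

def retrieve(query: str) -> list[str]:
    """Return trusted passages whose keys share terms with the query (toy retrieval)."""
    terms = set(query.lower().split())
    return [text for key, text in TRUSTED_KNOWLEDGE_BASE.items()
            if terms & set(key.split())]

def generate_answer(query: str, context: list[str] | None = None) -> str:
    """Placeholder for a call to a large language model."""
    if context:
        # Ground the response in retrieved, authoritative passages.
        return f"Based on trusted sources: {' '.join(context)}"
    return f"(Unconstrained model answer to: {query!r})"

def answer(query: str) -> str:
    # Health queries are grounded in vetted sources; other queries pass through unchanged.
    if is_health_query(query):
        return generate_answer(query, context=retrieve(query))
    return generate_answer(query)

print(answer("Is smoking safe while pregnant?"))
```

The design choice here is simply that health queries trigger an extra grounding step before generation, rather than relying on the model’s training data alone.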
Autonomy in medical ethics means that patients have the right to make their own decisions about their personal medical care. AI products that scale the dissemination of accurate health information have enormous potential to improve individual autonomy in health and medical decisions. It is imperative, however, that technology developers recognize that scaling autonomy requires training their AI products to provide balanced, accurate, and unbiased health information.
Justice in medical ethics means treating everyone equally and fairly. When it comes to the ethical development of health AI, there may be no area of greater need than ensuring justice for all users. Historically marginalized populations are disproportionately affected by false health information and are more likely to be harmed by algorithmic bias. Every step of the development process carries the potential to introduce biases that worsen these inequities and entrench systemic inequality. AI developers can prevent and reduce bias through technological solutions, such as curating unbiased training data, prompt engineering strategies like chain-of-thought prompting, and post-training strategies such as re-ranking and modifying loss functions. But there is also a need to include diverse community perspectives early in development, through participatory research, to better understand what fair representation looks like for those communities.
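As one illustration of a post-training mitigation, the sketch below re-ranks candidate model answers by penalizing those that a crude heuristic flags as dismissive of particular groups. The flagged phrases, relevance scores, and penalty weight are hypothetical stand-ins for the learned fairness and safety scorers a real system would use.

```python
# Minimal sketch of post-hoc re-ranking, one of the mitigation strategies mentioned above.
# The phrase list, scores, and penalty weight are illustrative assumptions only.

FLAGGED_PHRASES = ["only affects", "not a concern for", "rarely affects"]  # toy bias heuristics

def bias_penalty(text: str) -> float:
    """Count flagged phrases as a crude stand-in for a learned fairness scorer."""
    return float(sum(phrase in text.lower() for phrase in FLAGGED_PHRASES))

def rerank(candidates: list[tuple[str, float]], penalty_weight: float = 0.5) -> list[str]:
    """Reorder candidate answers by relevance score minus a bias penalty."""
    scored = [(relevance - penalty_weight * bias_penalty(text), text)
              for text, relevance in candidates]
    return [text for _, text in sorted(scored, reverse=True)]

candidates = [
    ("This condition only affects older men, so screening is unnecessary.", 0.9),
    ("Screening guidance varies by age and risk factors; consult current guidelines.", 0.8),
]
print(rerank(candidates)[0])  # the balanced answer is promoted despite a lower raw score
```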
The early introduction of user-centered medical, public health, and research ethics principles adapted for AI product development could positively influence health outcomes. Society has seen the consequences of previous technologies that did not always prioritize information accuracy over engagement, and an opportunity exists now to prevent an industry-wide repetition of those mistakes.
Research on how users interact with generative AI tools, and on the effects of those interactions on health-related attitudes, beliefs, and behaviors, is essential to guide the development of these ethical frameworks. At NORC at The University of Chicago, our team is embarking on a research agenda to explore these interactions, with the aim of providing insights that center non-maleficence, beneficence, autonomy, and justice in AI-generated health communication for all people.
Rebecca Soskin Hicks, M.D., is a physician working at the intersection of clinical medicine and innovative technology and a fellow at NORC at The University of Chicago. Amelia Burke-Garcia is a health communicator and director of The Center for Health Communication Science at NORC at The University of Chicago.