
Medical researchers at some institutions in Canada, the United States and Italy are using data created by artificial intelligence (AI) from real patient information in their experiments, without needing approval from their institutional ethics boards, Nature has learnt.
To generate what is called ‘synthetic data’, researchers train generative AI models on real human medical information, then ask the models to create data sets whose statistical properties represent the original records without including any of them.
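To make the idea concrete, here is a minimal sketch of the approach. It is an illustration rather than any institution’s actual pipeline: a simple Gaussian mixture stands in for the far more sophisticated generative AI models researchers use, and the patient table, column names and parameters are all invented for the example.

```python
# Minimal sketch of synthetic-data generation: fit a generative model
# to real records, then sample new ones. The Gaussian mixture is a
# stand-in for the more sophisticated generative AI models described
# in the article; all values here are illustrative assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Stand-in for a real patient table: 500 records of (age, systolic
# blood pressure, cholesterol), generated so the example is self-contained.
real = np.column_stack([
    rng.normal(55, 12, 500),   # age, years
    rng.normal(130, 15, 500),  # systolic blood pressure, mmHg
    rng.normal(200, 30, 500),  # total cholesterol, mg/dL
])

# Train the generative model on the real records...
model = GaussianMixture(n_components=3, random_state=0).fit(real)

# ...then sample a brand-new data set: no row is copied from `real`,
# but the summary statistics should be close.
synthetic, _ = model.sample(n_samples=500)

print("real means:     ", real.mean(axis=0).round(1))
print("synthetic means:", synthetic.mean(axis=0).round(1))
```

Because the synthetic rows are drawn from the fitted model rather than copied, no row corresponds to a real person, which is the property that the institutions’ legal and ethical analyses turn on.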
Typically, when research involves human data, an ethics board must review how studies affect participants’ rights, safety, dignity and well-being. However, institutions including the IRCCS Humanitas Research Hospital in Milan, Italy, the Children’s Hospital of Eastern Ontario (CHEO) in Ottawa and the Ottawa Hospital, both in Canada, and Washington University School of Medicine (WashU Medicine) in St. Louis, Missouri, have waived these requirements for research involving synthetic data.
The institutions justify this decision in different ways. However, the potential benefits of using synthetic data include protecting patient privacy, making it easier to share data between sites and speeding up research, says Khaled El Emam, a medical AI researcher at the CHEO Research Institute and the University of Ottawa.
Washington University, which began waiving ethical review for such research in 2020, was “among the first US institutions to adopt synthetic data at scale” in medical science, says Philip Payne, who is the university’s vice-chancellor for biomedical informatics and data science, and director of its Institute for Informatics, Data Science and Biostatistics.
Payne says that synthetic data sets are not considered human-subject research under the 1991 US federal Common Rule, which governs ethical standards for research involving people, because the data don’t contain any real or traceable patient information. WashU Medicine’s Institutional Review Board therefore doesn’t require projects that use such data sets to be reviewed.
National variations
In Italy, scientists at the Humanitas AI Center have been exploring synthetic data in research since 2021, says Saverio D’Amico, the AI team leader. D’Amico and his colleagues can also avoid seeking approval from ethics review boards if they create the data using information gathered from patients who have consented to data analysis for AI purposes, he says.
Humanitas has had more freedom to use synthetic data without ethical review than many other Italian organizations have, D’Amico says, because it is a high-level research hospital. The Italian Ministry of Health grants this status to a small number of institutes, marking them as benchmarks for innovation and quality patient care.
Meanwhile, in Ontario, the Personal Health Information Protection Act, 2004 says that the creation of non-personal information – which conceals individual identities – doesn’t require patient consent.
The Canadian hospitals decided to waive ethics-board review following legal analyses in 2024, says Cécile Bensimon, chair of the Research Ethics Board at CHEO. The analyses concluded that AI-generated synthetic data might not constitute personal health information. As at WashU Medicine, the CHEO board therefore concluded that “the use of synthetic data in research does not require oversight by the hospital research ethics board because it does not meet the definition of human research”, Bensimon says.
However, studies in which researchers access patient data to create synthetic data sets do need ethics-board approval, Bensimon notes, although because such studies are deemed low-risk, they usually meet the criteria for waiving participant consent. The conditions the Ottawa Hospital applies differ slightly. “AI in research more broadly is not inherently problematic,” Bensimon adds. “It just requires the application of existing standards and safeguards.”