3 Sources
[1]
3 Questions: Should we label AI systems like we do prescription drugs?
Caption: The labels "can help to ensure that users are aware of 'potential side effects,' any 'warnings and precautions,' and 'adverse reactions,'" says Marzyeh Ghassemi.

AI systems are increasingly being deployed in safety-critical health care situations. Yet these models sometimes hallucinate incorrect information, make biased predictions, or fail for unexpected reasons, which could have serious consequences for patients and clinicians.

In a commentary article published today in Nature Computational Science, MIT Associate Professor Marzyeh Ghassemi and Boston University Associate Professor Elaine Nsoesie argue that, to mitigate these potential harms, AI systems should be accompanied by responsible-use labels, similar to the U.S. Food and Drug Administration-mandated labels placed on prescription medications. MIT News spoke with Ghassemi about the need for such labels, the information they should convey, and how labeling procedures could be implemented.

Q: Why do we need responsible-use labels for AI systems in health care settings?

A: In a health setting, we have an interesting situation where doctors often rely on technology or treatments that are not fully understood. Sometimes this lack of understanding is fundamental -- the mechanism behind acetaminophen, for instance -- but other times this is just a limit of specialization. We don't expect clinicians to know how to service an MRI machine, for instance. Instead, we have certification systems through the FDA or other federal agencies that certify the use of a medical device or drug in a specific setting. Importantly, medical devices also have service contracts -- a technician from the manufacturer will fix your MRI machine if it is miscalibrated. For approved drugs, there are postmarket surveillance and reporting systems so that adverse effects or events can be addressed -- for instance, if a lot of people taking a drug seem to be developing a condition or allergy.

Models and algorithms, whether they incorporate AI or not, skirt a lot of these approval and long-term monitoring processes, and that is something we need to be wary of. Many prior studies have shown that predictive models need more careful evaluation and monitoring. With more recent generative AI specifically, we cite work that has demonstrated generation is not guaranteed to be appropriate, robust, or unbiased. Because we don't have the same level of surveillance on model predictions or generation, it would be even more difficult to catch a model's problematic responses. The generative models being used by hospitals right now could be biased. Having use labels is one way of ensuring that models don't automate biases learned from human practitioners or from miscalibrated clinical decision support scores of the past.

Q: Your article describes several components of a responsible-use label for AI, following the FDA approach for creating prescription labels, including approved usage, ingredients, potential side effects, and so on. What core information should these labels convey?

A: The things a label should make obvious are the time, place, and manner of a model's intended use. For instance, the user should know that a model was trained at a specific time with data from a specific time period. Does that data include the Covid-19 pandemic, for example? There were very different health practices during Covid that could impact the data. This is why we advocate for the model "ingredients" and "completed studies" to be disclosed.
For place, we know from prior research that models trained in one location tend to have worse performance when moved to another location. Knowing where the data were from and how a model was optimized within that population can help to ensure that users are aware of "potential side effects," any "warnings and precautions," and "adverse reactions."

With a model trained to predict one outcome, knowing the time and place of training could help you make intelligent judgments about deployment. But many generative models are incredibly flexible and can be used for many tasks. Here, time and place may not be as informative, and more explicit direction about "conditions of labeling" and "approved usage" versus "unapproved usage" comes into play. If a developer has evaluated a generative model for reading a patient's clinical notes and generating prospective billing codes, they can disclose that it has a bias toward overbilling for specific conditions or underrecognizing others. A user wouldn't want to use this same generative model to decide who gets a referral to a specialist, even though they could. This flexibility is why we advocate for additional details on the manner in which models should be used.

In general, we advocate that you should train the best model you can, using the tools available to you. But even then, there should be a lot of disclosure. No model is going to be perfect. As a society, we now understand that no pill is perfect -- there is always some risk. We should have the same understanding of AI models. Any model -- with or without AI -- is limited. It may be giving you realistic, well-trained forecasts of potential futures, but take that with whatever grain of salt is appropriate.

Q: If AI labels were to be implemented, who would do the labeling, and how would labels be regulated and enforced?

A: If you don't intend for your model to be used in practice, then the disclosures you would make for a high-quality research publication are sufficient. But once you intend your model to be deployed in a human-facing setting, developers and deployers should do an initial labeling, based on some of the established frameworks. There should be a validation of these claims prior to deployment; in a safety-critical setting like health care, many agencies of the Department of Health and Human Services could be involved.

For model developers, I think that knowing you will need to label the limitations of a system induces more careful consideration of the process itself. If I know that at some point I am going to have to disclose the population upon which a model was trained, I would not want to disclose that it was trained only on dialogue from male chatbot users, for instance. Thinking about things like who the data were collected on, over what time period, what the sample size was, and how you decided what data to include or exclude can open your mind up to potential problems at deployment.
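Ghassemi's "time, place, and manner" framing can be made concrete in code. The sketch below is a hypothetical illustration, not something proposed in the article: a minimal Python record of a model's disclosed training window, source population, and approved versus unapproved uses, plus a helper that flags out-of-scope deployment requests. All names, fields, and example values are assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class ResponsibleUseLabel:
    """Hypothetical 'time, place, and manner' disclosure for a clinical model."""
    trained_on_data_through: str                               # time: e.g. "2019-12" (pre-Covid data only)
    source_population: str                                     # place: where the training data came from
    approved_uses: list[str] = field(default_factory=list)     # manner: uses the developer evaluated
    unapproved_uses: list[str] = field(default_factory=list)   # uses explicitly ruled out
    warnings: list[str] = field(default_factory=list)

def check_intended_use(label: ResponsibleUseLabel, intended_use: str) -> None:
    """Surface warnings, and refuse uses the developer has explicitly ruled out."""
    if intended_use in label.unapproved_uses:
        raise ValueError(f"'{intended_use}' is listed as unapproved usage.")
    if intended_use not in label.approved_uses:
        print(f"Caution: '{intended_use}' was never evaluated by the developer.")
    for warning in label.warnings:
        print(f"Warning/precaution: {warning}")

# Example: a billing-code model should not quietly become a referral-triage model.
billing_model_label = ResponsibleUseLabel(
    trained_on_data_through="2019-12",
    source_population="single U.S. academic medical center",
    approved_uses=["generate prospective billing codes from clinical notes"],
    unapproved_uses=["decide specialist referrals"],
    warnings=["known bias toward overbilling for specific conditions"],
)
check_intended_use(billing_model_label, "generate prospective billing codes from clinical notes")
# check_intended_use(billing_model_label, "decide specialist referrals")  # would raise ValueError
```

The design choice here mirrors the interview: the label does not block flexible models outright, it simply forces the deployer to confront the disclosed limits before repurposing the system.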
[2]
Q&A: Should we label AI systems like we do prescription drugs?
[3]
Using labels to limit AI misuse in health - Nature Computational Science
Furthermore, creating such a label would force AI developers to be more critical in assessing the ethical implications of the algorithms they develop and release to the general public. While there is growing public demand for accountability, the teaching of ethical AI -- specifically, approaches to redressing the impacts of data and algorithm bias -- is not always a priority in computer science education or at conferences. Instead, the focus leans heavily toward algorithmic advances. In the last several years, major AI companies in the US have been in the news for disbanding their ethics units. The development of AI labels would not be successful without diverse developer teams that include social scientists, ethicists and, in the case of healthcare AI, clinicians. Developers should clearly communicate the approved uses and potential side effects of an algorithm, challenging those who adopt these systems for healthcare applications to conduct a 'thought experiment' about potential impacts prior to use.

To adopt the FDA approach to creating prescribing labels, the usage label should include the following information: approved usage, potential side effects, warnings and precautions, use in specific populations, adverse reactions, unapproved usage, completed studies, and ingredients of the algorithm. The responsible-use label does not preclude adherence to ethical principles such as the GREAT PLEA -- Governability, Reliability, Equity, Accountability, Traceability, Privacy, Lawfulness, Empathy, and Autonomy -- which are necessary for the development, implementation and use of generative AI algorithms in healthcare settings. Rather, it adds to existing ethical AI principles by addressing the need for prescription-like information to support the effective and equitable application of general-use generative AI algorithms in healthcare settings, especially when the algorithms were not created for healthcare use.

Approved usage: A succinct description of what the AI developers created the algorithm to do, including descriptions of known use cases and how adopters can use the algorithm to accomplish the specified use cases. For example, if the AI model was created to summarize information on the Internet, then it should say so.

Potential side effects: Clearly communicates potential issues that might be encountered in the usage of the AI algorithm (for instance, hallucinations or misrepresentation of historical data).

Warnings and precautions: Supplies information on the most serious ethical and equity issues that might arise from the use of the AI system, and provides recommendations on how to identify and prevent such adverse reactions. Conveys information on questions that adopters should ask or consider before the algorithm is applied to solve a problem in a clinical setting. For example, what are the implications of adopting the algorithm for a specific clinical use case? How does adoption impact healthcare delivery for different racial, ethnic, gender, or other marginalized groups?

Use in specific populations: Includes information on the intended application of the algorithm in specific populations. In a clinical setting, this could mean limiting applications to specific diseases or conditions. Given known biases in the healthcare system, this should also describe how the algorithm addresses issues of representation and contexts for diverse populations (such as use in populations with language differences).

Adverse reactions: Identifies undesirable or unintended effects associated with the adoption of the AI in a clinical setting, specifically as they relate to healthcare workers and patients. This section should describe the most common and most frequently reported adverse reactions that require interruption or discontinued use of the algorithm. For example, an unintended effect could involve an AI algorithm that increases clinicians' workload when it was intended to improve efficiency.

Unapproved usage: Conveys information about specific cases where the algorithm should not be used and the potential impacts if it is used. For example, an AI algorithm developed to summarize texts could be used to create a hospital discharge summary, which implies it could capture clinician bias present in clinical notes. Developers probably cannot think of all unapproved uses, so this section should be updated as more research is conducted.

Completed studies: References to any scientific studies and findings that support the recommended use cases, adverse reactions, potential side effects, and unapproved usage. An example is peer-reviewed research demonstrating the applications of the AI system.

Ingredients: Describes the datasets used in training the algorithm, including known ethical issues associated with the data. The dataset description should include the elements mentioned in the "Data Set Nutrition Label" proposed by Holland and colleagues, which includes information on the data source, metadata and variables. Known ethical issues could include underrepresentation or complete absence of specific populations in the data. For example, a model trained to predict acute kidney injury in a population of 703,782 US veterans that was 94% male performed worse in predicting acute kidney injury in females, in both the Veterans Affairs data and a sex-balanced academic hospital cohort.

For an AI algorithm to meet the usage label criteria outlined above, it must meet the following conditions. First, the developer has conducted a careful assessment of the benefits and potential risks for the use case. Second, the algorithm does not pose a substantial risk of exacerbating health inequities for the current use case. Third, rigorous research and validation have been conducted by the developer to ensure it will be useful for the particular application. Fourth, there are clear guidelines on how the public or experts can 'safely and effectively' use the algorithm. Developing these guidelines will require input from social scientists, ethicists and clinicians.
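The eight label sections above also map naturally onto a machine-readable record that could accompany a deployed model. The following is a minimal sketch under that assumption, not a format defined in the commentary; every key and example value is hypothetical, with the "ingredients" entry loosely following the Data Set Nutrition Label elements (data source, metadata, variables).

```python
import json

# Hypothetical responsible-use label covering the eight FDA-style sections named in the
# commentary. All keys and example values below are illustrative assumptions only.
usage_label = {
    "approved_usage": "Summarize clinical notes into draft billing codes for human coder review.",
    "potential_side_effects": ["hallucinated codes", "misrepresentation of historical data"],
    "warnings_and_precautions": [
        "Assess impacts on care delivery for racial, ethnic, gender, and other marginalized groups before adoption."
    ],
    "use_in_specific_populations": "Evaluated only on adult inpatient notes written in English.",
    "adverse_reactions": ["increased clinician workload despite the intended efficiency gains"],
    "unapproved_usage": ["specialist-referral decisions", "patient-facing medical advice"],
    "completed_studies": ["(hypothetical) peer-reviewed evaluation of billing-code accuracy"],
    "ingredients": {  # dataset description in the spirit of the Data Set Nutrition Label
        "data_source": "single-site EHR, 2015-2019",
        "metadata": {"records": 120_000, "percent_female": 48},
        "variables": ["clinical notes", "ICD-10 billing codes"],
        "known_ethical_issues": ["underrepresentation of non-English-speaking patients"],
    },
}

print(json.dumps(usage_label, indent=2))
```

A structured record like this could be validated automatically before deployment, in the same spirit as the usage check sketched after the interview above.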
MIT researchers suggest implementing a labeling system for AI models, similar to prescription drug labels. This approach aims to increase transparency and help users understand the capabilities and limitations of AI systems.

In a groundbreaking proposal, researchers from the Massachusetts Institute of Technology (MIT) have suggested implementing a labeling system for artificial intelligence (AI) models, drawing parallels to the labeling practices used for prescription drugs. This initiative aims to enhance transparency and user understanding of AI systems' capabilities and limitations [1].

As AI systems become increasingly prevalent in various sectors, including healthcare, finance, and education, the need for clear communication about their functionalities and potential risks has never been more critical. The proposed labeling system would provide users with essential information about an AI model's intended use, performance metrics, and potential side effects [2].
The researchers suggest that AI labels should include approved usage, ingredients of the algorithm, potential side effects, warnings and precautions, use in specific populations, adverse reactions, unapproved usage, and completed studies. This comprehensive approach would enable users to make informed decisions about when and how to utilize AI systems in their respective fields [1].
While the concept of AI labeling shows promise, experts acknowledge several challenges in its implementation, including deciding who performs the initial labeling, validating a label's claims before deployment, and keeping labels current as new unapproved uses emerge. Overcoming these hurdles will require collaboration between researchers, industry leaders, and regulatory bodies [3].
The introduction of a standardized labeling system could have far-reaching effects on the AI industry. Proponents argue that it would increase transparency, push developers to weigh the ethical implications of their systems earlier in the process, and help keep models from automating existing biases. By providing clear, accessible information about AI systems, this initiative could accelerate the adoption of AI technologies across various sectors while mitigating potential risks [2].
As discussions around AI labeling gain momentum, researchers and policymakers are exploring ways to turn this concept into reality. The MIT team emphasizes the need for ongoing research and collaboration to refine the labeling framework and address emerging challenges in the rapidly evolving field of artificial intelligence [3]. With the potential to reshape how we interact with and understand AI systems, the proposed labeling initiative represents a significant step towards more transparent and responsible AI development and deployment.
Summarized by Navi