Curated by THEOUTPOST
On Tue, 22 Oct, 12:03 AM UTC
3 Sources
[1]
Making it easier to verify an AI model's responses
Despite their impressive capabilities, large language models are far from perfect. These artificial intelligence models sometimes "hallucinate" by generating incorrect or unsupported information in response to a query.

Due to this hallucination problem, an LLM's responses are often verified by human fact-checkers, especially if a model is deployed in a high-stakes setting like health care or finance. However, validation processes typically require people to read through long documents cited by the model, a task so onerous and error-prone it may prevent some users from deploying generative AI models in the first place.

To help human validators, MIT researchers created a user-friendly system that enables people to verify an LLM's responses much more quickly. With this tool, called SymGen, an LLM generates responses with citations that point directly to the place in a source document, such as a given cell in a database. Users hover over highlighted portions of its text response to see the data the model used to generate that specific word or phrase. At the same time, the unhighlighted portions show users which phrases need additional attention to check and verify.

"We give people the ability to selectively focus on parts of the text they need to be more worried about. In the end, SymGen can give people higher confidence in a model's responses because they can easily take a closer look to ensure that the information is verified," says Shannon Shen, an electrical engineering and computer science graduate student and co-lead author of a paper on SymGen.

Through a user study, Shen and his collaborators found that SymGen sped up verification time by about 20 percent compared to manual procedures. By making it faster and easier for humans to validate model outputs, SymGen could help people identify errors in LLMs deployed in a variety of real-world situations, from generating clinical notes to summarizing financial market reports.

Shen is joined on the paper by co-lead author and fellow EECS graduate student Lucas Torroba Hennigen; EECS graduate student Aniruddha "Ani" Nrusimha; Bernhard Gapp, president of the Good Data Initiative; and senior authors David Sontag, a professor of EECS, a member of the MIT Jameel Clinic, and the leader of the Clinical Machine Learning Group of the Computer Science and Artificial Intelligence Laboratory (CSAIL); and Yoon Kim, an assistant professor of EECS and a member of CSAIL. The research was recently presented at the Conference on Language Modeling.

Symbolic references

To aid in validation, many LLMs are designed to generate citations, which point to external documents, along with their language-based responses so users can check them. However, these verification systems are usually designed as an afterthought, without considering the effort it takes for people to sift through numerous citations, Shen says.

"Generative AI is intended to reduce the user's time to complete a task. If you need to spend hours reading through all these documents to verify the model is saying something reasonable, then it's less helpful to have the generations in practice," Shen says.

The researchers approached the validation problem from the perspective of the humans who will do the work. A SymGen user first provides the LLM with data it can reference in its response, such as a table that contains statistics from a basketball game. Then, rather than immediately asking the model to complete a task, like generating a game summary from those data, the researchers perform an intermediate step: they prompt the model to generate its response in a symbolic form.

With this prompt, every time the model wants to cite words in its response, it must write the specific cell from the data table that contains the information it is referencing. For instance, if the model wants to cite the phrase "Portland Trailblazers" in its response, it would replace that text with the name of the cell in the data table that contains those words.

"Because we have this intermediate step that has the text in a symbolic format, we are able to have really fine-grained references. We can say, for every single span of text in the output, this is exactly where in the data it corresponds to," Torroba Hennigen says.

SymGen then resolves each reference using a rule-based tool that copies the corresponding text from the data table into the model's response.

"This way, we know it is a verbatim copy, so we know there will not be any errors in the part of the text that corresponds to the actual data variable," Shen adds.

Streamlining validation

The model can create symbolic responses because of how it is trained. Large language models are fed reams of data from the internet, and some data are recorded in "placeholder format," where codes replace actual values. When SymGen prompts the model to generate a symbolic response, it uses a similar structure.

"We design the prompt in a specific way to draw on the LLM's capabilities," Shen adds.

During a user study, the majority of participants said SymGen made it easier to verify LLM-generated text. They could validate the model's responses about 20 percent faster than if they used standard methods.

However, SymGen is limited by the quality of the source data: the LLM could cite an incorrect variable, and a human verifier may be none the wiser. In addition, the user must have source data in a structured format, like a table, to feed into SymGen. Right now, the system only works with tabular data.

Moving forward, the researchers are enhancing SymGen so it can handle arbitrary text and other forms of data. With that capability, it could help validate portions of AI-generated legal document summaries, for instance. They also plan to test SymGen with physicians to study how it could identify errors in AI-generated clinical summaries.
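To make the intermediate step and the rule-based resolution concrete, here is a minimal sketch in Python. The table, the {{cell_name}} placeholder syntax, and the resolve function are invented for illustration and are not SymGen's actual prompt or reference format; the point is only that each placeholder is replaced by a verbatim copy from the table, and the copied span is recorded so it can later be highlighted.

```python
import re

# Hypothetical table of box-score data the LLM is allowed to reference.
# The keys stand in for "cell names"; SymGen's real reference format may differ.
game_table = {
    "team_home": "Portland Trailblazers",
    "team_away": "Memphis Grizzlies",
    "score_home": "115",
    "score_away": "107",
}

# A symbolic response as the model might produce it: prose with placeholders
# instead of literal values (the {{...}} syntax is an assumption for this sketch).
symbolic_response = (
    "The {{team_home}} beat the {{team_away}} {{score_home}}-{{score_away}} "
    "behind a strong fourth quarter."
)

def resolve(symbolic: str, table: dict) -> tuple:
    """Rule-based resolution: copy each referenced cell verbatim into the text
    and record the character span it occupies, so a viewer can highlight the
    verified spans and leave free-form model text unhighlighted."""
    pieces, spans = [], []
    cursor = pos = 0
    for match in re.finditer(r"\{\{(\w+)\}\}", symbolic):
        pieces.append(symbolic[pos:match.start()])
        cursor += match.start() - pos
        value = table[match.group(1)]            # verbatim copy from the table
        spans.append((cursor, cursor + len(value), match.group(1)))
        pieces.append(value)
        cursor += len(value)
        pos = match.end()
    pieces.append(symbolic[pos:])
    return "".join(pieces), spans

resolved_text, verified_spans = resolve(symbolic_response, game_table)
print(resolved_text)
for start, end, cell in verified_spans:
    print(f"  chars {start}-{end} copied from cell '{cell}': {resolved_text[start:end]!r}")
```

Because the cited values are inserted by a deterministic copy rule rather than regenerated by the model, the highlighted spans cannot be mistranscribed, which is the verbatim-copy guarantee Shen describes.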
[2]
User-friendly system makes it easier to verify an AI model's responses
The paper is published on the arXiv preprint server.
[3]
Making it easier to verify an AI model's responses
Caption: With SymGen, every time the model wants to cite words in its response, it must write the specific cell from the data table that contains the information it is referencing. Then SymGen resolves each reference using a rule-based tool that copies the corresponding text from the data table.

This work is funded, in part, by Liberty Mutual and the MIT Quest for Intelligence Initiative.
MIT researchers have created SymGen, a user-friendly system that makes it easier and faster for humans to verify the responses of large language models, potentially addressing the issue of AI hallucinations in high-stakes applications.
Researchers at the Massachusetts Institute of Technology (MIT) have developed a new tool called SymGen to address one of the most pressing challenges in artificial intelligence: the verification of responses generated by large language models (LLMs). This innovative system aims to streamline the process of fact-checking AI-generated content, potentially making it easier to deploy these models in critical sectors such as healthcare and finance [1].
LLMs, despite their impressive capabilities, are prone to "hallucinations" – instances where they generate incorrect or unsupported information. This issue has necessitated human fact-checking, especially in high-stakes environments. However, the current validation processes are often time-consuming and error-prone, involving the review of lengthy documents cited by the model [2].
SymGen takes a novel approach to this problem:
Symbolic References: The system prompts the LLM to generate responses in a symbolic form, where each piece of information is linked to a specific cell in a source data table [3].
Direct Citations: Instead of general references, SymGen creates citations that point directly to the exact location of information in the source document.
Interactive Verification: Users can hover over highlighted portions of the text to see the data used to generate specific words or phrases. Unhighlighted portions indicate areas that may require additional verification [1]; a minimal sketch of this highlighting logic appears after this list.
Rule-Based Resolution: The system uses a rule-based tool to copy the corresponding text from the data table into the model's response, ensuring verbatim accuracy for cited information [2].
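As a rough sketch of how the hover-to-verify view could be driven by those verified spans, the Python snippet below splits a resolved response into "verified" segments (backed by a table cell) and "unverified" segments (free-form model text that still needs a human check). The text, span offsets, and cell names are invented for illustration; this is not SymGen's actual interface code.

```python
# Illustrative only: in a real pipeline the resolved text and verified spans
# would come from a resolution step like the one sketched earlier.
resolved_text = "The Portland Trailblazers beat the Memphis Grizzlies 115-107."
verified_spans = [
    (4, 25, "team_home"),    # "Portland Trailblazers"
    (35, 52, "team_away"),   # "Memphis Grizzlies"
    (53, 56, "score_home"),  # "115"
    (57, 60, "score_away"),  # "107"
]

def segment_for_display(text, spans):
    """Split the text into segments a viewer could render: spans copied
    verbatim from the table are 'verified' (hover would reveal the source
    cell), and everything in between is free-form model text that still
    needs a human check."""
    segments, cursor = [], 0
    for start, end, cell in sorted(spans):
        if cursor < start:
            segments.append(("unverified", text[cursor:start], None))
        segments.append(("verified", text[start:end], cell))
        cursor = end
    if cursor < len(text):
        segments.append(("unverified", text[cursor:], None))
    return segments

for kind, chunk, cell in segment_for_display(resolved_text, verified_spans):
    note = f"<- cell '{cell}'" if kind == "verified" else "(needs manual check)"
    print(f"{kind:>10}: {chunk!r} {note}")
```

Anything that falls outside a verified span corresponds to the "unhighlighted" text that, per the article, still requires the reader's attention.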
In user studies, SymGen demonstrated significant improvements in the verification process: the majority of participants said it made LLM-generated text easier to check, and they validated the model's responses about 20% faster than with standard manual methods.
However, the researchers acknowledge some limitations: SymGen is only as reliable as its source data, an LLM could still cite an incorrect variable without the human verifier noticing, and the system currently works only with structured, tabular input.
Moving forward, the MIT team plans to enhance SymGen to handle arbitrary text and other forms of data. They also aim to test the system with physicians to explore its potential in identifying errors in AI-generated clinical summaries [2].
By making it faster and easier for humans to validate model outputs, SymGen could potentially accelerate the responsible deployment of AI in various real-world scenarios. This includes applications in generating clinical notes, summarizing financial market reports, and even validating portions of AI-generated legal document summaries [1][3].
MIT CSAIL researchers have created ContextCite, a tool that identifies specific sources used by AI models to generate responses, improving content verification and trustworthiness.
2 Sources
MIT researchers have created a system called EXPLINGO that uses large language models to convert complex AI explanations into easily understandable narratives, aiming to bridge the gap between AI decision-making and human comprehension.
3 Sources
Computer scientists are working on innovative approaches to enhance the factual accuracy of AI-generated information, including confidence scoring systems and cross-referencing with reliable sources.
2 Sources
Google unveils DataGemma, an open-source AI model designed to reduce hallucinations in large language models when handling statistical queries. This innovation aims to improve the accuracy and reliability of AI-generated information.
3 Sources
Australian researchers develop LLM4SD, an AI tool that simulates scientists by analyzing research, generating hypotheses, and providing transparent explanations for predictions across various scientific disciplines.
2 Sources