© 2025 TheOutpost.AI All rights reserved
Curated by THEOUTPOST
On Tue, 12 Nov, 4:01 PM UTC
2 Sources
[1]
AI is universally bad at knowing when to chime in during a conversation: Researchers discover some of the root causes
When you have a conversation today, notice the natural points when the exchange leaves open the opportunity for the other person to chime in. If their timing is off, they might come across as overly aggressive, too timid, or just plain awkward. The back-and-forth is the social element of the exchange of information that occurs in a conversation, and while humans do this naturally -- with some exceptions -- AI language systems are universally bad at it.

Linguistics and computer science researchers at Tufts have now discovered some of the root causes of this shortfall in AI conversational skills and point to possible ways to make them better conversational partners. Their study results will be presented at the Empirical Methods in Natural Language Processing (EMNLP 2024) conference, held in Miami from November 12 to 16, and are posted to the arXiv preprint server.

When humans interact verbally, they mostly avoid speaking simultaneously, taking turns to speak and listen. Each person evaluates many input cues to determine what linguists call "transition relevant places," or TRPs. TRPs occur often in conversations. Many times we will take a pass at a TRP and let the speaker continue; other times we will use the TRP to take our turn and share our thoughts.

JP de Ruiter, professor of psychology and computer science, says that for a long time it was thought that the "paraverbal" information in conversations -- the intonations, lengthening of words and phrases, pauses, and some visual cues -- was the most important signal for identifying a TRP. "That helps a little bit," says de Ruiter, "but if you take out the words and just give people the prosody -- the melody and rhythm of speech that comes through as if you were talking through a sock -- they can no longer detect appropriate TRPs." Do the reverse and just provide the linguistic content in monotone speech, and study subjects will find most of the same TRPs they would find in natural speech.
"What we now know is that the most important cue for taking turns in conversation is the language content itself. The pauses and other cues don't matter that much," says de Ruiter. AI is great at detecting patterns in content, but when de Ruiter, graduate student Muhammad Umair, and research assistant professor of computer science Vasanth Sarathy, EG20, tested transcribed conversations against a large language model AI, the AI was not able to detect appropriate TRPs with anywhere near the capability of humans. The reason stems from what the AI is trained on. Large language models, including the most advanced ones such as ChatGPT, have been trained on a vast dataset of written content from the internet -- Wikipedia entries, online discussion groups, company websites, news sites -- just about everything. What is missing from that dataset is any significant amount of transcribed spoken conversational language, which is unscripted, uses simpler vocabulary and shorter sentences, and is structured differently than written language. AI was not "raised" on conversation, so it does not have the ability to model or engage in conversation in a more natural, human-like manner. The researchers thought that it might be possible to take a large language model trained on written content and fine-tune it with additional training on a smaller set of conversational content so it can engage more naturally in a novel conversation. When they tried this, they found that there were still some limitations to replicating human-like conversation. The researchers caution that there may be a fundamental barrier to AI carrying on a natural conversation. "We are assuming that these large language models can understand the content correctly. That may not be the case," says Sarathy. "They're predicting the next word based on superficial statistical correlations, but turn taking involves drawing from context much deeper into the conversation." 
"It's possible that the limitations can be overcome by pre-training large language models on a larger body of naturally occurring spoken language," says Umair, whose Ph.D. research focuses on human-robot interactions and who is the lead author on the studies. "Although we have released a novel training dataset that helps AI identify opportunities for speech in naturally occurring dialogue, collecting such data at a scale required to train today's AI models remains a significant challenge," he says. "There are just not nearly as many conversational recordings and transcripts available compared to written content on the internet."
[2]
AI Needs to Work on Its Conversation Game | Newswise
Artificial intelligence, trained on a vast dataset of written text, does a poor job in the back-and-forth of human-like conversation. The solution may be to train large language models more substantially on transcribed spoken conversations.

When you have a conversation today, notice the natural points when the exchange leaves open the opportunity for the other person to chime in. If their timing is off, they might come across as overly aggressive, too timid, or just plain awkward. The back-and-forth is the social element of the exchange of information that occurs in a conversation, and while humans do this naturally -- with some exceptions -- AI language systems are universally bad at it. Linguistics and computer science researchers at Tufts University have now discovered some of the root causes of this shortfall in AI conversational skills and point to possible ways to make them better conversational partners.

When humans interact verbally, they mostly avoid speaking simultaneously, taking turns to speak and listen. Each person evaluates many input cues to determine what linguists call "transition relevant places," or TRPs. TRPs occur often in a conversation. Many times we will take a pass and let the speaker continue; other times we will use the TRP to take our turn and share our thoughts.

JP de Ruiter, professor of psychology and computer science, says that for a long time it was thought that the "paraverbal" information in conversations -- the intonations, lengthening of words and phrases, pauses, and some visual cues -- was the most important signal for identifying a TRP. "That helps a little bit," says de Ruiter, "but if you take out the words and just give people the prosody -- the melody and rhythm of speech that comes through as if you were talking through a sock -- they can no longer detect appropriate TRPs."
Do the reverse and just provide the linguistic content in monotone speech, and study subjects will find most of the same TRPs they would find in natural speech. "What we now know is that the most important cue for taking turns in conversation is the language content itself. The pauses and other cues don't matter that much," says de Ruiter.

AI is great at detecting patterns in content, but when de Ruiter, graduate student Muhammad Umair, and research assistant professor of computer science Vasanth Sarathy tested transcribed conversations against a large language model AI, the AI was not able to detect appropriate TRPs with anywhere near the capability of humans. The reason stems from what the AI is trained on. Large language models, including the most advanced ones such as ChatGPT, have been trained on a vast dataset of written content from the internet -- Wikipedia entries, online discussion groups, company websites, news sites -- just about everything. What is missing from that dataset is any significant amount of transcribed spoken conversational language, which is unscripted, uses simpler vocabulary and shorter sentences, and is structured differently than written language. AI was not "raised" on conversation, so it does not have the ability to model or engage in conversation in a more natural, human-like manner.

The researchers thought that it might be possible to take a large language model trained on written content and fine-tune it with additional training on a smaller set of conversational content so it can engage more naturally in a novel conversation. When they tried this, they found that there were still some limitations to replicating human-like conversation. The researchers caution that there may be a fundamental barrier to AI carrying on a natural conversation. "We are assuming that these large language models can understand the content correctly. That may not be the case," said Sarathy.
"They're predicting the next word based on superficial statistical correlations, but turn taking involves drawing from context much deeper into the conversation." "It's possible that the limitations can be overcome by pre-training large language models on a larger body of naturally occurring spoken language," said Umair, whose PhD research focuses on human-robot interactions and is the lead author on the studies. "Although we have released a novel training dataset that helps AI identify opportunities for speech in naturally occurring dialogue, collecting such data at a scale required to train today's AI models remains a significant challenge. There is just not nearly as much conversational recordings and transcripts available compared to written content on the internet."
Researchers at Tufts University have discovered that AI language models are universally poor at identifying appropriate moments to contribute in conversations, highlighting a significant gap in AI's ability to engage in natural dialogue.
Researchers at Tufts University have made a significant discovery in the field of artificial intelligence, uncovering the reasons behind AI's universal struggle with natural conversation dynamics. The study, set to be presented at the Empirical Methods in Natural Language Processing (EMNLP 2024) conference, reveals that AI language systems are particularly poor at identifying appropriate moments to contribute during conversations [1].
Human conversations rely heavily on the ability to recognize "transition relevant places" (TRPs), which are natural points in dialogue where speakers can exchange turns. This skill, which most humans perform intuitively, involves evaluating various cues to determine when it's appropriate to speak or continue listening [2].
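The turn-taking mechanism described above can be framed as a classification problem: after each word of an utterance, decide whether a listener could appropriately take the turn. The sketch below is illustrative only, standing in a toy "completeness" heuristic for the large language model the researchers actually evaluated; the function names and scoring rule here are invented for this example, not taken from the study.

```python
def score_transition(prefix_words):
    """Toy stand-in for a model scoring how complete a prefix sounds.

    Here: a prefix ending in a short function word scores low, since
    the speaker is clearly mid-thought. A real system would query a
    language model over the full conversational context instead.
    """
    incomplete_endings = {"the", "a", "and", "to", "of", "but", "some"}
    return 0.1 if prefix_words[-1].lower() in incomplete_endings else 0.9

def detect_trps(utterance, threshold=0.5):
    """Return word indices after which a listener could take the turn."""
    words = utterance.split()
    return [i for i in range(1, len(words) + 1)
            if score_transition(words[:i]) >= threshold]

trps = detect_trps("I went to the store and bought some milk")
# Mid-phrase points like "...to the" are rejected; clause-final and
# utterance-final points are offered as candidate TRPs.
```

The key design point mirrors the study's finding: the decision here depends only on the linguistic content of the prefix, with no prosodic features at all.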
Contrary to previous beliefs, the study found that the linguistic content of speech is far more critical in identifying TRPs than paraverbal information such as intonation, pauses, or visual cues. Professor JP de Ruiter of Tufts University explained, "What we now know is that the most important cue for taking turns in conversation is the language content itself. The pauses and other cues don't matter that much" [1].
The research team, including de Ruiter, graduate student Muhammad Umair, and research assistant professor Vasanth Sarathy, discovered that the root of AI's conversational inadequacy lies in its training data. Large language models like ChatGPT are primarily trained on written internet content, which differs significantly from spoken language in structure, vocabulary, and sentence complexity [2].
The researchers attempted to fine-tune a large language model with a smaller set of conversational content to improve its performance. However, this approach yielded limited success in replicating human-like conversation abilities [1].
The study suggests there may be fundamental barriers to AI achieving natural conversation skills. Sarathy noted, "We are assuming that these large language models can understand the content correctly. That may not be the case. They're predicting the next word based on superficial statistical correlations, but turn-taking involves drawing from context much deeper into the conversation" [2].
Umair, the lead author, proposed a potential solution: "It's possible that the limitations can be overcome by pre-training large language models on a larger body of naturally occurring spoken language." However, he acknowledged the significant challenge in collecting sufficient conversational data to train modern AI models effectively [1].
This research highlights a critical area for improvement in AI language models. As AI continues to integrate into various aspects of communication and customer service, addressing these conversational shortcomings becomes increasingly important for creating more natural and effective human-AI interactions [2].
A new study from Johns Hopkins University shows that current AI models struggle to interpret social dynamics and context in video clips, highlighting a significant gap between human and machine perception of social interactions.
4 Sources
A new study reveals that while AI models perform well on standardized medical tests, they face significant challenges in simulating real-world doctor-patient conversations, raising concerns about their readiness for clinical deployment.
3 Sources
Recent research reveals that while larger AI language models demonstrate enhanced capabilities in answering questions, they also exhibit a concerning trend of increased confidence in incorrect responses. This phenomenon raises important questions about the development and deployment of advanced AI systems.
5 Sources
Recent reports suggest that the rapid advancements in AI, particularly in large language models, may be hitting a plateau. Industry insiders and experts are noting diminishing returns despite massive investments in computing power and data.
14 Sources
An analysis of AI's future through the lens of Google Translate's successes and shortcomings, highlighting the challenges faced by Large Language Models and their implications for various industries.
2 Sources