mastodon.world is one of the many independent Mastodon servers you can use to participate in the fediverse.
#chatbots


When LLMs suffer from Digital Alzheimer's:

"Large Language Models (LLMs) are conversational interfaces. As such, LLMs have the potential to assist their users not only when they can fully specify the task at hand, but also to help them define, explore, and refine what they need through multi-turn conversational exchange. Although analysis of LLM conversation logs has confirmed that underspecification occurs frequently in user instructions, LLM evaluation has predominantly focused on the single-turn, fully-specified instruction setting. In this work, we perform large-scale simulation experiments to compare LLM performance in single- and multi-turn settings. Our experiments confirm that all the top open- and closed-weight LLMs we test exhibit significantly lower performance in multi-turn conversations than single-turn, with an average drop of 39% across six generation tasks. Analysis of 200,000+ simulated conversations decomposes the performance degradation into two components: a minor loss in aptitude and a significant increase in unreliability. We find that LLMs often make assumptions in early turns and prematurely attempt to generate final solutions, on which they overly rely. In simpler terms, we discover that when LLMs take a wrong turn in a conversation, they get lost and do not recover."

arxiv.org/abs/2505.06120
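The abstract decomposes the multi-turn performance drop into two metrics over repeated simulation runs: aptitude (roughly, a model's best-case score) and unreliability (the spread between its best- and worst-case runs). A minimal sketch of that decomposition, assuming scores on a 0-100 scale, aptitude taken as the 90th-percentile run, and unreliability as the P90-P10 gap (the function names and the Gaussian toy data are illustrative, not the paper's code):

```python
import random
import statistics

def aptitude(scores, q=90):
    # best-case ability: the q-th percentile score across runs
    s = sorted(scores)
    return s[min(len(s) - 1, int(len(s) * q / 100))]

def unreliability(scores):
    # spread between best-case (P90) and worst-case (P10) runs
    return aptitude(scores, 90) - aptitude(scores, 10)

random.seed(0)
# hypothetical per-run scores for one model on one task
single_turn = [random.gauss(85, 5) for _ in range(100)]
multi_turn = [random.gauss(70, 15) for _ in range(100)]

for name, runs in [("single-turn", single_turn), ("multi-turn", multi_turn)]:
    print(f"{name}: mean={statistics.mean(runs):.1f} "
          f"aptitude={aptitude(runs):.1f} "
          f"unreliability={unreliability(runs):.1f}")
```

With toy data shaped like the paper's finding, the multi-turn mean drops sharply, the aptitude gap stays comparatively small, and unreliability roughly triples: the model can still reach good answers, it just does so far less consistently.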

arXiv.org · LLMs Get Lost In Multi-Turn Conversation

"When ChatGPT was released at the end of 2022, it caused a panic at all levels of education because it made cheating incredibly easy. Students who were asked to write a history paper or literary analysis could have the tool do it in mere seconds. Some schools banned it while others deployed A.I. detection services, despite concerns about their accuracy.

But, oh, how the tables have turned. Now students are complaining on sites like Rate My Professors about their instructors’ overreliance on A.I. and scrutinizing course materials for words ChatGPT tends to overuse, like “crucial” and “delve.” In addition to calling out hypocrisy, they make a financial argument: They are paying, often quite a lot, to be taught by humans, not an algorithm that they, too, could consult for free.

For their part, professors said they used A.I. chatbots as a tool to provide a better education. Instructors interviewed by The New York Times said chatbots saved time, helped them with overwhelming workloads and served as automated teaching assistants.

Their numbers are growing. In a national survey of more than 1,800 higher-education instructors last year, 18 percent described themselves as frequent users of generative A.I. tools; in a repeat survey this year, that percentage nearly doubled, according to Tyton Partners, the consulting group that conducted the research. The A.I. industry wants to help, and to profit: The start-ups OpenAI and Anthropic recently created enterprise versions of their chatbots designed for universities."

nytimes.com/2025/05/14/technol

Ella Stapleton said she was surprised to find that a professor had used ChatGPT to assemble course materials. “He’s telling us not to use it, and then he’s using it himself,” she said.
The New York Times · College Professors Are Using ChatGPT. Some Students Aren’t Happy. By Kashmir Hill

Financial Times: Insurers launch cover for losses caused by AI chatbot errors. “Insurers at Lloyd’s of London have launched a product to cover companies for losses caused by malfunctioning artificial intelligence tools, as the sector aims to profit from concerns about the risk of costly hallucinations and errors by chatbots.”

https://rbfirehose.com/2025/05/15/financial-times-insurers-launch-cover-for-losses-caused-by-ai-chatbot-errors/


Mashable: More concise chatbot responses tied to increase in hallucinations, study finds. “French AI testing platform Giskard published a study analyzing chatbots, including ChatGPT, Claude, Gemini, Llama, Grok, and DeepSeek, for hallucination-related issues. In its findings, the researchers discovered that asking the models to be brief in their responses ‘specifically degraded factual […]’”

https://rbfirehose.com/2025/05/15/mashable-more-concise-chatbot-responses-tied-to-increase-in-hallucinations-study-finds/


Here it is.

To date, no interaction with any kind of 'help' system, 'chatbot', LLM, faux AI, or similar has been of any help to me at all.

In ALL cases I have only been 'helped' after finally reaching a human.

Grrrrrrr.... No exaggeration.