Joel David Hamkins, a leading mathematician and professor of logic at the University of Notre Dame, has fired a withering salvo at large language models in mathematical research, calling them fundamentally useless for supporting mathematical work and noting that they give “garbage answers that are not mathematically correct.” Speaking on the Lex Fridman podcast, Hamkins shared his frustration with current AI systems despite experimenting with various paid models. “I’ve played around with it and I’ve tried experimenting, but I haven’t found it helpful at all,” he stated bluntly.
AI’s confident incorrectness mirrors frustrating human interactions
What troubles Hamkins most isn’t the occasional mathematical slip so much as the way the AI systems react to being corrected. When he points out concrete flaws in their reasoning, the models push back with breezy assurances like “Oh, it’s totally fine.” This pattern of confident but incorrect responses, coupled with resistance to correction, erodes the collaborative trust that underpins mathematical dialogue. “If I were having such an experience with a person, I would simply refuse to talk to that person again,” Hamkins explained, emphasizing that the AI’s conduct parallels the kind of counterproductive human interaction he would avoid.
Growing gap between AI benchmarks and real-world research applications
Hamkins’ critique comes as the mathematical community remains divided over what AI is actually capable of. Some researchers have announced breakthroughs made with AI assistance on problems from the Erdős problem collection; others, like the mathematician Terence Tao, caution that AI produces perfect-looking proofs containing subtle mistakes no human reviewer would allow. Hamkins’ assessment highlights a critical tension: impressive performance on standardized tests doesn’t translate to practical utility for domain experts. “As far as mathematical reasoning is concerned, it seems not reliable,” Hamkins concluded. Though he acknowledges that future AI systems might improve, Hamkins remains skeptical about current capabilities. His experience serves as a sobering reminder that AI companies’ heavy investments in reasoning capabilities haven’t yet bridged the gap between benchmark performance and serving as genuine research partners for working mathematicians.