I've always said this but AI will win a fields medal before being able to manage a McDonald's.
Math seems difficult to us because it's like using a hammer (the brain) to twist in a screw (math).
LLMs are discovering a lot of new math because they are great at low depth high breadth situations.
I predict that in the future people will ditch LLMs in favor of AlphaGo style RL done on Lean syntax trees. These should be able to think on much larger timescales.
Any professional mathematician will tell you that their arsenal is ~ 10 tricks. If we can codify those tricks as latent vectors it's GG
This argument, that LLMs can develop new crazy strategies using RLVR on math problems (like what happened with Chess), turns out to be false without a serious paradigm shift. Essentially, the search space is far too large, and the model will need help to explore better, probably with human feedback.
Some DeepMind researchers used mechanistic interpretability techniques to find concepts in AlphaZero and teach them to human chess Grandmasters: https://www.pnas.org/doi/10.1073/pnas.2406675122
Why must it involve understanding? I feel like you’re operating under the assumption that functionalism is the “correct” philosophical framework without considering alternative views.
Even that is probably too much. It has no understanding of what "chess" is, or what a chess board is, or even what a game is. And yet it crushes every human with ease. It's pretty nuts haha.
Actually, the neural net itself is fairly imprecise. Search is required for it to achieve good play. Here's an example of me beating Stockfish 18 at depth 1: https://lichess.org/XmITiqmi
As a professional mathematician, I would say that a good proof requires a very good representation of the problem, and then pulling out the tricks. The latter part is easy to get operating using LLMs, they can do it already. It's the former part that still needs humans, and I'm perfectly fine with that.
> I've always said this but AI will win a fields medal before being able to manage a McDonald's.
I love this and have a corollary saying: the last job to be automated will be QA.
This wave of technology has triggered more discussion about the types of knowledge work that exist than any other, and I think we will be sharper for it.
The ownership class will be sharper. They will know how to exploit capital and turn it into more capital with vastly increased efficiency. Everybody else will be hosed.
Are they actually producing new math? In the most recent ACM issue there was an article about testing AI against a math bench that was privately built by mathematicians, and what they found is that even though AI can solve some problems, it never truly has come up with something novel and new in mathematics, it is just good at drawing connections between existing research and putting a spin on it.
It's finding constructions and counterexamples. That's different from finding new proof techniques, but still extremely useful, and still gives way to novel findings.
I think this is mostly about existing legislature, not about technology.
In any other context than when your paycheck depends on it, you would probably not be following orders from a random manager. If your paycheck depended on following the instructions of an AI robot, the world might start to look pretty scary real soon.
AI actually has to follow all rules, even the bad rules. Like when autonomous car drives super carefully.
Imagine mcdonald management would enforce dog related rules. No more filthy muppets! If dog harasses customers, AI would call cops, and sue for restraining order! If dog defecates in middle of restaurant, everything would get desinfected, not just smeared with towels!
I have no idea how you come to this conclusion, when the evidence on the ground for those training models suggests it is precisely the opposite.
We are much further along the path of writing code than writing new maths, since the latter often requires some degree of representational fluency of the world we live in to be relevant. For example, proving something about braid groups can require representation by grid diagrams, and we know from ARC-AGI that LLMs don't do great with this.
Programming does not have this issue to the same extent; arguably, it involves the subset of maths that is exclusively problem solving using standard representations. The issues with programming are primarily on the difficulty with handling large volumes of text reliably.
Yeah, it's hard to compare management and programming but they're both multimodal in very different ways. But there's gonna be entire domains in which AI dominates much like stockfish, but stockfish isn't managing franchises and there is no reason to expect that anytime soon.
I feel like something people miss when they talk about intelligence is that humans have incredible breadth. This is really what differentiates us from artificial forms of intelligence as well as other animals. Plus we have agency, the ability to learn, the ability to critically think, from first principles, etc.
But LLMs have proven themselves better at programming than most professional programmers.
Don't argue. If you think Hackernews is a representative sample of the field then you haven't been in the field long enough.
What LLMs have actually done is put the dream of software engineering within reach. Creativity is inimical to software engineering; the goal has long been to provide a universal set of reusable components which can then be adapted and integrated into any system. The hard part was always providing libraries of such components, and then integrating them. LLMs have largely solved these problems. Their training data contains vast amounts of solved programming problems, and they are able to adapt these in vector space to whatever the situation calls for.
We are already there. Software engineering as it was long envisioned is now possible. And if you're not doing it with LLMs, you're going to be left behind. Multimodal human-level thinking need only be undertaken at the highest levels: deciding what to build and maybe choosing the components to build it. LLMs will take care of the rest.
A bit optimistic I'd say. It's put some software engineering within reach of some people who couldn't do it prior. Where 'some' might be a lot, but still far from all.
I was thinking the other day of how things would go if some of my less tech savvy clients tried to vibe code the things I implement for them, and frankly I could only imagine hilarity ensuing. They wouldn't be able to steer it correctly at all and would inevitably get stuck.
Someone needs to experiment with that actually: putting the full set of agentic coding tools in the hands of grandma and recording the outcome.
It's still going to take a knowledgeable person to steer an LLM. The point is that code written entirely by humans is finished as a concept in professional work—if you're writing it yourself you're not working efficiently or employing industry best practice.
When I was younger I remember a point of demarcation for me was learning the 4chan adage “trolls trolling trolls”, and approaching all internet interactions with skepticism. While I have been sure that Reddit for a while has succumbed to being “dead internet”. This thread is another moment for me- I can no longer recognize who is a bot, and who has honest intentions.
Like so many things -- the evolution of AI math will I think follow trajectories hinted at in the 90s by the all time great sci-fi author Greg Egan. The nature of math won't change -- but the why of it definitely will. Egan imagined a future ai civilization in Diaspora where "math discovery" -- by nature in the future perhaps accurately described as "mechanistic math discovery" is modeled by society as a kind of salt mine environment in which you can dig for arbitrarily long amounts of time and find new nuggets. The nuggets themselves have a kind of "pure value" as mathematical objects even if they might not have any knowable value outside the mines. Some personalities were interested in and valued the nuggets for their own sake while others didn't but recognized that there were occasionally nuggets found in the mind that had broader appeal.
Research institutes like those founded by Terence Tao in our current present feel like they will align to this future almost perfectly on a long enough timeline -- tho I think on a shorter timeline this area of research is almost certain to provide a ton of useful ways to advance our current ai systems as our current systems are still in a state where literally anything that can generate new information that is "accurate" in some way -- like our current theorem prover engines are enormously valuable parts of our still manually curated training loops.
Interesting but not surprising to me. Once a field expert guides the models, they most likely will reach a solution. The models are good at lazy work for experts. For hard or complicated questions, many a time the models have blind spots.
Ramanujan is a good analogy for this situation. Theories could be right/wrong, until there's a proof. Same with anything AI produces. There's always a "told you so" baked in with it's response.
I got Claude to self reference and update its own instructions to solve making a typed proxy API of any website. After a week, scores of iterations, it can reverse engineer any website. The first few days I had to be deeply involved with each iteration loop. Domain knowledge is helpful. Each time I saw a problem I would ask Claude to update its instructions so it doesn't happen again. Then less and less. Eventually it got to the point it was updating and improving the metrics every iteration unsupervised.
Edit: This is going to have huge ramifications for the tech security industry as these systems will be able to break security systems as easily it solved the proof. The sooner the good guys, if there are any left, understand this the better it will be for everybody.
> Super interesting but what does this mean for us mere mortals?
I would go for a 2 or 3 hour walk with my phone using the remote control feature looking every 5 - 10 minutes to make sure it doesn't need human help. I went to the coffeeshop and drank very good coffee listening to music. Then at night I sat and had a beer thinking about T.S. Eliot's 'The Wasteland', the effect of industrialization in England at that time and his views of how ennui affected the aristocracy.
> I went to the coffeeshop and drank very good coffee listening to music. Then at night I sat and had a beer thinking about T.S. Eliot's 'The Wasteland', the effect of industrialization in England at that time and his views of how ennui affected the aristocracy.
Well, for those among us that are not aristocracy already, except for the vanishingly small number of people required to oversee such processes, we’re probably the closest we’re going to get to it. If they don’t need people to do the tech labor, we’ve got way more people than we need, so that’s a huge oversupply of tech skills, which means tech skills are rapidly becoming worthless. Glad to see how fast we’re moving in our very own race to the bottom!
Lol,a race to the bottom where too many tech savvy people are left unemployed while a few "privileged" get a decreasing buying power to maintain security of the digital tools that keep the whole digital dependent civilizations afloat?
Sounds like a great starting plot for an interesting story.
I kind of feel like software engineers working on improving AI are traitors working against other SE’s trying to make a living.
However…
I have to acknowledge my craft of SE has been putting people out of work for decades. I myself came up with business process improvement that directly let the company release about 20 people. I did this twice.
In the grand scheme it's good to invent things that replace human labor. It frees up people to do more interesting things. The goal should be to put everyone out of a job.
Like beg on the corners and starve in the street? Trying to figure out how the basics of capitalism where labor is exchanged for money is not going to work well when the only jobs left are side gigs. Something will have to change and a lot of People will fight said change.
We will come up with new jobs, like we have for all of human history. I think even in an abundance utopia people will still work - we need purpose to sustain our existence.
The work will become even more fulfilling however.
1) It’s not my job to fix all the problems of Capitalism. It’s painful to try to fight the system without collective action. My family and I have to eat too.
2) We have had a solution all along for the particular problem of AI putting devs out of work. It’s called professional licensure, and you can see it in action in engineering and medical fields. Professional Software Engineers would assume a certain amount of liability and responsibility for the software they develop. That’s regardless of whether they develop it with LLM tools or something else.
For example, you let your tools write slop that you ship without even looking? And it goes on to wreak havoc? That’s professional malpractice. Bad engineer.
If we do this then Software Engineers become the responsible humans in the loop of so-called “AI” systems.
It’s not your job to fix capitalism. But it is your job to evaluate if your money making skill comes at too high a price for others.
Say you found a job shooting people in the head for money. Like if you work for ICE or something…
You need to feed your family. Is this job ok? You may decide yes. I decided no. I will find another way to feed my family.
You don’t get to escape consequences because you are a small cog in a large system.
In the bigger picture, automation should free people from labor. But that requires some very greedy people to relax their grip ever so slightly. I imagine they see automation as a way to reduce reliance on labor, and if they don’t need labor, they don’t need people. So let them starve and stop having kids.
> Edit: This is going to have huge ramifications for the tech security industry as these systems will be able to break security systems as easily it solved the proof. The sooner the good guys, if there are any left, understand this the better it will be for everybody.
What can the good guys do? Fire up Claude to improve their systems? Unless you have it working fully autonomously to counter-act abuse, I don't see how you can beat the "bad guys". There may be some industries where this is a solved problem (e.g. you can do all the validation server-sided, religiously follow best practices to prevent and mitigate abuse), but a lot of stuff like multiplayer video games will be doomed unless they move to a "you must use a locked down system we control" model. I honestly don't consider it liberating as someone that has various hobby projects, that now in addition to plain old DDoS I'll also have people spin up layer 7 attacks with just their credit card. It almost makes me want to give up instead of pushing forward in a world where the worst of the worst has access to the best of the best.
> I would go for a 2 or 3 hour walk with my phone using the remote control feature looking every 5 - 10 minutes to make sure it doesn't need human help.
That nightmarish scenario is what T.S. Eliot was describing in "The Wasteland" which "portrays deep, existential ennui and boredom as defining symptoms of modern life following World War I."
Later this boredom was described by the Stones, "And though she’s not really ill / There’s a little yellow pill / She goes running for the shelter of a mother’s little helper".
It is a nightmare. Mostly what I'm thinking about while the agents are running is how bored I'm going to be. That is the joke, my deep thought on T.S. Eliot are about the wasteland this thing is going to create.
Nightmarish?! In comparison to the average person's actual job? I'm pretty sure that many people out there would sign up for a battle royale for a chance at such a job.
My clients have been burned before. Once you set up the battle royale with a trusted third party validating that there'll be an assured good job at the end, I promise I'll have enough candidates for you to fill up the first 10 competitions.
I posted a link but don't want to spam HN more than I have.
It is proof-of-concept. Seriously burns some tokens (~80k - ~200k) but doesn't require AI after to scrape and automate a website so if all the people at Browser Use, Browser Base, and every one pounding every website used it, I think, the net benefit would be in the billions. I would recommend using it in isolation. Nonetheless, it works very very well on my machine.
> This type of slop comment is somehow worse than spam.
Here is a description of the iteration loop. [0] I'm working on another draft that will be much more polished and have better explanations of the iteration loop.
> There is no proof, just a self-congratulatory word salad with dubious authenticity.
I worked 8 days straight on that and have been working non-stop on the second draft that is much cleaner and safer. I'm a human being. Please don't be mean. If humanity does come to end, it won't be because of AI, it will be because we can't stop being assholes to each other.
I have similar amounts of success (pretty good!) standing in line at a coffee shop talking to people who work for me through some action that needs to be taken and doing the same with AI.
However I do not trust AI anywhere near as much as I trust the humans. The AI is super capable but also occasionally a psychopath toddler. I sat in amused astonishment when faced with job 2 not running because job 1 was failing Claude went in to the database, changed the failure record to success, triggered job 2 which produced harmful garbage, and then claimed victory. Only the most troubled person would even think of doing that, but Claude thought it was the best solution.
My understanding is that, if confirmed, this demonstrates that AI can find novel solutions. This is a strong counterpoint to generative-AI-is-strictly-limited-to-training-data.
we've had AlphaFold for a while. it's not a novel that we have ML solutions that can find, erm, novel solutions.
however, by and large, most LLMs as typically used by most individuals aren't solving novel problems. and in those scenarios, we often end up with regurgitated/most common/lowest common denominator outputs... it's a probability distribution thing.
lowest quote I got to replace toilet and faucet in the kitchen (my parts, just installation) - $895 (5 quotes total). market for trades is exploding and will grow larger and larger as gen alpha and beyond knows what screwdriver is as much as they know what rotary phone is (they dont how to use either)
For now. The term people use is "centaur", like the half-man-half-horse of mythology.
The AI CEO's are pointing out that when chess was "solved", in that Kasparov was famously beaten by deep blue, there was a window of time after that event where grandmasters + computers were the strongest players. The knowledge/experience of a grandmaster paired with the search/scoring of the engines was an unbeatable pair.
However, that was just a window in time. Eventually engines alone were capable of beating grandmaster + engine pairs. Think about that carefully. It implies something. The human involvement eventually became an impediment.
Whether you believe this will transfer to other domains is up to you to decide.
This is truly amazing. Do people not really realize how amazing stuff like this is? I feel like I'm taking crazy pills here, but man, it certainly feels like we're on the edge of something quite amazing...
Math seems difficult to us because it's like using a hammer (the brain) to twist in a screw (math).
LLMs are discovering a lot of new math because they are great at low depth high breadth situations.
I predict that in the future people will ditch LLMs in favor of AlphaGo style RL done on Lean syntax trees. These should be able to think on much larger timescales.
Any professional mathematician will tell you that their arsenal is ~ 10 tricks. If we can codify those tricks as latent vectors it's GG
Ergo these are latent vectors in our brain. We use analogies like geometry in order to use Algebraic Geometry to solve problems in Number Theory.
An AI trained on Lean Syntax trees might develop it's own weird versions of intuition that might actually properly contain ours.
If this sounds far fetched, look at Chess. I wonder if anyone has dug into StockFish using mechanistic interpretability
https://arxiv.org/abs/2504.13837
1. https://mppbench.com/
I love this and have a corollary saying: the last job to be automated will be QA.
This wave of technology has triggered more discussion about the types of knowledge work that exist than any other, and I think we will be sharper for it.
In any other context than when your paycheck depends on it, you would probably not be following orders from a random manager. If your paycheck depended on following the instructions of an AI robot, the world might start to look pretty scary real soon.
Imagine mcdonald management would enforce dog related rules. No more filthy muppets! If dog harasses customers, AI would call cops, and sue for restraining order! If dog defecates in middle of restaurant, everything would get desinfected, not just smeared with towels!
Nutters would crucify AI management!
- Coherent customer interaction
- Common sense judgements
- Scheduling
- Quality control
All which are baked into humans but not so much into LLMs
Even if it were legal to have an LLM as a GM, I think it would fair poorly
Of course, because it takes multi-modal intelligence to manage a McDonalds. I.e. it requires human intelligence.
> I predict that in the future people will ditch LLMs in favor of AlphaGo style RL
Same for coding as well. LLM's might be the interface we use with other forms of AI though.
Programming is more multimodal than math.
Something like performance engineering might be free lunch though
I have no idea how you come to this conclusion, when the evidence on the ground for those training models suggests it is precisely the opposite.
We are much further along the path of writing code than writing new maths, since the latter often requires some degree of representational fluency of the world we live in to be relevant. For example, proving something about braid groups can require representation by grid diagrams, and we know from ARC-AGI that LLMs don't do great with this.
Programming does not have this issue to the same extent; arguably, it involves the subset of maths that is exclusively problem solving using standard representations. The issues with programming are primarily on the difficulty with handling large volumes of text reliably.
I feel like something people miss when they talk about intelligence is that humans have incredible breadth. This is really what differentiates us from artificial forms of intelligence as well as other animals. Plus we have agency, the ability to learn, the ability to critically think, from first principles, etc.
Also animals thrive in underspecified environments, while AIs like very specific environments. Math is the most specified field there is lol
Don't argue. If you think Hackernews is a representative sample of the field then you haven't been in the field long enough.
What LLMs have actually done is put the dream of software engineering within reach. Creativity is inimical to software engineering; the goal has long been to provide a universal set of reusable components which can then be adapted and integrated into any system. The hard part was always providing libraries of such components, and then integrating them. LLMs have largely solved these problems. Their training data contains vast amounts of solved programming problems, and they are able to adapt these in vector space to whatever the situation calls for.
We are already there. Software engineering as it was long envisioned is now possible. And if you're not doing it with LLMs, you're going to be left behind. Multimodal human-level thinking need only be undertaken at the highest levels: deciding what to build and maybe choosing the components to build it. LLMs will take care of the rest.
I was thinking the other day of how things would go if some of my less tech savvy clients tried to vibe code the things I implement for them, and frankly I could only imagine hilarity ensuing. They wouldn't be able to steer it correctly at all and would inevitably get stuck.
Someone needs to experiment with that actually: putting the full set of agentic coding tools in the hands of grandma and recording the outcome.
Research institutes like those founded by Terence Tao in our current present feel like they will align to this future almost perfectly on a long enough timeline -- tho I think on a shorter timeline this area of research is almost certain to provide a ton of useful ways to advance our current ai systems as our current systems are still in a state where literally anything that can generate new information that is "accurate" in some way -- like our current theorem prover engines are enormously valuable parts of our still manually curated training loops.
"oh awesome let's see if he can solve p!=np!"
Edit: This is going to have huge ramifications for the tech security industry as these systems will be able to break security systems as easily it solved the proof. The sooner the good guys, if there are any left, understand this the better it will be for everybody.
> Super interesting but what does this mean for us mere mortals?
I would go for a 2 or 3 hour walk with my phone using the remote control feature looking every 5 - 10 minutes to make sure it doesn't need human help. I went to the coffeeshop and drank very good coffee listening to music. Then at night I sat and had a beer thinking about T.S. Eliot's 'The Wasteland', the effect of industrialization in England at that time and his views of how ennui affected the aristocracy.
Well, for those among us that are not aristocracy already, except for the vanishingly small number of people required to oversee such processes, we’re probably the closest we’re going to get to it. If they don’t need people to do the tech labor, we’ve got way more people than we need, so that’s a huge oversupply of tech skills, which means tech skills are rapidly becoming worthless. Glad to see how fast we’re moving in our very own race to the bottom!
Sounds like a great starting plot for an interesting story.
However…
I have to acknowledge my craft of SE has been putting people out of work for decades. I myself came up with business process improvement that directly let the company release about 20 people. I did this twice.
So… fair play.
Yeah, but why does it need to take the fun jobs first, like painting, writing poems, coding, making music, ...
I want the AI to cook, do the dishes, take out the trash, etc.
Like beg on the corners and starve in the street? Trying to figure out how the basics of capitalism where labor is exchanged for money is not going to work well when the only jobs left are side gigs. Something will have to change and a lot of People will fight said change.
The work will become even more fulfilling however.
1) It’s not my job to fix all the problems of Capitalism. It’s painful to try to fight the system without collective action. My family and I have to eat too.
2) We have had a solution all along for the particular problem of AI putting devs out of work. It’s called professional licensure, and you can see it in action in engineering and medical fields. Professional Software Engineers would assume a certain amount of liability and responsibility for the software they develop. That’s regardless of whether they develop it with LLM tools or something else.
For example, you let your tools write slop that you ship without even looking? And it goes on to wreak havoc? That’s professional malpractice. Bad engineer.
If we do this then Software Engineers become the responsible humans in the loop of so-called “AI” systems.
Say you found a job shooting people in the head for money. Like if you work for ICE or something…
You need to feed your family. Is this job ok? You may decide yes. I decided no. I will find another way to feed my family.
You don’t get to escape consequences because you are a small cog in a large system.
In the bigger picture, automation should free people from labor. But that requires some very greedy people to relax their grip ever so slightly. I imagine they see automation as a way to reduce reliance on labor, and if they don’t need labor, they don’t need people. So let them starve and stop having kids.
You probably choose not to steal, rob, impersonate someone else, or generally make money illegally.
It can be traitors all the way down.
What can the good guys do? Fire up Claude to improve their systems? Unless you have it working fully autonomously to counter-act abuse, I don't see how you can beat the "bad guys". There may be some industries where this is a solved problem (e.g. you can do all the validation server-sided, religiously follow best practices to prevent and mitigate abuse), but a lot of stuff like multiplayer video games will be doomed unless they move to a "you must use a locked down system we control" model. I honestly don't consider it liberating as someone that has various hobby projects, that now in addition to plain old DDoS I'll also have people spin up layer 7 attacks with just their credit card. It almost makes me want to give up instead of pushing forward in a world where the worst of the worst has access to the best of the best.
That is a nightmarish scenario tbh
Later this boredom was described by the Stones, "And though she’s not really ill / There’s a little yellow pill / She goes running for the shelter of a mother’s little helper".
It is a nightmare. Mostly what I'm thinking about while the agents are running is how bored I'm going to be. That is the joke, my deep thought on T.S. Eliot are about the wasteland this thing is going to create.
>After a week, scores of iterations, it can reverse engineer any website
Cool, let’s see the proof.
It is proof-of-concept. Seriously burns some tokens (~80k - ~200k) but doesn't require AI after to scrape and automate a website so if all the people at Browser Use, Browser Base, and every one pounding every website used it, I think, the net benefit would be in the billions. I would recommend using it in isolation. Nonetheless, it works very very well on my machine.
> This type of slop comment is somehow worse than spam.
Please don't be mean.
It’s insane how insufferable this place is now.
> There is no proof, just a self-congratulatory word salad with dubious authenticity.
I worked 8 days straight on that and have been working non-stop on the second draft that is much cleaner and safer. I'm a human being. Please don't be mean. If humanity does come to end, it won't be because of AI, it will be because we can't stop being assholes to each other.
[0] https://github.com/adam-s/intercept/tree/main?tab=readme-ov-...
2-3 hours "walking" while having to check in every 5-10 minutes?
If I have to check in every 5-10 minutes, I won't taste coffee or hear that there's good music playing.
I had a bad feeling we were basically already there.
However I do not trust AI anywhere near as much as I trust the humans. The AI is super capable but also occasionally a psychopath toddler. I sat in amused astonishment when faced with job 2 not running because job 1 was failing Claude went in to the database, changed the failure record to success, triggered job 2 which produced harmful garbage, and then claimed victory. Only the most troubled person would even think of doing that, but Claude thought it was the best solution.
we've had AlphaFold for a while. it's not a novel that we have ML solutions that can find, erm, novel solutions.
however, by and large, most LLMs as typically used by most individuals aren't solving novel problems. and in those scenarios, we often end up with regurgitated/most common/lowest common denominator outputs... it's a probability distribution thing.
Also that it is now good enough to make researchers faster.
The AI CEO's are pointing out that when chess was "solved", in that Kasparov was famously beaten by deep blue, there was a window of time after that event where grandmasters + computers were the strongest players. The knowledge/experience of a grandmaster paired with the search/scoring of the engines was an unbeatable pair.
However, that was just a window in time. Eventually engines alone were capable of beating grandmaster + engine pairs. Think about that carefully. It implies something. The human involvement eventually became an impediment.
Whether you believe this will transfer to other domains is up to you to decide.