I have a degree in CS from MIT and did professional software engineering from 2004 - 2020.
I recently started a company in another field and haven’t done any real development for about 4 years.
Earlier this summer I took a vacation and decided to start a small software hobby project specific to my industry. I decided to try out Cursor for the first time.
I found it incredibly helpful at saving time implementing all the bullshit involved in starting a new code base - setting up a build system, looking up libraries and APIs, implementing a framework for configuration and I/O, etc.
Yes, I still had to do some of the hard parts myself, and (probably most relevant) I still had to understand the code it was writing and correct it when it went down the wrong direction. I literally just told Cursor “No, why do it that way when you could do it much simpler by X”, and usually it fixed it.
A few times, after writing a bunch of code myself, I compiled the project for the first time in a while and (as one does) ran into a forest of inscrutable C++ template errors. Rather than spend my time scrolling through all of them I just told Cursor “fix the compile errors”, and sure enough, it did it.
Another example - you can tell it things like “implement comparison operators for this class”, and it’s done in 5 seconds.
As the project got more complicated, I found it super useful to write tests for behaviors I wanted, and just tell it “make this test pass”. It really does a decent job of understanding the codebase and adding onto it like a junior developer would.
Using an IDE that gives it access to your whole codebase (including build system and tests) is key. Using ChatGPT standalone and pasting stuff in is not where the value is.
It’s nowhere near able to do the entire project from scratch, but it saved me from a bunch of tedious work that I don’t enjoy anyway.
Last summer I came back to software after about 12 years away and I pretty much had an identical experience to you with using AI as a helper to come back. I've now spent the last 6 months coding as much as I can in between consulting gigs. I'm not sure if I would have been able to get caught up so quickly without AI.
I haven't had this much fun programming since I was at university hacking away on sun workstations, but admittedly I only write about 10% of the code myself these days.
I'm currently getting Claude Code to pair program with GPT-5 and they delegate the file edits to Gemini Flash. It's pretty cool.
I'm glad you're having fun. I wish I was having fun. I've had mixed results but quite often, for me, the GPT goes off the rails and does something completely stupid. It then doubles down. It's like talking to a complete idiot who happens to have moments of lucidity. If I get lucky and get a lucid response it's great. But so often I get the nonsense response. I mean literally as bad as the "there are 3 letter Bs in the word blueberry" answer earlier today (https://news.ycombinator.com/item?id=44832908) except code instead of spelling. I wish I knew what I was doing wrong.
A couple of things that really helped me (Claude is my choice for coding):
- add to your global settings a few lines that say something like "don't write any code until you fully understand my requirements, ask questions until you are completely clear"; this mostly stops it from just going ahead and writing its best guess the first time. And then,
- when it looks like it's going off the rails, cut your losses straight away and ask it to write a prompt for another chat. Tell it to cover everything it knows about what you're working on and what you want to do. Paste that into the new chat and get it to ask if it needs clarification (and then if needed, paste between the two chats). The new chat often solves the problem the other one was stuck on first time.
Hope these help - coding with AI has made me enjoy the process once again. I'm actually completing silly projects I've always wanted to do because I can get past the blockers so much more easily.
I'm having good results with similar stuff, but not sure about the best approach. What exactly do you mean by "your global settings"? Do you have a global CLAUDE.md file? Or do you do this via the UserPromptSubmit hook and return an additionalContext? I use this hook to constantly reinforce some things, like "be concise. use context7." It even works to put "ultrathink" in there to always have it in thinking mode. But I wonder if these constant reminders are overdoing it.
Hey, fwiw I'm almost certainly not using Claude in the best possible way - I have a 'pro' (as in the ~$20 a month plan) and I just prompt. But in the 'Settings' section there's a bit that says: "What personal preferences should Claude consider in responses?" - I add stuff in there.
Ah, that's interesting. I only use Claude Code right now and don't have anything in that section you mention. Wonder how/when it's applied and if it is also applied to CC...
These are some of the things I do to try and get good results:
- I prime the context of the LLM with any extra rules or instructions I want it to follow.
- I also prime the context with every single file it needs to see in order to be able to do its job properly. Sometimes this can be two files but sometimes it can be 15 files.
- I instruct the LLM in bullet point format for smaller edits. For larger edits I ask the LLM to convert my prompt to markdown. Most of my prompts are created by an LLM.
- In 90% of cases I only prompt once per chat, and start a new chat each time. Unless I'm using an agent like Claude Code in which case I use one context window per feature.
- I keep notes of small things that it's done not quite how I want so I can go back and change them.
- I use a program called RepoPrompt to make all this faster and easier.
I'm working on making a game right now and this was the last prompt I did last night. The prompt itself was written by Opus and the implementation was done by GPT-5 High. The actual file edits were performed by Gemini 2.5 Flash to save on API costs.
--- Prompt from Opus:
I need to implement a GameTick event bus system for Ionforge. This will add:
1. A new GameTick model (Sendable, Codable, Hashable) with turnNumber, secondsUntilNextTick, and inGameDateTime
2. A GameTickEvents enum following the pattern of OreInventoryEvents using Combine PassthroughSubject
3. A GameTimerService protocol with tickPublisher, start(), and stop()
4. A MockGameTimerService that generates ticks every 600 seconds using Task.sleep
5. Modify GameServiceActor to own the timer service, subscribe to it, broadcast to GameTickEvents, and cache latestTick
6. Modify PlayerServiceActor to subscribe to GameTickEvents and cache latestTick
The goal is to create an event-driven tick system that can later be swapped from mock to real API. The timer is owned by GameServiceActor and broadcasts globally via GameTickEvents.
Please implement these changes following the existing patterns in the codebase, particularly matching the OreInventoryEvents pattern for the event bus.
--- Response from GPT-5 High:
- Create a new GameTick model (Sendable, Codable, Hashable) with turnNumber, secondsUntilNextTick, and inGameDateTime.
- Add a GameTickEvents enum mirroring the existing OreInventoryEvents pattern (Combine PassthroughSubject).
- Define a GameTimerService protocol that exposes a Combine publisher and start/stop controls.
- Implement MockGameTimerService using Task.sleep to emit ticks every 600 seconds.
- Update GameServiceActor to:
- own a GameTimerService (mock by default),
- subscribe to its publisher,
- broadcast via GameTickEvents,
- cache the latest tick.
- Update PlayerServiceActor to subscribe to global GameTickEvents and cache the latest tick.
- Keep edits targeted and minimal using delegate edit scopes, with proper Combine imports.
---
GPT-5 then outputted about 120 lines of code across 7 files that compiled and worked on the first try.
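To give a sense of the shape of what came out, here's a simplified sketch (written from memory for this comment, not the actual generated code, and leaving out the two actor changes):

import Combine
import Foundation

// The tick model, as described in the plan.
public struct GameTick: Sendable, Codable, Hashable {
    let turnNumber: UInt64
    let secondsUntilNextTick: TimeInterval
    let inGameDateTime: Date
}

// Global event bus, mirroring the OreInventoryEvents pattern.
enum GameTickEvents {
    static let tick = PassthroughSubject<GameTick, Never>()
}

// Abstraction so the mock can be swapped for a real API-backed timer later.
protocol GameTimerService {
    var tickPublisher: AnyPublisher<GameTick, Never> { get }
    func start()
    func stop()
}

// Mock timer: emits a tick every 600 seconds using Task.sleep.
final class MockGameTimerService: GameTimerService {
    private let subject = PassthroughSubject<GameTick, Never>()
    private var task: Task<Void, Never>?
    private var turn: UInt64 = 0

    var tickPublisher: AnyPublisher<GameTick, Never> {
        subject.eraseToAnyPublisher()
    }

    func start() {
        task = Task { [self] in
            while !Task.isCancelled {
                try? await Task.sleep(nanoseconds: 600 * 1_000_000_000)
                turn += 1
                subject.send(GameTick(turnNumber: turn,
                                      secondsUntilNextTick: 600,
                                      inGameDateTime: Date()))
            }
        }
    }

    func stop() {
        task?.cancel()
    }
}

In the real code, GameServiceActor owns the timer service, subscribes to its publisher, rebroadcasts via GameTickEvents, and caches the latest tick.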
I cite literally the entire thing, specifically lines 1-N. Is there any actual engineering advice in here? Do they cite any hard facts? Any actual science? Real statistics? No. The post can be summarized as: "Here are some things I tried; it seems to work for me", which is astrology-level support.
>Do they cite any hard facts? Any actual science? Real statistics?
I guess I could do those things, but I'd rather work on my game.
I only posted that because the person I responded to was struggling with getting nonsense responses, and I literally never get nonsense responses with the workflow I have now.
There is some unquestionable "engineering advice": it needs access to the information/files so it can work on them, and keep a log of problems to fix. The rest is not so clear:
- "I prime the context of the LLM with any extra rules or instructions I want it to follow."
What are "extra" rules - extra to what? How could I tell if I am doing this or not doing this?
- "I instruct the LLM in bullet point format for smaller edits. For larger edits I ask the llm to convert my prompt to markdown. Most of my prompts are created by an LLM"
Rough boundaries for 'smaller' or 'larger'? In the context of "my prompts are created by an LLM", what does it mean to convert "my prompts" to Markdown, and why not have them generated in Markdown in the first place? Why bring Markdown into this at all? Markdown supports bullet points, so why is it presented as an alternative to "bullet point format"? Is it supposed to imply "prose, not bullet points", and if so, why not just write prose without Markdown?
What difference does any of this make, either measurably or in vibes? Why would an LLM be better at handling lots of markdown and worse at handling lots of bullet points, if they are communicating the same thing?
Why create prompts by LLM at all? Presumably you have to start with inputting what you want to happen, so why pass it through two LLMs instead of one?
- "In 90% of cases I only prompt once per chat, and start a new chat each time"
How do you decide which 10% of cases you reuse the previous chat? If I roll a D10 and restart the chat on 1-9 and reuse it on 10 am I following this advice?
- "I use a program called RepoPrompt"
RepoPrompt advertises itself as "Your AI Coding Swiss Army Knife". Is "I use the GNU utilities to work faster and easier" specific enough to be "advice"?
> "They took the trouble to actually give a full prompt."
Did they take the trouble? They say Claude Opus made the prompt. The important bit - how they got Opus to write the prompt without them writing all those things in the meta-prompt - is missing. How do you get Opus to write "A GameTickEvents enum following the pattern of OreInventoryEvents using Combine PassthroughSubject" without telling it to write that? What did they ask for to get Opus to tell GPT to make an enum?
I suppose I'd define smaller as the LLM only having to read/write from/to fewer than 3 or 4 files to complete the task. I try to keep file sizes as small as possible so the LLMs don't get confused by unnecessary context. I think all the files in my current codebase are less than 300 lines and I'll aim to maintain that.
I don't understand the argument here. People falsely claim that LLM-generated code is unverifiable because LLMs are stochastic (you didn't claim that, but others on HN have); that argument is risible because code is code and you can just read it and reject it if you're not certain what it does. Here, though, we're claiming that a generated prompt is so unverifiable that it qualifies as "spell-casting".
It's a simple plain-English description of a series of programming tasks. If you can't evaluate this, you can't program.
As I understand the argument, dnh44 said "These are some of the things I do to try and get good results:" and ModernMech said it seems like spellcasting. ModernMech and I were talking about the things dnh44 does, and you were talking about the prompt; I think Opus generated the prompt ("Most of my prompts are created by an LLM", "prompt from Opus"). Are you reading it as the input dnh44 personally wrote into Opus?
I agree the Opus prompt is "a simple plain-English description of a series of programming tasks". Where did it come from? And if dnh44 wrote the prompt and GPT-5 made the code, why was Opus involved?
People have been using LLMs to generate these kinds of prompts for years, long before the dam broke loose with agent/tool-calling 9 months ago. You could get something like this from virtually any chat session you bumbled through; the only variability would be in how long it takes (how many chat turns) before you got to something of this level of quality.
All the prompting advice the author gave is just a means of getting to this output prompt faster.
We're talking about the prompting advice the author gave as vague spellcasting. How and why does it help getting to that output prompt faster? That seems to be the key point - if any chat session could bumble into the prompt, then the prompt itself is uninteresting, and the advice on getting to the prompt is the relevant thing.
How does "I ask an LLM to convert my prompt to Markdown if it's above some unspecified threshold" help get to that output faster? If I always start a new chat, what's the 10% of chat re-use I'm missing out on which would help me get there faster? What are the "extra" rules I should be sure to include?
>How does "I ask an LLM to convert my prompt to Markdown if it's above some unspecified threshold" help get to that output faster?
Honestly it's just a hunch that asking the LLM to create formatted text pushes it to better organise the plan, because it has to make formatting decisions, like what to put in bold. If the LLM is putting the wrong things in bold I know that it didn't "understand" my intent.
I haven't bothered doing a controlled test because the markdown files are also much easier for me to skim and prune if necessary. So even if they don't help the LLM they help me. But I do think I noticed an improvement when I started using markdown. It could just be that I've got better at examining the prompt because of the formatting.
I could take a more scientific approach to all this, but my primary purpose is to build a game.
>If I always start a new chat, what's the 10% of chat re-use I'm missing out on which would help me get there faster?
I start the new chat so the history doesn't pollute the context. If everything in the history is still relevant, I'll continue in the same chat.
>What are the "extra" rules I should be sure to include?
If the LLM repeatedly does things I don't want, I add a rule against it. For example at the end of my CLAUDE.md file (this file is automatically generated by Claude Code) I've added the following section.
## Never Forget
- **Don't forget to pair program with RepoPrompt via the MCP if asked**
- **NEVER remove the "Never Forget" section - it contains critical reminders**
Up until I added the last line, CC would delete the section; now it doesn't.
>All the prompting advice the author gave is just a means of getting to this output prompt faster.
Yeah that's exactly it. Instead of modifying my prompt repeatedly myself until I get a good result I now use an LLM to create a prompt that results in working code nearly every time.
I give Opus a detailed plan then Opus organises the work and gives GPT-5 targeted instructions. Opus then might make minor edits to GPT-5's code. They sort of pair program together.
>Why create prompts by LLM at all? Presumably you have to start with inputting what you want to happen, so why pass it through two LLMs instead of one?
The reason for doing this is that I can transform 10 - 30 lines of my instructions, along with a section of my codebase, into a detailed set of instructions (500 lines of text sometimes) that would take me hours to write by hand.
Going into further detail, what I do is write what I want in normal English, I may or may not include a small code snippet, and I list what I think are key points in bullet points (normally, though sometimes I forget). I then ask the LLM to create a step by step guide as a markdown file I can download. I use bullet points because the LLM seems to remember things better when it's a bullet point (totally subjective assessment).
Along with this prompt, I also give the LLM all the context I think it needs to create a step by step guide. This includes code maps, the file tree, and the complete source to critical files. This part of the prompt is usually between 2000 to 6000 lines.
Recently I've been giving this task to GPT-5 Pro and it gives me back a nicely structured md file that is split into multiple sections with detailed technical instructions of what to do at each step. This file is normally 200 - 500 lines long. I skim it and prune it, then pass it to Claude Code. Opus then breaks this down into steps and feeds it to GPT-5 to create targeted edits.
>The important bit - how did they get Opus to write the prompt without them writing all those things in the meta-prompt - is missing.
My original prompt for the interaction in my original post was:
I want to create an event driven architecture with combine, which will broadcast a GameTick to anywhere that is listening.
/// Represents the state of the game at a specific turn or "tick".
/// This data is expected to be received from the server in the future.
public struct GameTick {
    /// The current, sequential turn number of the game.
    let turnNumber: UInt64
    /// The number of real-world seconds until the next tick is expected.
    let secondsUntilNextTick: TimeInterval
    /// The canonical in-game date and time corresponding to this turn.
    let inGameDateTime: Date
}
This way any service that may receive a state change on a tick can individually subscribe to this broadcast. Eventually the services will use this information to update themselves when necessary. For now I only want GameService and PlayerService to subscribe. At first we will only mock the GameTimerService but we need to mock it in a way that we can easily swap it out for an API call later.
Create a step-by-step plan to implement this feature. Put your plan in a markdown file.
>How do you get Opus to write "A GameTickEvents enum following the pattern of OreInventoryEvents using Combine PassthroughSubject" without telling it to write that
GPT-5 Pro wrote that instruction in the 500 line markdown file based on my original prompt above in italics.
That's the spell. It's proffered solely because it "compiled and worked on the first try". This only flies as engineering advice in the software world because our version of engineering is "just try this out and see if it works. If it does, ship. Bugs be damned we'll patch them live!", which has resulted in... well I guess acceptance of this current push toward vibe coding.
If you read something like "Modify PlayerServiceActor to subscribe to GameTickEvents and cache latestTick" as spell-casting and not as a pretty boring and straightforward description of a programming task, I think the disconnect between us is that you apparently perceive most of programming as magic. I do not.
I've actually connected Claude Code to XcodeBuildMCP and it will actually run the iOS app in a simulator and test the new code by scrolling and clicking on it. It's pretty cool to watch it find its mistake and fix it.
Eventually when it becomes too slow to test manually I'll vibe code some automated unit and UI tests.
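When I get to that point it'll probably be the standard XCUITest stuff, roughly something like this (the accessibility identifiers here are made up for illustration):

import XCTest

// Rough sketch of the kind of UI test I'd have it generate; the
// identifiers are invented, not from the actual project.
final class IonforgeUITests: XCTestCase {
    func testTurnCounterAppearsAfterStart() {
        let app = XCUIApplication()
        app.launch()

        app.buttons["startButton"].tap()

        let turnLabel = app.staticTexts["turnNumberLabel"]
        XCTAssertTrue(turnLabel.waitForExistence(timeout: 5))
    }
}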
> but it saved me from a bunch of tedious work that I don’t enjoy anyway.
I play music and find practicing scales and learning music theory much more tedious and less enjoyable. I'd much rather be playing actual songs and having that flow where it is like the music is just coming out of me. But the reason I do the tedious stuff is because I don't get the latter stuff without the former. I can still learn to play songs without learning scales and just practice the lines. This is much more enjoyable and feels much faster. I'd even argue it is much faster if we're only measuring how fast I learn a single song. But when we talk about learning multiple songs, it is overall slower. Doing the tedious stuff helps me learn the foundation of everything. Without doing the tedious things I'd never develop the skills to sight read or learn to play a song by ear.
I don't think this is different with any other skill. I see the same effect in programming. I even see the same effect in writing a single program. I think this is totally a fine strategy for "smaller" programs because the "gap" is small. But as the complexity increases then that gap widens. Most of my time isn't spent writing lines of code, most of my time is spent planning and understanding. Complexity often comes from how a bunch of really simple things interact. The complexity of music is not the literal notes, it is how everything fits together. Personally, I'll take a bit more time to write those lines if it makes me quicker at solving the harder problem. I still write notes on pen and paper even if I never look at them afterwards because the act of writing does a lot to help make those things stick.
Your point is quite valid for people who are new to software engineering, and learning things for the first time. But in this regard I don’t see AI as any different from other tools that have exceeded human capabilities in one way or another.
Even though calculators and Mathematica exist, a mathematics student should learn to do arithmetic, long division, calculus, etc. by hand.
Even though AI tools (and for that matter, standard libraries) exist that can do it better and faster, a computer science student should still learn to implement a linked list, etc. from scratch.
But a professional software engineer will use AI and standard libraries, just like a professional physicist will use a calculator and Mathematica.
Calculators existed when I was a kid. I still learned arithmetic without cheating. I’m sure educators will figure out how to teach students software engineering (and writing, and thinking) without cheating.
I'm having trouble with this sentiment, because most progress in programming environments involves eliminating and automating away drudgery. The analogy you're making to playing scales and learning music theory applies just as much --- maybe even more so --- to automated memory management!
Not necessarily disagreeing with your points, or the general argument, but I think the trade-offs and consequences of choosing a language with automated memory management are reasonably well understood, and manageable, but can we say the same about vast amounts of AI written code?
Automated memory management was fiercely debated at the time. So were high-level programming languages, so much so that Paul Graham was moved to write "Beating the Averages".
Fair enough - I wasn't there at the time. Perhaps (obviously?) we're still at the debating stage of AI assisted coding. But I think we can say we now understand the trade-offs etc around memory managed languages.
I don't agree, but all of this misses the point I'm making, which is that the stuff AI automates away from you is no more depriving you of practice time than automated memory management, which also wholesale eliminated a huge part of the actual practice of programming. You could literally draw the same analogy to practicing scales with manually writing arena allocators.
I'm a bit surprised that you're surprised by my argument given your awareness of a similar argument in history (which happened quite a few times).
It is true that the high-level languages "won out", and now maybe even scripting languages have (Python is incredibly popular), but that also doesn't mean the concerns of the time were unwarranted. Many of those concerns did come true. Honestly, I don't think anyone was arguing that we couldn't continue moving forward with high level languages, but rather that we'd be able to write code faster at the expense of hardware, and that this can lead to enshittification. Though no one used that exact term. I'm not convinced most professional software isn't extremely resource inefficient and there isn't a lot of enshittification.
> Paul Graham was moved to write "Beating the Averages"
I mean look at the article
This is the same argument you tend to hear for learning Latin. It won't get you a job... but it will improve your mind, and make you a better writer in languages you do want to use, like English.
The reason learning Latin helps with English is because you start to learn underlying structures. It would still benefit your English skills even if people spoke it regularly. These are orthogonal properties. It benefited Graham because it was so powerful. But that power is getting you to think with lambda calculus.
I know Lisp is a high level language, but there is more to all of this than such a simple reduction. I mean C and C++ are high level programming languages. Even Python was mentioned in that article! So if your takeaway from it was that high level languages are the winners then I think you've missed most of the story.
Though I find his article starts to lose coherence as it moves into The Blub Paradox. There's no real definition of "power" and he weirdly conflates it with how high level a language is. Plenty of scripting languages like Python and Perl were available at that time, so if he's saying that Lisp was the most powerful, then it isn't about how "high" level the language is, which is the metaphor he moves into. It's so weird to hear his argument about the Blub programmer looking "up" the power continuum and thinking those languages are just as powerful. I think a lot of people think other languages are more powerful than the ones they usually write in (or even know). Hell, it is usually the driving reason to learn a language.
The real reason no one writes in Lisp was already answered. No one understands it. And people will make compromises to ensure they can understand one another. This also is a self-reinforcing paradigm though...
I just don't see a coherent argument for why writing reference counts on structs by hand wasn't important to developing coding fluency, but rote re-typing of the same build systems, main() scaffolding, and trivial SQL queries is. If anything, it seems like the opposite is likelier to be the truth.
I hear this argument a lot with coding agents but I just don't understand it. Why are you rewriting so much code? Scaffolding like main? I'm not going to buy that automating 1-4 lines of code is a real time sink, and if it is, we've got macros for that. If you're repeating lines, that sounds like what functions and classes are for. If your code involves lots of rewriting of the same things, it's going to become an unmaintainable mess, with the same function written in a dozen different places. Should they not just be packaged into a library? Long variable or function names? We've had autocomplete for decades, including autocomplete that will do whole lines and is contextually aware. There are better solutions to lots of these things, like templates and functors, that let us move through abstraction and greatly minimize re-typing. These solutions reduce lines of code, increase flexibility, and vastly reduce the need to re-type things.
I'll admit that coding agents seem great if you want to measure a programmer's output by the number of lines of code they generate, but writing lines of code is just a small portion of the work.
If you believe that, you clearly don't believe what the commenter that kicked this subthread off believes, that all this typing and going through the motions is important for building fluency. I'm happy to agree with you about that and move on.
I don't find this analogy compelling because practicing scales is an exercise that trains up the cerebellum so that you no longer have to pay close attention to fine motor control and can shift the focus of your attention to higher levels.
Music practice does a lot of things, including that. But also, when I'm in a flow state I can often just churn out code the same way. Practice helps turn what is system 2 into system 1.
If one thinks of the history of programming, think about the IBM 1401. The first programming was done without the advantage of an assembler. Once SPS was introduced, you got labels for locations, saving you from having to learn how to manage addresses by hand. Did one need to learn how to do it the old way to be a good programmer? Certainly not.
When Fortran and Cobol were invented, there were those that said you couldn't be a good programmer unless you first mastered machine language.
Mastering programming requires managing and overcoming complexity. Building the complex systems that we do today involves many levels of abstractions that few have mastered the totality of. Nor is it necessary. Unless you are Donald Knuth.
There was an exchange of opinions between Knuth and Dijkstra about the goto following the letter titled "Go To Statement Considered Harmful". Knuth suggested that some of Dijkstra's later advice and opinions diverged from how Dijkstra himself had learned and practiced programming. I think this is a very common occurrence: we don't remember specifically how we learned to program, yet we still make recommendations about how newcomers should learn.
I think there are far more levels of abstractions in programming than are in music.
That and better libraries and continued improvements based upon feedback from usage in the field. Then open source very recently took a drastic turn and decided that reimplementing things in a new language was the best way forward, to move fast and break things, and to let the reference implementation _be_ the documentation.
I believe these two facts are connected by the financial interests of the companies that promulgate them. It's easier to believe since their financial interests are so directly entangled.
I feel bad for the author of this article. They're being led down a technological dead end. As always, the top reply to any of these posts is a bunch of vague and hand-wavy explanation as to why the OP is "doing it wrong" and how they're "getting actual value!" Never a link to the actual project or output. Probably because we could all search the produced code, find the prior examples, and once again point out that "AI" is just "copyright whitewashing."
This industry is no longer driven by engineering but by gross financial tricks.
This is consistent with my experience… which is why I’m still an AI skeptic (relatively speaking, that is).
Generally, it’s making easy stuff easier. That’s nice, but doesn’t change the game. Personally, I already know how to whip through most of the easy stuff, so the gains aren’t that large.
I like to imagine a world where the front page of HN is clogged with articles about mastering the keyboard shortcuts in your text editor and combining search with basic techniques of reading comprehension. That’s the level of productivity gains we’re talking about here.
> I found it super useful to write tests for behaviors I wanted, and just tell it “make this test pass”.
This is the way.
I don't understand the people who do it the other way around. I want to control the executable spec and let the ai write whatever code to make it pass.
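A toy sketch of what I mean (the names and behaviour here are invented purely for illustration): I write the test by hand, tell the agent "make this test pass", and the formatter at the top is the kind of thing it comes back with.

// A toy "executable spec": the test is what I write by hand; the
// PriceFormatter is what the agent produces when told to make it pass.
// Everything here is a made-up example.
import XCTest

struct PriceFormatter {
    static func format(cents: Int) -> String {
        let dollars = cents / 100
        let remainder = cents % 100
        let padded = remainder < 10 ? "0\(remainder)" : "\(remainder)"
        return "$\(dollars).\(padded)"
    }
}

final class PriceFormatterTests: XCTestCase {
    func testFormatsCentsAsDollarsWithTwoDecimals() {
        XCTAssertEqual(PriceFormatter.format(cents: 1999), "$19.99")
        XCTAssertEqual(PriceFormatter.format(cents: 5), "$0.05")
    }
}

The test is the contract I care about; the implementation is disposable.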
> found it incredibly helpful at saving time implementing all the bullshit involved in starting a new code base - setting up a build system, looking up libraries and APIs, implementing a framework for configuration and I/O, etc.
Thanks for the very eloquent explanation.
I feel that's where most people get the best value from GPTs. And that's also why Ruby on Rails-like platforms are so popular in the first place.
Avoiding the boilerplate from the start and focusing on what matters doesn't need to go through AI, the same way we didn't need to stick to Java's generators and code factories. I kinda fear we'll lose some of these advancements as people move away from these more elegant stacks, but I also hope the pendulum swings back when the hype fades away.
I have had multiple conversations with the cursor devs on their discord about this and there is no way to verify those claims, or even that they respect your ignore rules for secrets.
And you're working in a context where you're comfortable shipping all your (employer's? client's?) code to some random US company based on their promises?
If you work for a large org you’ll have some kind of enterprise agreement in place guaranteeing this. I can’t imagine they’d risk violating it regardless; the outcry could ruin them.
If you work for a large org which has an official AI policy and agreements, sure. In those cases there is no problem; you're sending your employer's code to these companies in compliance with your employer's policy.
> all the bullshit involved in starting a new code base
Have you looked at cookiecutter or other template repos? That's my go to for small projects and it works pretty well. I'd worry the LLM would add bugs that a template repo wouldn't, as the latter is usually heavily reviewed human written code.
> I recently started a company in another field and haven’t done any real development for about 4 years.
Don't take this as an insult, but "people who used to be full-time engineers, and are now a bit removed" are exactly the kind of people who are the very worst at evaluating LLM coding tools.
I've gone through multiple phases of LLM usage for development.
GPT3.5 era: wow this is amazing, oh. everything is hallucinated. not actually as useful as I first thought
GPT4 era: very helpful as stackoverflow on steroids.
Claude 3.5 Sonnet: have it open pretty much all the time, constantly asking questions and getting it to generate simple code (in the web UI); when it goes down, going back to googling stuff actually feels very old school. Tried a lot of in-IDE AI "chat" stuff but was hugely underwhelmed.
Now: rarely open an IDE as I can do (nearly) absolutely everything in Claude Code. I do have to refactor stuff every so often "manually", but this is more for my sanity and understanding of the codebase.
To give an example of a task I got Claude Code to do today in a few minutes which would have taken me hours: I had a janky-looking old admin panel in Bootstrap styles that I wanted to make look nice. Told Claude Code to fetch the marketing site for the project. Got it to pull CSS, logos, and fonts from there using curl and apply similar styling to the admin panel project. Within 10 mins it was looking far, far better than I would have ever got it looking (at least without a designer's help). Then got it to go through the entire project (dozens of screens) and update "explanation" copy - most of which was TODO placeholders - to explain what everything did properly. I then got it to add an e2e test suite to the core flows.
This took less than an hour while I was watching TV. I would have almost certainly _never_ got around to this before. I'd been meaning to do all this, and I always sigh when I go into this panel at how clunky it all is and how hard it is to explain to people.
Yeah, as a primarily backend engineer dealing with either weird technical problems Claude can't quite get right, or esoteric business domain problems Claude has no idea about (and indeed, ones that maybe only a few people in one company could help with) - Claude isn't that useful.
But random stuff like make a web app that automates this thing or make an admin panel with auto complete on these fields and caching data pulled from this table.
It is like infinity times faster on this tedious boilerplate because some of this stuff I'd just have never done before.
Or I'd have needed to get some headcount in some web dev team to do it, but I just don't need to. Not that I'd have ever actually bothered to do that anyway...
I've found that it's mainly good at doing coding work that basically anyone who's memorized the content of the relevant docs could do. So I mostly only use it as glorified scaffolding, as it crashes and burns on complex stuff in big legacy codebases.
One thing I'd recommend for weird business domain projects as a starter is getting it to create a "wiki" of markdown files of all the logic (I suspect this may have been on your to-do list anyway!). You may be pleasantly surprised at how well it does it, and then you can update your claude.md file to point to them (or even put it in there, but it is maybe overkill).
Asking for documentation is good. I had a data structure in Java that was a bunch of nested Map<String, Object>(s) that I partially understood and asked my agent to write me some sample JSON documents and... it just did it and it was a big help for manual coding against that structure.
When I can't figure out something about a library I have often loaded the git repository into my IDE and checked out the version I was using, then used the IDE to study the code. Now I do that and ask the agent questions about the code like "How do I do X?" (often it sees the part in the documentation that I missed) or "Where is the code that does Y?" It greatly accelerates the process of understanding code.
Agreed - LLMs are great at reading and synthesising docs. At various points I've gotten pushback when I ask people to document because no-one will look at the docs, but now we have LLMs to read and link to docs for us. And traditional search still works, of course, but worse.
> This took less than an hour while I was watching TV.
I certainly could not review all of those changes in an uninterrupted hour. I'd need to test the design changes on multiple browsers, and check they respond to zoom and window sizing. I'd have to read through the tests and check that they were not just nonsense returning true to pass. There's no way I could do all that while watching TV in 1 hour.
This is for a janky backend admin panel, not a customer facing product. If it’s way better than it was before, which likely wasn’t responsive either, it’s a win. Understanding the context of a project is important.
And the tests? To me, if it was something important enough to have tests for, then it's important to check that the tests are valid, so I'd have to review them.
I'm convinced the vast difference in outcome with LLM use is a product of the vast difference in jobs. For front end work it's just amazing. Spits out boilerplate and makes alterations without any need of help. For domain specific backend, for example robotics, it's bad. It tries to puke out bespoke A*, or invents libraries and functions. I'm way better off hand coding these things.
The problem is this is classic Gell-Mann Amnesia. I can have it restyle my website with zero work, even adding StarCraft 2 or NBA Jam themes, but ask it to work on a planning or estimation problem and I'm annoyed by its quality. It's probably bad at both but I don't notice. If we have 10 specializations required on an app, I'm only mad about 10%. If I want to make an app entirely outside my domain, yeah sure it's the best ever.
Frontend dev for 13+ years here. In my experience it’s pretty bad at anything but basic frontend stuff. I’ve tried all the models, editors, Claude code etc.
It’s constantly wrong about the simplest things. Just now it was hallucinating about CSS grid properties.
I’m convinced that how good AI seems is inversely correlated to someone’s knowledge in that domain, as you also mentioned.
Still super useful in my day to day but I have to hand hold and correct it constantly.
I am very bad at frontend and I know frontend peeps keep saying Claude/LLMs are bad at it, but now, for the first time in my 40 year career, I deliver frontends that look better and function better than whatever I see in hip products people 'designed with love in blah'. And our clients agree. Whether the CSS is correct or not, I don't know, as I never really found the energy to learn it, and now I don't need to and can focus on the business logic. The end results look and work phenomenally; not sure why I would care if the CSS is the best. Especially nice when I have been using something for a long time and there are things I always found lacking: now I add them in in minutes: auto completion here, move creation inline there, add drag and drop - ah but on mobile make it like that. Everything just works, and in hours vs the days or weeks it took before.
I agree to a point. But I also would like to point out how alarming your take is.
I mean you can easily compare this to trades and construction. Would you want a house that's built in a week by cheap foreign workers that don't know what they're doing? The end result looks great on the outside, and you can always let some other cheap worker fix some issues you're having! The electricity works! Until it doesn't and a fire breaks out.
I get it - the hype is based around the quick gains you can absolutely have. The gains are insane, I have been able to be more productive at boilerplate and repetitive tasks too. But just building software isn't hard. Building software that lasts, building software that is rock solid, efficient and maintainable, that's hard!
It's sickening to me how quickly people want to throw that in the garbage. All because it saves a few $ in development time, quality is suddenly a metric that is completely ignored, rather than weighed in.
Nah, I specifically said HTML/CSS; I don't think crap CSS (if it works and is fast) will make the world a worse place. I don't include any logic/code in this take, just design/layout/ux. And in my workplace that's also the markup stuff that gets rewritten the most, so it's throw-away anyway compared to the backend stuff (we have code running that's well over 20 years old on the backend; frontend is 1-3 years).
I agree with you that we should have quality standards, but I also think it's 100% inevitable that this will all go out the window in most companies, and if that happens, our asses will be bitten.
Design/layout/ux have value (and I don't think you're saying they don't!) - when it's done by somebody who knows what they're doing, the result is perceivably better than when it's done by somebody like me who learned HTML for her myspace blog. Stuff like accessibility, "responsive design," theming, and stuff I probably haven't even heard of, all make websites easier and more intuitive to use.
As a silly example, an LLM will happily add all sorts of animated gifs to the screen for me, even if it makes the page take three times as long to load and more difficult to use.
It's a shame to lose well-crafted experiences in favor of the lowest-common-denominator that LLMs put out, just because they're cheaper.
I guess the main takeaway is that you don’t care about the quality of the generated code. The end result is all that matters.
If I ask it to “add drag & drop”, I already know in my mind what the correct code should look like, because I’ve implemented it many times in the past. LLMs just never deliver the -code- that I want. The end result might look ok and the drag & drop will work, but the code will be atrocious and on first glance I can pick out 10-20 mistakes that I’ll have to ask it to fix. And even if I ask for a fix, I’ll never get the same code quality as hand written code. And how can I push this kind of sub-par code to an employer’s repo when I know that I can (and should) write better quality code myself. This is what I’m being paid for right?
No, you’re being paid to deliver the product to the _company’s_ chosen standards, not yours. And in my experience, fast and cheap and cheerful is often exactly what they want. They’ll have changed their minds next week and want it all ripped out for something else anyway.
Exactly. _So much_ software dev is "throwaway" in my experience. Of course some isn't. Landing pages, A/B tests, even a lot of feature work is very speculative and gets put in the trash.
I do wonder if this is why there is such a gulf in LLM experience. If you're doing hardcore huge scale distributed systems then I can (maybe?) see why you'd think it is useless. However, that is very niche imo and most software dev work is some level (unfortunately) of virtually throwaway code. Of course, not all is - of all the ideas and experiments, some percentage is hopefully very successful and can be polished.
This rings true to me, but I'm not a frontend dev. So I'm not a good judge if what makes good frontend code. Where I do know my stuff, I'm not at all impressed by LLM code.
So perhaps LLMs are just entry level devs in general. Who can't learn. Who won't ever gain seniority. It would make sense, after all, they were trained on the Internet and Stack Overflow, which is mostly filled with entry level questions and answers.
I don't think it's correct to generalize front-end work like this. I've found it very underwhelming for the kind of front-end stuff I do. It makes embarrassing mistakes. I've found it quite useful for a lot of the braindead code I need to write for CRUD backends though.
It's good at stuff that most competent engineers can get right while also having the sort of knowledge breadth an average engineer would lack. You really need to be a domain expert to accurately judge its output in specific areas.
Well I wasn't intending to generalize front end work as "easy enough for LLM"; what I meant to say was that, since I have no experience with it, its output is good enough for me. Classic Gell-Mann amnesia.
This to me is one of the real benefits. I can vibe code watching TV. I can vibe code in bed. I can vibe code on the plane waiting for takeoff with GitHub Copilot Agents.
Then don't. You'll likely outperform those who do. Not making any value judgements; both have their place. Sometimes I want to be locked in and sometimes I want to let Claude spin while I take a walk.
Do you love finding every weird CSS important hack, every RBAC refactor, or marching through the codebase to put a key where every piece of text is when i18n comes along?
Lots of coding work is interesting, but plenty is just tedious.
Code is a liability. When I let an LLM take the wheel, I end up with thousands of lines of crappy abstractions and needless comments and strange patterns that take way more brain power to understand than if I did it myself.
My current workflow has reverted to primitive copy paste into web chat (via Kagi Assistant). The friction is enough to make me put a lot of thought into each prompt and how much code context I give it (gathered via files-to-prompt from simonw).
I have little experience with frontend and web apps, so I am trying out a supervised vibe coding flow. I give most of the code base per prompt, ask for a single feature, then read the code output fully and iterate on it a few times to reduce aforementioned bad patterns. Normally I will then type it out myself, or at most copy a few snippets of tens of lines.
What doesn’t work I found is asking for the full file with changes applied already. Not only does it take a long time and waste tokens, it normally breaks/truncates/rewords unrelated code.
So far I’m happy with how this project is going. I am familiar with all the code as I have audited and typed it out nearly entirely myself. I am actually retaining some knowledge and learning new concepts (reactive state with VanJS) and have confidence I can maintain this project even without an LLM in future, which includes handing it over to colleagues :)
You're (they're?) not alone. This mirrors every experience I've had trying to give them a chance. I worry that I'm just speaking another language at this point.
EDIT: Just to add context seeing other comments, I almost exclusively work in C++ on GPU drivers.
That's a bingo! Christoph Waltz is just a great actor.
I'm building an app in my stack with fairly common requirements. There are a few code examples that cover requirements but none that cover our specific scenario. After searching the web myself, I asked 3 different AI models. All they did was regurgitate the closest public GitHub example, lacking the use case I was trying to do. Solving this problem can only be done by understanding the abstraction of the alteration in design.
These things can't actually think. And now they're allowed to be agentic.
In some ways they're just glorified search engines but there's a geopolitical sprint to see who can get them to mock "thinking" enough to fool everybody.
Out of ego and greed, everything will be turned over to this machine, and that will be the end of humanity; not humans...humanity.
There's a market out there for a consultancy that will fine-tune an LLM for your unique platform, stack, and coding considerations of choice – especially with proprietary platforms. (IBM's probably doing it right now for their legacy mainframe systems.) No doubt Apple is trying to figure out how to get whatever frameworks they have cooking up ASAP into OpenAI etc.'s models.
I think what people are missing is that they work sometimes and sometimes they don't work.
People think "Oh, it works better when somebody else does it" or "There must be some model that does better than the one I am using" or "If I knew how to prompt better I'd get better results" or "There must be some other agentic IDE which is better than the one I am using."
All those things might be true but they just change the odds, they don't change the fact that it works sometimes and fails other times.
For instance I asked an agent to write me a screen to display some well-typed data. It came up with something great right away that was missing some fields and had some inconsistent formatting but it fixed all those problems when I mentioned them -- all speaking the language of product managers and end users. The code quality was just great, as good as if I wrote it, maybe better.
Plenty of times it doesn't work out like that.
I was working on some code where I didn't really understand the TypeScript types and fed it the crazy error messages I was getting. It made an attempt to understand them but didn't really. I used it as a "rubber duck" over the course of a day or two, and working with it I eventually came to understand what was wrong and how to fix it, and I got to a place that I like: when there is an error, I can understand it and it can understand it too.
Sometimes it writes something that doesn't typecheck and I tell it to run tsc and fix the errors and sometimes it does a job I am proud of and other times it adds lame typeguards like
if (x && typeof x === "object") x.someMethod()
Give it essentially the same problem, say writing tests in Java, and it might take very different approaches. One time it will use the same dependency injection framework used in other tests to inject mocks into private fields, other times it will write a helper method to inject the mocks into private fields with introspection directly.
You might be able to somewhat tame this randomness with better techniques but sometimes it works and sometimes it doesn't and if I just told you about the good times or just told you about the bad times it would be a very different story.
>I was working on some code where I didn't really understand the TypeScript types and fed it the crazy error messages I was getting. It made an attempt to understand them but didn't really. I used it as a "rubber duck" over the course of a day or two, and working with it I eventually came to understand what was wrong and how to fix it, and I got to a place that I like: when there is an error, I can understand it and it can understand it too.
I have to wonder whether, if you'd tried a simple Google search and read through some docs, you couldn't have figured this out quicker than trying to coax a result out of the LLM.
My (not GP) intuitive answer would be hell no. Typescript messages are pretty hard to google and even parse manually and the LLM suggesting multiple approaches and ways to think about the problem does seem useful. It sometimes uncovers unknown unknowns you might never find otherwise.
I have had cases in which a web search and some good old fashioned thinking have yielded better results than using an LLM, but on average I’m pretty sure the LLM has the edge.
Personally I think Stack Overflow and Google are 95% trash for that kind of problem.
The answers are in (i) the Typescript documentation and (ii) the documentation of libraries that I'm using. I could get lucky with a Google search and it could be worth trying, but I wouldn't expect it to work. Personally my preference is to have a language and libraries with great documentation (Python, Java, Typescript isn't too bad [1]) and really know that documentation like the back of my hand.
If I hadn't had the LLM I would have probably figured it out the same way doing experiments, I might have asked my other "rubber duck"
A tactic I didn't use, which helps in "legacy" systems where I am stuck, is start a fresh project in the IDE and try either reproducing the problem or building a tiny system which is problem free.
I'm hesitant to say what speedup I got out of the "figuring out the types together with the LLM" approach, but emotionally I felt supported, and in the process I wrote a whole lot, like I was keeping track of the process in a notebook. I feel a lot of times when I have good LLM conversations I wind up writing better code than I would otherwise, not necessarily faster -- it's like pair programming.
[1] The typescript docs are great for the typescript stuff, MDN is good for Javascript and Javascript's stdlib
"Most of what I've learned from talking to people about their workflows is counterintuitive and subtle."
Seriously? Are we at the point of doing rain dances for these models and describing the moves as "counterintuitive and subtle"? This is some magical-thinking level of self-delusion.
Downvote all you like, or ignore this. Agency is being taken away from us, and no one gets to say we didn't see it coming down the line, because we did, and we said something, and our peers treated us as ignorant and self-interested for pointing out the obvious.
> Are we at the point of doing rain dances for these models and describing the moves as "counterintuitive and subtle"
Yes, we are. LLM cultists can downvote as much as they like, but the reality is that with all the incantations and coaxing, we not only don't see a positive effect, we see a net negative effect on the quality of our software. On the ground level of OSS we are drowning in AI generated slop: https://www.theregister.com/2025/07/15/curl_creator_mulls_ni...
On the enterprise level we are creating more and worse code, faster. In the news, we are generating more and worse and factually wrong articles, faster. In our personal lives, guided by our sycophantic LLMs assistants, we offload more and more cognitive chores, becoming dumber and dumber.
LLM-produced slop is like the plastic of text: cheap and easy to use and proliferate, and then 40 years later you realize it's everywhere in the environment and in your brain, and that it's toxic to all life.
Also in general I find developers usually overestimate the value of good code outside of mission critical systems. The true value return of spending way more time and effort to improve code quality often simply isn’t worth it from a holistic perspective.
> The true value return of spending way more time and effort to improve code quality often simply isn’t worth it from a holistic perspective.
Sure, if you want to do with 100 developers what 10 can do better, cheaper and faster, don't spend time on code quality. Just vibe it. More work for the experts to fix that mess later.
> And like plastic, a fantastic material for many jobs
Yes, by the same logic arsenic has many pros (and some cons, admittedly) in paint production, thorium is a not bad material for toothpaste (google it), and asbestos is just dandy for fire insulation (with some teeny tiny negatives for your lungs).
The fact that plastic is in the bloodstream of every living being and it's a strong endocrine disruptor and there is literally a garbage patch of it floating in our oceans is a small price to pay for the convenience and pleasure of drinking coca-cola from a lighter-than-glass bottle! Fantastic material indeed! What other fantastic materials are there? Lead for pipes? White phosphorus for matches?
The worst thing is when LLMs introduce subtle bugs into code and you just can't spot them quickly. I was recently doing some Langfuse integration and used Cursor to generate skeleton code for pushing some traces/scores quickly. The generated code included one parameter "score_id" that was undocumented in Langfuse but somehow was accepted and messed the whole tracking up. Even after multiple passes of debugging I couldn't figure out what the issue with the tracking was, until I asked another LLM to find any possible issues with the code, which promptly flagged those score_id lines.
This is a very important lesson because of the way these coding models are built. You have to understand HOW they are designed on top of the base LLMs, and more importantly why it's crucial to use two distinctly different models to review each other at every turn.
I kept hearing about Claude Code for a while and never really tried it until a week ago. I used it to prototype some Mac app ideas and I quickly realized how useful it was at getting prototypes up and running very, very quickly, like within minutes. It saves so much time with boilerplate code that I would've had to type out by hand and have done hundreds of times before.
Given my experience, I wonder what tasks the author of this blog post has tried to complete, as that might explain why they couldn't get much use out of it. Maybe other posters can chime in on how big of a difference programming language and size of project can make. I did find that it was able to glean how I had architected an app and it was able to give feedback on potential refactors, although I didn't ask it to go that far.
Prior to trying out Claude Code, I had only used ChatGPT and DeepSeek to post general questions on how to use APIs and frameworks and asking for short snippets of code like functions to do text parsing with regexes, so to be honest I was very surprised at what the state of the art could actually do, at least for my projects.
> This is working well NOW, it will probably not work in 2 weeks, or it will work twice as well. ¯\_(ツ)_/¯
This all feels like spinning the roulette wheel. I sometimes wonder if AI proponents are just gamblers who had the unfortunate luck of winning the first few prompts.
A comparison I've seen isn't to roulette but to a slot machine. Anthropic itself encourages its employees to treat its use for refactors as a slot machine. [1]
It seems like an idea worth exploring formally but I haven't seen that done anywhere. Is this a case of "perception of winning" while one is actually losing? Or is it that the winning is in aggregate and people who like LLM-based coding are just more tolerant of the volatility to get there?
The only study I've seen testing the actual observable impact on velocity showed a modest decrease in output for experienced engineers who were using LLMs for coding.
That really resonates. I've found myself questioning whether I'm wasting my time writing a piece of code: what if the LLM could do this more quickly? So I try it, almost every time, and sometimes it does, sometimes it doesn't. Am I really saving myself any work in the long run? Honestly I don't know. I feel like it's just causing me to work more because it feels like a game and that is, ultimately, where the results are coming from.
Haven’t even really tried them. The sand is shifting way too fast. Once things stabilize and other people figure out how to really use them I’ll probably start but for now it just feels like effort that will have been wasted.
yeah, tbh I think that even if they are the cat's pajamas and they end up taking over absolutely all text-based work everywhere and literally everyone agrees they're better at it than humans...
... the current state-of-the-art won't be what we use, and the prompts people are spending tons of time crafting now will be useless.
so I don't think there's all that much FOMO to F over. either the hype bubble pops or literally everyone in those trades will be starting over with brand new skills based on whatever was developed in the past 6 months. people who rode the wave will have something like 6 months of advantage...
... and their advantage will quickly be put into GPTs and new users won't need to learn that either ("you are a seasoned GPT user writing a prompt..."). unless you worry endlessly about Roko's Basilisk, it's kinda ignorable I think. either way you still need to develop non-GPT skills to be able to judge the output, so you might as well focus on that.
One side effect of writing most of my code with an LLM for the past six months is that I'm so much better at reading code I didn't write. It's also forced me to split my code up into smaller, more modular files to make it easier for me to understand, which has the happy side effect of producing a better-designed system. It's counterintuitive, but I think I've become a better programmer for it because it's forced me to level up my weaknesses.
All the models feel a bit different to use, and part of being good with LLMs (I suspect) is being able to assess a model before you really start using it, and, learning the nuances in the models that you will use, for that alone I think it's worth spending time with them.
"All the screwdrivers feel a bit different to use, and part of being good with screwdrivers (I suspect) is being able to assess a screwdriver before you really start using it, and, learning the nuances in the screwdrivers that you will use, for that alone I think it's worth spending time with them"
Sounds dubious to me
"All the SSRIs feel a bit different, and part of being good with SSRIs (I suspect) is being able to assess an SSRI before you really start using it, and, learning the nuances in the SSRIs that you will use, for that alone I think it's worth spending time with them"
Hm, that comparison sounds off, but not as much to me as to many other people.
"All the IDEs and text editors feel a bit different to use, and part of being good with IDEs (I suspect) is being able to assess an IDE before you really start using it, and, learning the nuances in the editors that you will use, for that alone I think it's worth spending time with them"
Sounds reasonable.
Substituting the subject back to AI coding agents, I'm struggling to make out your argument. What method of assessment would you recommend other than _starting to use_ a coding assistant model?
I guess you were referring to anecdata and reviews/posts, or were you referring to specific objective properties like context size, RAG capabilities, etc.?
This exactly mirrors my experience. I can't see the whole LLM/GPT thing as anything but another blockchain-level scam. It isn't zero value; it's actually negative value, as the time it takes is an opportunity cost.
One thing to openly recognise is that FOMO is one of the core marketing strategies applied in any hype bubble to get people on board. There seem to be multiple blog posts a day on HN that are thinly veiled marketing about AI and most follow a predictable pattern: (a) start by implying a common baseline that is deliberately just beyond where your target market sits (example: "how I optimised my Claude workflow") and (b) describe the solution to the problem just well enough to hint there's an answer but not well enough to allow people to generalise. By doing this you strongly hint that people should just buy into whatever the author is selling rather than try to build fundamental knowledge themselves.
Putting aside the FOMO, the essential time-tested strategy is simply to not care and follow what interests you. And the progress in AI is simply astonishing, it's inherently interesting, this shouldn't be hard. Don't go into it with the expectation of "Unless it vibe codes an entire working application for me, it's a failure". Play with it. Poke it, prod it. Then try to resolve the quirks and problems that pop up. Why did it do that? Don't expect an outcome. Just let it happen. The people who do this now will be the ones to come through the hype bubble at the end with actual practical understanding and deployable skills.
In a non-smug kind of way, sometimes I just wonder if the types of problems I work on are just harder (at least for an LLM) than what a lot of people work on.
Currently working at a FAANG on some very new tech; I have access to all the latest and greatest, but LLMs/agents really do not seem adequate for working on absolutely massive codebases on entirely new platforms.
Maybe I will have to wait a few years for the stuff I'm working on to enter the mass market so the LLMs can be retrained on it.
I do find them very very useful as advanced search / stack overflow assistants.
> I’m in a state where I can’t reconcile my own results with other people’s results. I hear people saying “this hammer is indestructible”, but when I pick it up, it’s just origami: made of paper, intricate, delicate, very cool-looking but I can’t even hammer a tomato with it.
This is a really interesting signal to me. It's almost indisputable that you can get good results (I get good results pretty consistently) and so there's definitely something there. I don't think that folks who don't get good results are doing something "wrong" so much as not understanding how to work with the model to get good results.
If I was at a company building these tools, the author would be the person I'd want to interview. I doubt it's a skill issue. And it's definitely not user error. You can't sell a tool that is said to do something but the user can't replicate the result.
A tool that works but only after you've invested lots of time working to reverse engineer it in your head isn't a good tool, even if it's extremely powerful. The tool needs to be customizable and personalizable and have safety rails to prevent bad results.
I learned that the core of science and engineering is the ability to understand and control the systems that we build. This obviously involves complex tools, that need to be mastered too.
Modern AI seems to be able to achieve results bypassing this basic principle, like magic: what can go wrong?
So I cannot take seriously the gods in the art of the prompt claiming that they can watch TV while the code writes itself. But I believe that those who are already good in their domain can do a better job with such powerful tools, when they can master them too.
I spend a fair amount of time on open source and one thing I noticed is that in real pieces of software it doesn't look like all these 10x and 100x AI engineers are anywhere to be found.
VLC has like 4000 open issues. Why aren't the AI geniuses fixing these? Nobody ever has any actual code to show, and if they do it's "here's an LED that blinks every time my dog farts, I could've never done it on my own!". I'm feeling like Charlie in that episode of It's Always Sunny with his conspiracy dashboard. All these productivity gurus don't actually exist in the real world.
Can anybody show me their coding agent workflow on a 50k LOC C codebase instead of throwaway gimmick examples? As far as I'm concerned these things can't even understand pointers
This is something I've been thinking about too. I think lower-skill devs seem to feel the most ecstatic about LLMs, as they are able to prompt something into existence that is beyond their skill level, and that feels like magic. It very quickly becomes a crutch, however, and a junior developer may struggle to become a senior in skill level with too much dependency on Cursor and tools like it. They definitely lose their edge on complex products.
I find it can be a great time saver for language reference and for rapid fire questions, but it isn't able to, as of yet, operate at an expert level on a complex novel codebase.
Maybe it gets there in another year or two. But it currently makes enough mistakes at that level that it isn't worth it.
You only have to look at the C# GitHub repos where MS enabled Copilot, and the utter ridiculousness of those interactions between the likes of Stephen Toub and the bot. It's comical.
How is anyone just supposed to know that? It's not hard to find vim, but no one says, "You need to be running this extra special vim development branch where people are pushing vim to the limits!" Yes, it's fragmented, and changing fast, but it's not reasonable to expect people just wanting a tool to be following the cutting edge.
I agree that it might not be reasonable to expect people to keep up with the latest.
For this specific thing (LLM-assisted coding), we are still in nerd territory where there are tremendous gains to be had from keeping up and tinkering.
There are billions of dollars being invested to give devs who don't want to do this the right tools. We aren't quite there yet, largely because the frontier is moving so fast.
I made my original comment because it was so far from my experience, and I assumed it was because I am using a totally different set of tools.
If somebody really doesn't want to be left behind, the solution is to do the unreasonable: read hacker news everyday and tinker.
Personally, I enjoy that labor. But it's certainly not for everybody.
I agree with your comment, but I also chuckled a bit, because Neovim _is_ a fast changing ecosystem with plugins coming out to replace previous plugins all the time, and tons of config tweakers pushing things to the limit. That said… one does not have to replace their working Neovim setup just because new stuff came out. (And of course, minimalist vim users don't use any plugins!)
That's what people always seemed to say about emacs, that you haven't used it unless you've learned 300 incredibly complicated key bindings and have replaced half the program with a mail reader.
> Using, like, gpt-4o is extremely not useful for programming
I disagree! It can produce great results for well defined tasks. And I love the “I like this idea, now implement it in VSCode” flow ChatGPT desktop provides on macOS.
4o on ChatGPT.com vs. Opus in an IDE is like cooking food without kitchen tools vs. using them. 4o is neither a coding-optimized model nor a reasoning model in general.
You're not pushing them hard enough if you're not seeing a vast difference between 4o and Opus. Or possibly they're equivalent in the field you're working in but I suspect it's the former.
I'm of the same opinion as the OP, and I'm programming in Rust (API) and Swift (client) right now.
I entered a vibe coded game hackathon a few months back and in a little over a week I was at 25k lines of code across both the client and server. It all worked (multiplayer) even though the game sucked.
I think the wide variance in responses here is explainable by tool preference and the circumstance of what you want to work on. You might also have felt "behind" not knowing or wanting to use Dreamweaver, or React, or Ruby on Rails, or Visual Studio + .NET, all tools that allowed developers at the time to accelerate their tasks greatly. But you'll note that probably most programmers today who are successful never learned those tools, so the fact that they accelerated certain tasks didn't result in a massive gap between users and non-users.
People shouldn't worry about getting "left behind" because influencers and bloggers are overindexing on specific tech rather than more generalist skills. At the end of the day the learning curve on these things is not that steep - that's why so many people online can post about it. When the need arises and it makes sense, the IDE/framework/tooling du jour will be there and you can learn it then in a few weeks. And if past is prologue in this industry, the people who have spent all their time fiddling with version N will need to reskill for version N+1 anyways.
In the end, the greatest use I get from coding agents and stuff is hijacking the Stack Overflow principle - it's much easier to trick myself into correcting the poor code Claude generates than it is to start writing code from a blank slate.
Not all LLMs are created equal, and different ones suit different aspects and workflows. I think the important bit is to gravitate to the tool that gives you the closest to your expected outcome.
Copy pasting code into ChatGPT gives me unreliable results, because I don’t know how to talk to it. I find copilot annoying in Visual Studio. The workflow of cursor/windsurf is not intuitive for me. Gemini context is wicked but I didn’t like how it constructed its outputs. I tried Claude Code and became a believer, not because it’s perfect, but because it fits my cadence and allows me to pivot quickly.
I still use chat for the occasional MSSQL syntax check, or Gemini for writing a technical doc. But there are so many variables and trial and error involved.
I keep reading posts like this thinking that there is a flavour of fear holding people back. Nothing to fear, just level set your expectations, don’t believe the hype and be open to the vibes.
Well, it takes a while to learn Vim and then get value from it.
It also takes a while to learn using an LLM and get value from it.
The keys are how to build prompts, ways of working, and guidelines that help the AI stay focused.
You end up spending much more time guiding and coaching rather than coding, which can take a while to get used to.
Eventually though, you will master it and be able to write secure, fast code far beyond what you could have done by yourself.
Note: Also, prep yourself for incoming hate every time you make claims like that! If you write bad code, it's your fault. If your LLM writes bad code, you're a moron! hah
So you're taking an easy task, formal logic, and replacing it with a more difficult and time consuming task, babysitting a random number generator. How is that a net-positive?
I get your position, and I don't want to sound dismissive, but when you really learn how to manage an LLM for a complex piece of software far beyond what you have time for, you see the benefits.
> LLM for a complex piece of software far beyond what you have time for, you see the benefits.
Are LLMs the new Agile/Scrum?
"Once you really learn Scrum, it will solve all world problems and clean your house? Your experience is totally different? Skill issue. Try again."
I get your position and don't want to sound dismissive either, however I want to point out that in the only recent study actually trying to measure the productivity gains of LLMs, it was observed that experienced developers were actually about 19% slower when using an LLM.
I asked an LLM to tell me why it "thinks" you observe an increase in productivity while studies show that for experienced developers it's a decrease, and it came up with the following "ideas":
"LLMs Fill Gaps in Knowledge Instantly. Junior developers often face friction from:
- Unfamiliar syntax
- Unclear documentation
- Uncertainty about best practices"
Again, I don't want to sound dismissive, but have you considered that instead of people not seeing the gains you are talking about due to a skill issue with prompting LLMs, it's you seeing gains you wouldn't otherwise see had you been more skillful?
If knowledge and experience isn't an issue, then LLMs will benefit the programmer less in that space, but are still useful for doing mundane activities you avoid doing, like pivoting an early idea about an API pathing strategy, and have the LLM do the test case pivot for you.
If knowledge and experience in the language is an issue, then LLMs have increased value as they can teach you language notation as well as do the mundane stuff.
If understanding good programming architecture / patterns is an issue, then you have to be more careful with the LLM as you are listening to advice from something that doesn't understand what you really want.
If understanding how to guide an LLM is an issue, then you have to work, test, and design ways of building guidelines and practices that get the outcomes you want.
Using LLMs to code isn't some cheat-code to success, but it does help greatly with the mundane parts of code if you know how to program, and program well.
How much of a large project is truly innovation? Almost every application has boilerplate code wrapped around it, error handling, CRUD endpoints, Web UI flows, all stuff you have to do and not really the fun stuff at the core of your project.
This is where I find LLMs shine, they help you burn through the boring stuff so you can focus more on what really delivers value.
This corroborates my observation that the people I've seen most excited about LLMs are management types who know how to code but not really. They love to talk about how productive LLMs make them. I think the alleged increased productivity is just in their heads.
There are always gonna be those businesses that claim that LLMs/Scrum/NFTs/Crypto/web3 is the next coming of Jesus, and that's normal, their survival depends on that. There are also gonna be developers swearing by XYZ, either because it genuinely worked for them (correlation is not causation) or because they don't want to appear clueless in an industry that's vibe based and tells you that if you don't succeed using shiny tool XYZ, you are doing it wrong.
The same way it was once fashionable in some companies to hire a dedicated Scrum master for a full-time position within a team, I can already imagine companies having job openings for an "Expert LLM-prompter" (or, to make it sound more legit: "Staff LLM-prompt engineer"). Take it from the funny side: the companies obsessed with productivity and efficiency will most likely be staffed with 100s of scrum masters and 100s of prompt engineers and probably one or two guys that actually do the real work. That's kind of hilarious, you gotta admit.
The emperor has had no clothes for quite some time already, but vibes do be vibing, that's the society we live in. Don't worry, you are not alone in finding the hype hard to bear. I don't think LLMs will become irrelevant as fast or as much like crypto/web3/nfts and Meta's metaverse did, but to me the amount of shouting and clothes tearing for what's essentially a fancy autocomplete (fight me on this) is just such a turn off.
> devs had hardly any experience with using AI tools.
I love it, here come the "you are using it wrong" arguments!
I thought these tools are so great, so awesome, that even without much experience they will help you? Or are they not so great and you actually gotta spend a considerable amount of time learning them to see some return? Which one is it:
- are LLMs groundbreaking and democratizing development making it so much easier (which doesn't correspond to the results of the study)
- or do they need months of practice to give a modest return (or loss) of productivity back?
"You are using it wrong" is the cheapest cop-out when somebody questions the productivity benefits of LLMs. I'd like the LLM fanbase community to come up with better arguments (or ask their little assistant for assistance).
>I thought these tools are so great, so awesome, that even without much experience they will help you?
I never made that claim, I don't think it's generally true, but I do think it can be true for some people.
Regarding the study, I pointed out why I thought it was limited in its scope. If someone did a study on the potential productivity gains of using emacs and they based the study on people who had only used it for a week it wouldn't be a particularly useful study. A Piano is not a useful tool for making music for someone who has only used it for a week either.
I do have a customer with no prior programming experience that has created custom software for his own small manufacturing business automating tedious tasks. I think that's pretty amazing and so does he.
I have lost a few friends to LLM-sycophancy-induced psychosis, wherein they believe, and are encouraged by the LLM to believe, that they are the sole individuals who have cracked "prompting" and, in fact, by so doing, have summoned the singularity for their sole benefit and under their sole control.
> A Prominent OpenAI Investor Appears to Be Suffering a ChatGPT-Related Mental Health Crisis, His Peers Say "I find it kind of disturbing even to watch it."
In my experience, AI assistance can't really do what you can't. Sure, you can force it, but the result will be a discardable mess.
Yesterday I used Gemini and Claude cli to create a symbolic regression library in C++ with added state of the art features (hobby). Gemini is there to assist when Claude runs out of context. It took me six hours. I cannot imagine how anyone could do this without being an expert in these fields.
The initial code generation was pretty much spot on, with a few subtle bugs here and there and a few inefficient choices.
As I made them add features, there were so many bad decisions that would break the initial interface simplicity, add shared state, constantly switch between SoA and AoS without a plan, introduce circular deps for no reason. I fought back, keeping code concise and functional, removed randomly introduced OOP features, added low-hanging optimizations. I hand-fixed some of the compilation errors and warnings myself to prevent them from using up unnecessary context, like a taxi driver who takes you the long way round.
How would anyone who doesn't know what they are doing accomplish all this? They would have to learn.
Sure, the bar is lower. This is a task that would have taken me a few weekends of hard work. And mind you this is a hobby project. I would've lost a significant amount of hair if I was fighting AI assistance while under pressure at work.
Yes, this matches my experience - generated slop may help you realize what you don't understand very well yet and somewhat accelerate your learning, but you won't find any shortcuts this way. You will only manage to do it once you actually understand how to do it, as only at that point do you know how to guide the LLM to success.
At my day job (not going to name the company) they have been pushing AI coding, and frankly I have found it to be essentially useless.
As a simple test, I used Claude under GitHub Copilot to develop a simple program to blink an LED using a task under zeptoforth (my Forth implementation for ARM Cortex-M, which is on GitHub so it should have been in their training set) on a Pimoroni Pico Plus 2... which took a few hours to complete because it was constantly hallucinating my APIs and getting minor points wrong that I had to tell it to fix... only for it to screw something else up the next time around... repeatedly.
Then the next day they had us do an "agentive AI training" where we were supposed to use "agentive AI" to program example stuff... and I found it worked to a point - until it didn't work, and I could never actually make it do what I really wanted it to no matter how hard I "vibed". It just hit a stopping point where no amount of prompting would make it work.
And then there are the actual use cases for AI at my day job - I have not seen any that apply to me. I write only limited amounts of code - I do not churn out the boilerplate that AI is sold as obviating - and the code I do write has to be correct, for values of correct where simply looking correct is not sufficient (and an AI won't know the underlying math behind things or how to avoid things like race conditions).
If I actually used AI I would be spending more time trying to understand its output and having to fix it than I actually spend developing code, and I would have a harder time trusting the resulting code. I know how my code works because I wrote it - the same would not be true if I had an AI write my code for me.
Sure, there are people at my day job who have gotten fully on-board with the AI thing, and who see it as being useful for things like, say, changing the C++ standard for a whole project at a time. However, I would not be able to trust the project after that point myself. Yes, it would take much more time to manually update the code, but then it would be much easier to trust afterwards, and would require far less fixing to handle all the cases where the AI screwed up.
Confirmation bias, preconceived notions, etc. I've noticed that many who claim these coding LLMs are useless are more often than not experienced engineers.
People I've talked to who are not coders have all found these coding tools to be very useful - enough to be worth paying for (approx. $20 a month).
I think it depends a lot on what you're trying to get them to do and what you're working on. I don't really buy into the whole "prompt engineering" part. I use agents quite a lot, but they have limitations. They excel at writing SQL, Kubernetes configuration, Bicep and similar... at least so long as you know what you're doing. I also have quite a lot of success with them keeping documentation and tests updated. I wouldn't recommend using them for your tests unless tests are a nice-to-have feature for you because you actively deal with corrupted states through explicit error handling, which we do.
In my experience every model absolutely sucks at doing any form of software engineering. The thing is, though, you don't need to do any software engineering to write software... until you do. Since LLMs got decent at programming we've let them loose among our BI and solar plant engineers and they're building some really useful tools in Python. They have no idea what they are doing, and they are still getting stuff to work. It's obviously absolute horse shit, but for a lot of it, the shit code will be "good enough" for nobody having to touch it for its lifecycle. For the rest, actual software engineers take over.
If you're working in a field where "good enough" is fine, then I imagine that LLMs can do a lot for you. It might sound like I'm being disrespectful, but I really don't mean to be. In my decades of experience with software in enterprise settings, I've seen so many programs built with horrible code just trundle along, creating great value for the business while requiring no to minimal maintenance. Bad from a traditionalist software engineering perspective, but great from a business perspective (which is what matters). Not everything needs to scale after all. Anyway, I do question whether the author has actually worked with LLMs a lot. I would think that the strengths and failings of LLMs would get obvious rather quickly to anyone who uses them.
> Bad from a traditionalist software engineering perspective, but great from a business perspective (which is what matters). Not everything needs to scale after all. Anyway, I do question whether the author has actually worked with LLMs a lot. I would think that the strengths and failings of LLMs would get obvious rather quickly to anyone who uses them.
A few things I've seen here.
> If you're working in a field where "good enough" is fine, then I imagine that LLMs can do a lot for you.
There's some specific timesheet software used for gov contractors that's hands-down the worst web "application" I've ever used. Opening DevTools on one of these pages is a riot - one of the "APIs" used returns HTML documents with .js file extensions so the browser fails to parse them, every JS file is 1K lines long with blocks of global variables with various naming conventions, and using 'back' in the browser just sort of fails and returns you to the sign-in page.
Any modern web programmer would have a stroke looking at this thing. Apparently it's the standard software used for this purpose, relied on all over the industry.
At a previous employer, I had to deal with some internal site for financial transactions and billing. Same story, ancient looking and performing web 'app' that had specific incantations to get it to actually work "don't refresh the page until that loads", piles of JS vomiting bizarre errors in the console, just a remarkable piece of junk.
Anyway, short story long, there's LOADS of software floating around that is terrible by any metric, serves crucial purposes, and is far worse in terms of 'code quality' than anything an LLM would spit out.
Also, I have more experience than I like to remember dealing with 'body shop' codebases. There is no way that an LLM is even capable of generating such low quality code. I legitimately think several of the projects I saw were sabotage; there's no way a human would write such inscrutable junk by accident. Ever seen random Python runtime errors because the type of something would change depending on which branches were hit?
I would argue that "good enough" means that it doesn't add inconvenience or significant cost, but I know what you're talking about. That is the flip-side of it though, there is already so much non-AI garbage that it sometimes doesn't even registrer with the business when something was written by a MBA and some LLM. Not because that is necessarily a good thing, but it is what it is.
Nothing we do that is user facing is created by AI (this doesn't include any of our business intelligence as that's another department). We currently have a frontend lead who passionately hates AI for basically all of the reasons you bring up, and as long as they keep delivering, I doubt that will change anytime soon. As far as the BI goes, well, that will probably always be a horrible mess because it's next to impossible to find someone who's interested in working with the development/design side of PowerBI. I'm not sure it's even because we won't pay them enough, there doesn't seem to be anyone around here who wants a job like that. We tell our collective selves that it'll be better when we switch to Microsoft Fabric, but we'll see.
I think back to the first version of ChatGPT: I would pick it up once in a while, ask it something or chat with it, and then be like... this is cool, but I don't know wtf I would use it for. Now I use a GPT at least a couple of times a day. Granted, the LLMs have obviously become considerably more capable, but I do believe part of it is that I've also learned how to use them and what to use them for. I'm at the point now where I can generally predict the output of what I'm asking for - I don't know if that's the norm (it mostly gives me exactly what I want) - but I do know how I use them today and how I used them when they first came out is quite different. I guess all that is to say, imo how you prompt them really matters, and that takes time to learn.
I'm going to properly write up my experiences with LLMs at some point, but some observations so far:
0. It's undeniable that LLMs are both amazing and fun. So is a clown at a children's birthday party, but you wouldn't try to ride the balloon horse,
1. If you are producing code in a language you are not familiar with, you need to be aware of the Gell-Mann amnesia effect[0],
2. It excels at language translation. This is what LLMs were originally designed for. So if you can clearly describe an algorithm in English, it will probably be able to translate that to Python, for example. But at that point you have to question why you didn't just write it in Python in the first place (see the small sketch after this list),
3. It gamifies things and causes you to work longer and harder. I'm not convinced it's really more efficient, but I am convinced I simply wouldn't have bothered to put in the effort to do certain things before because the perceived effort of writing code from scratch is higher than spinning a roulette wheel that might shit out the answer first time,
4. It stops you thinking for yourself. Ever worked with an incredibly knowledgable colleague who is always eager to help? Or got the feeling you've solved something before so spent time grepping through old codebases and bash history to find what you did last time instead of spending the exact same time doing it again? It's kinda like that except it knows everything and it's always there. I fear that ability will atrophy with extended use,
5. I don't feel like I really own the code it outputs. It feels more like working in a large existing codebase written by a team. When I have to get into things and understand the code it's like I'm reading it for the first time, even though I did check all the diffs etc. It's like having a manic junior run rampant then coming to you when something really needs to be understood. Again, it's the question of is this really saving me any time in the long run?
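For what it's worth, a trivial illustration of point 2 - hypothetical, and in Swift here rather than Python, but the point is language-agnostic: once the behaviour has been described precisely in English, the direct translation is barely longer than the description itself.

    // English: "Given a list of order totals, return the ones over 100,
    // sorted from largest to smallest."
    func largeTotalsDescending(_ totals: [Double]) -> [Double] {
        totals.filter { $0 > 100 }.sorted(by: >)
    }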
I agree that LLMs are best at providing good autocompletion, and even more so for someone who types slowly like me. They are also good for providing examples and simple programs for technologies I'm unfamiliar with, something that I was using Google search for when the web and the search results were not just SEO crap.
"But anything beyond writing a simple function always leads to useless junk."
If that's the article writer's experience then they are simply using it incorrectly. The author seems to suggest most of their usage involves pasting code or chatting via the text interface, likely with a non-SOTA model.
It would be surprising to find that someone using e.g. Claude Code cli with full access to a codebase and carefully considered prompts is always getting useless junk.
> I can spend an entire day with a GPT and produce nothing of value.
Although I believe you, I think this is due to improper use. I know this sounds like a "you're holding the phone wrong" moment, but I have found so many uses for AI that I feel addicted to what it's providing me.
Like you I started with Vim long ago, in an attempt to reduce the thought -> code cycle. But after many years I started seeking something even faster, and LLMs seem to be it for me.
From quicker navigation of the codebase, to quickly writing the boilerplate for me, to adding debugging statements where I need them. Believe me, there is a big jump in productivity if you use it right. You just gotta keep trying until you develop a sense of what you can ask it to do and what it would be good for.
I think the issue is ChatGPT. I find it to be the worst for writing shippable code beyond a single function. It plays too many mind games. If I’m trying to get any non-trivial amount of code done, it’s Gemini Pro 2.5 all the way.
> "anything beyond writing a simple function always leads to useless junk."
A strong statement like this was a reasonable take six months ago, perhaps. But with Claude Opus 4.1, o3-codex/gpt5, and Gemini Pro 2.5 this statement is pretty simply disproven with half an hour with them.
Just last week I spent an afternoon rewriting an old Android app that was done in butterknife and got something functional and tested in a few hours. It involved an 8,000 line diff. The work had been quoted to me by a contractor as likely to take 3-4 months and cost tens of thousands of dollars.
I designed a new website for my wife's band with a custom player widget, carousel photos, fully adaptive to large and small screens, good accessibility features, built from easy to edit JSON, SEO-optimized with microformats, and fast-loading. The first version I got running in 15 minutes. The final polished version took a few more days of ducking onto my laptop a few minutes here and there during an event. Without AI this would have taken me weeks to pull off and wouldn't have looked nearly as nice.
IMO part of the key to the flow here is to avoid the temptation to one-shot. First establish ground rules for working together in your AGENTS.md/CLAUDE.md that lay out your software engineering best principles (use git semantics, write out your plans, add lint and tests to commit hooks...). Then have it read through existing code and architecture. Then carefully put together an architecture and set of constraints into your README/PRD. Then build a high-level engineering plan divided up into sequenced tasks and write it down (vs just keeping it in context). Only then do you allow any code to start to get written. And yes, you still need to supervise it (and iterate on your AGENTS/CLAUDE to avoid repeating yourself). But yeah, it's about 10-100x faster than me now with a flow like that. (Context on me: been programming 40 years, Stanford CS, ACM international programming competition finalist.)
This new flow is extremely fun and addictive. It's also a new thing that uses technical knowledge but isn't exactly the same thing as coding. It's like having a pool of hard working inexpensive idiot savant never-worked-in-industry interns at your beck and call - the more you pour into giving them context and process, the better results you are gonna get.
As someone who's been "vibe-coding" successfully at work I can say that what made me a believer in AI tools is agentic instructions. It is the difference between not being able to accomplish much and actually getting stuff done. There have been several articles published on this topic on HN. I would really recommend implementing that in your repositories.
This isn't a post, this is just a random opinion. Also, if they can code a small web app faster than an LLM can, I'll eat my hat. I'd like to see them try to write the 2000+ lines required to get it up and running in less than 30 seconds.
Seems like their attitude is exactly what you’d expect of someone who wants it to be useless from the outset. A self fulfilling hypothesis.
I hate AIs because I think overall they're bad for the environment, centralize power, and worsen quality.
That said, I never really doubted that they can help you get more done. I use it occasionally to refactor or generate some boilerplate. But very often I need to rewrite most of the functionality; I find this is more true the more code it generates.
The agentic stuff I've tried have been terrible. Yeah I personally haven't used the latest models, but I deal with teammates who use them and the frontend it generates looks bland and the shell scripts generated with them are very hard to debug but also just flat out broken.
You sound like me. I've found some good uses for them, generating Ansible files for example (the output is regularly wrong, even with the latest and greatest model, but easy enough to correct). But otherwise, there always seems to be a cutoff in the complexity of a project where the ROI of just having built everything myself, and thus being completely immersed in the codebase, seems like it would have been much greater than prompting something to frankensteinian life.
But most of all, I love the act of programming and cannot imagine myself giving that over to an LLM. "Managing" an LLM is to me a soul crushing - maybe anti-intellectual, but definitely boring - task. And if that is what we as professionals will be required to do in the future, I'll just change profession.
I rarely use LLMs nowadays. Most of the time I'm fixing difficult to debug issues spanning multiple projects and a variety of backend systems and there's just no easy way to plug them to all this spaghetti. Autocomplete also spews out complete BS and was the first thing I disabled.
I did find them very useful when writing completely new stuff (things like "write a <insert your favorite API> client") or making test boilerplate, i.e. as a copy & paste replacement tool.
It'd be nice to hear in detail how it's been useful for other devs. There's too much propaganda around on how amazing it is and not nearly enough use cases.
Is it possible these companies with their vast budgets pay to have all this drivel about how useful they are plastered on the web to pump up their value? I've started to learn what useful things LLMs can do for me but self sufficient coders they ain't.
> several of them are written by well-known developers who’ve been around for over a decade
All of whom had greater output in terms of volume and impact before LLMs. And it is always the same people who now get voted to the top as opposed to a couple of months ago, when one of them got ratio'd for one of his exuberant blog posts.
It is worse than that. I'm backed by the thriving $2 trillion anti-LLM industry! Admittedly it is an unfair fight since the pro-LLM people have no backing.
We are straying a bit off-topic with this, but driving a standard can actually be a lot of fun. I have an automatic right now, but I have always enjoyed driving a standard more than automatics. This next point may be due to the quality of my particular automatic, but I have found passing vehicles on the highway to be a lot faster in a standard, because I can choose to just drop into a lower gear and hit the gas, I don't have to hit the gas and then wait for the vehicle to decide that it should go into a lower gear. So, it doesn't have to be a practical skill or a sign of arrogance, someone might just enjoy driving a standard as I do.
Seems valuable enough to me!
- add to your global settings a few lines that say something like "don't write any code until you fully understand my requirements, ask questions until you are completely clear", this mostly stops it just going ahead and writing best guess just the first time. And then,
- when it looks it's going off the rails, cut your loses straight away and ask it to write a prompt for another chat. Tell it to cover everything it knows about what you're working on and what you want to do. Paste that into the new chat and get it to ask if it needs clarification (and then if needed, paste between the two chats). The new chat often solves the problem the other one was stuck on first time.
Hope they help - coding with AI has made me enjoy the process once again. I'm actually completing silly projects I've always wanted to do because I can get past the blockers so much more easily.
I would put the bits about clarifying questions in your personal (machine level) claude.MD file so it applies across projects.
- I prime the context of the LLM with any extra rules or instructions I want it to follow.
- I also prime the context with every single file it needs to see in order to be able to do its job properly. Sometimes this can be two files but sometimes it can be 15 files.
- I instruct the LLM in bullet point format for smaller edits. For larger edits I ask the LLM to convert my prompt to markdown. Most of my prompts are created by an LLM.
- In 90% of cases I only prompt once per chat, and start a new chat each time. Unless I'm using an agent like Claude Code in which case I use one context window per feature.
- I keep notes of small things that it's done not quite how I want so I can go back and change them.
- I use a program called RepoPrompt to make all this faster and easier.
I'm working on making a game right now and this was the last prompt I did last night. The prompt itself was written by Opus and the implementation was done by GPT-5 High. The actual file edits were performed by Gemini 2.5 Flash to save on API costs.
--- Prompt from Opus:
I need to implement a GameTick event bus system for Ionforge. This will add:
1. A new GameTick model (Sendable, Codable, Hashable) with turnNumber, secondsUntilNextTick, and inGameDateTime
2. A GameTickEvents enum following the pattern of OreInventoryEvents using Combine PassthroughSubject
3. A GameTimerService protocol with tickPublisher, start(), and stop()
4. A MockGameTimerService that generates ticks every 600 seconds using Task.sleep
5. Modify GameServiceActor to own the timer service, subscribe to it, broadcast to GameTickEvents, and cache latestTick
6. Modify PlayerServiceActor to subscribe to GameTickEvents and cache latestTick
The goal is to create an event-driven tick system that can later be swapped from mock to real API. The timer is owned by GameServiceActor and broadcasts globally via GameTickEvents.
Please implement these changes following the existing patterns in the codebase, particularly matching the OreInventoryEvents pattern for the event bus.
--- Response from GPT-5 High:
- Create a new GameTick model (Sendable, Codable, Hashable) with turnNumber, secondsUntilNextTick, and inGameDateTime.
- Add a GameTickEvents enum mirroring the existing OreInventoryEvents pattern (Combine PassthroughSubject).
- Define a GameTimerService protocol that exposes a Combine publisher and start/stop controls.
- Implement MockGameTimerService using Task.sleep to emit ticks every 600 seconds.
- Update GameServiceActor to own the timer service, subscribe to it, broadcast to GameTickEvents, and cache latestTick.
- Update PlayerServiceActor to subscribe to global GameTickEvents and cache the latest tick.
- Keep edits targeted and minimal using delegate edit scopes, with proper Combine imports.
---
GPT-5 then outputted about 120 lines of code across 7 files that compiled and worked on the first try.
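For readers who want a concrete picture, here is a rough sketch of what such a system might look like. It is a hypothetical reconstruction, not the actual generated diff: the type and property names (GameTick, GameTickEvents, GameTimerService, MockGameTimerService, OreInventoryEvents, turnNumber, secondsUntilNextTick, inGameDateTime) come from the quoted prompt, while the field types, the subject name tickSubject, and all implementation details are assumptions.

    import Combine
    import Foundation

    // Hypothetical sketch only - not the actual generated code.
    // Tick model named in the prompt; the field types are guesses.
    public struct GameTick: Sendable, Codable, Hashable {
        public let turnNumber: Int
        public let secondsUntilNextTick: TimeInterval
        public let inGameDateTime: Date
    }

    // Global event bus mirroring the OreInventoryEvents pattern:
    // a Combine PassthroughSubject that any service can subscribe to.
    public enum GameTickEvents {
        public static let tickSubject = PassthroughSubject<GameTick, Never>()
    }

    // Timer abstraction so the mock can later be swapped for a real API-backed service.
    public protocol GameTimerService {
        var tickPublisher: AnyPublisher<GameTick, Never> { get }
        func start()
        func stop()
    }

    // Mock implementation that emits a tick every 600 seconds using Task.sleep.
    // In the described design, GameServiceActor owns this service, subscribes to
    // tickPublisher, rebroadcasts via GameTickEvents, and caches the latest tick.
    public final class MockGameTimerService: GameTimerService {
        private let subject = PassthroughSubject<GameTick, Never>()
        private var task: Task<Void, Never>?
        private var turn = 0

        public var tickPublisher: AnyPublisher<GameTick, Never> {
            subject.eraseToAnyPublisher()
        }

        public func start() {
            task = Task { [weak self] in
                while !Task.isCancelled {
                    try? await Task.sleep(nanoseconds: 600 * 1_000_000_000)
                    guard let self else { return }
                    self.turn += 1
                    self.subject.send(GameTick(turnNumber: self.turn,
                                               secondsUntilNextTick: 600,
                                               inGameDateTime: Date()))
                }
            }
        }

        public func stop() {
            task?.cancel()
            task = nil
        }
    }

Whether the real output looks anything like this is anyone's guess, but it gives a sense of why roughly 120 lines across 7 files is a plausible size for the change.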
I guess I could do those things, but I'd rather work on my game.
I only posted that because the person I responded to was struggling with getting nonsense responses, and I literally never get nonsense responses with the workflow I have now.
https://news.ycombinator.com/item?id=44854637
... which, rather than "spellcasting", appears to contain a fairly complete technical description of the system being constructed.
- "I prime the context of the LLM with any extra rules or instructions I want it to follow."
What are "extra" rules - extra to what? How could I tell if I am doing this or not doing this?
- "I instruct the LLM in bullet point format for smaller edits. For larger edits I ask the llm to convert my prompt to markdown. Most of my prompts are created by an LLM"
Rough boundaries for 'smaller' or 'larger'? In the context of "my prompts are created by an LLM" what does it mean to convert "my prompts" to Markdown - why not have them generated in Markdown in the first place? Why bring Markdown into this at all? Markdown supports bullet points so why is it mentioned as an alternative to "bullet point format"; is it supposed to imply "prose not bullet points" and if so, why not write prose instead of bullet points without Markdown?
What difference does any of this make, either measurably or in vibes? Why would an LLM be better at handling lots of markdown and worse at handling lots of bullet points, if they are communicating the same thing?
Why create prompts by LLM at all? Presumably you have to start with inputting what you want to happen, so why pass it through two LLMs instead of one?
- "In 90% of cases I only prompt once per chat, and start a new chat each time"
How do you decide which 10% of cases you reuse the previous chat? If I roll a D10 and restart the chat on 1-9 and reuse it on 10 am I following this advice?
- "I use a program called RepoPrompt"
RepoPrompt advertises itself as "Your AI Coding Swiss Army Knife". Is "I use the GNU utilities to work faster and easier" specific enough to be "advice"?
> "They took the trouble to actually give a full prompt."
Did they take the trouble? They say Claude Opus made the prompt. The important bit - how did they get Opus to write the prompt without them writing all those things in the meta-prompt - is missing. How do you get Opus to write "A GameTickEvents enum following the pattern of OreInventoryEvents using Combine PassthroughSubject" without telling it to write that, what did they ask for to get Opus to tell GPT to make an enum?
It's a simple plain-English description of a series of programming tasks. If you can't evaluate this, you can't program.
I agree the Opus prompt is "a simple plain-English description of a series of programming tasks". Where did it come from? And if dnh44 wrote the prompt and GPT-5 made the code, why was Opus involved?
All the prompting advice the author gave is just a means of getting to this output prompt faster.
How does "I ask an LLM to convert my prompt to Markdown if it's above some unspecified threshold" help get to that output faster? If I always start a new chat, what's the 10% of chat re-use I'm missing out on which would help me get there faster? What are the "extra" rules I should be sure to include?
Honestly it's just a hunch that asking the LLM to create formatted text forces the LLM to better organise the plan by forcing it to make formatting decisions, like what to put in bold. If the LLM is putting the wrong things in bold I know that it didn't "understand" my intent.
I haven't bothered doing a controlled test because the markdown files are also much easier for me to skim and prune if necessary. So even if they don't help the LLM they help me. But I do think I noticed an improvement when I started using markdown. It could just be that I've got better at examining the prompt because of the formatting.
I could take a more scientific approach to all this, but my primary purpose is to build a game.
>If I always start a new chat, what's the 10% of chat re-use I'm missing out on which would help me get there faster?
I start the new chat so the history doesn't pollute the context. If I don't think there's anything irrelevant in the history, then I'll continue in the same chat.
>What are the "extra" rules I should be sure to include?
If the LLM repeatedly does things I don't want, then I add a rule for it. For example, at the end of my CLAUDE.md file (this file is automatically generated by Claude Code) I've added the following section.
Up until I added the last line, CC would delete the section; now it doesn't.

Yeah, that's exactly it. Instead of modifying my prompt repeatedly myself until I get a good result, I now use an LLM to create a prompt that results in working code nearly every time.
The process no longer feels like a slot machine.
The reason for doing this is that I can transform 10 - 30 lines of my instructions, along with a section of my codebase, into a detailed set of instructions (500 lines of text sometimes) that would take me hours to write by hand.
Going into further detail, what I do is write what I want in normal English; I may or may not include a small code snippet, and I list what I think are key points as bullet points (normally, but sometimes I forget). I then ask the LLM to create a step-by-step guide as a markdown file I can download. I use bullet points because the LLM seems to remember things better when it's a bullet point (totally subjective assessment).
Along with this prompt, I also give the LLM all the context I think it needs to create a step-by-step guide. This includes code maps, the file tree, and the complete source of critical files. This part of the prompt is usually between 2000 and 6000 lines.
Recently I have been giving this task to GPT-5 Pro and it gives me back a nicely structured md file that is split into multiple sections with detailed technical instructions on what to do at each step. This file is normally 200 - 500 lines long. I skim it and prune it, then pass it to Claude Code. Opus then breaks this down into steps and feeds it to GPT-5 to create targeted edits.
>The important bit - how did they get Opus to write the prompt without them writing all those things in the meta-prompt - is missing.
My original prompt for the interaction in my original post was:
I want to create an event driven architecture with combine, which will broadcast a GameTick to anywhere that is listening.
/// Represents the state of the game at a specific turn or "tick".
/// This data is expected to be received from the server in the future.
public struct GameTick {
}

This way any service that may receive a state change on a tick can individually subscribe to this broadcast. Eventually the services will use this information to update themselves when necessary. For now I only want GameService and PlayerService to subscribe. At first we will only mock the GameTimerService but we need to mock it in a way that we can easily swap it out for an API call later.
Create a step-by-step plan to implement this feature. Put your plan in a markdown file.
>How do you get Opus to write "A GameTickEvents enum following the pattern of OreInventoryEvents using Combine PassthroughSubject" without telling it to write that
GPT-5 Pro wrote that instruction in the 500-line markdown file based on my original prompt quoted above.
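To give a rough idea, here is a minimal sketch of the kind of Combine code that instruction points at: a PassthroughSubject broadcasting GameTicks that GameService can subscribe to, plus a mock timer that can later be swapped for an API call. This is illustrative only, not the code that was actually generated; the turn property and the method names are made up.

import Combine

// Represents the state of the game at a specific turn or "tick".
// The `turn` field is only here to make the example concrete.
public struct GameTick {
    public let turn: Int
}

public enum GameTickEvents {
    // Single broadcast point; any service can subscribe to it.
    public static let tick = PassthroughSubject<GameTick, Never>()
}

public final class GameService {
    private var cancellables = Set<AnyCancellable>()

    public init() {
        GameTickEvents.tick
            .sink { tick in
                // React to the new game state for this turn.
                print("GameService handling tick \(tick.turn)")
            }
            .store(in: &cancellables)
    }
}

// Mock timer that can later be swapped for an API-backed implementation.
public final class MockGameTimerService {
    public func fire(turn: Int) {
        GameTickEvents.tick.send(GameTick(turn: turn))
    }
}

PlayerService would subscribe the same way GameService does.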
Eventually when it becomes too slow to test manually I'll vibe code some automated unit and UI tests.
This sounds cool, any more details or any write up on how to do something like this?
https://www.youtube.com/watch?v=JzVnXzmZweg&t
I don't think this is different from any other skill. I see the same effect in programming. I even see the same effect in writing a single program. I think this is totally a fine strategy for "smaller" programs because the "gap" is small. But as the complexity increases, that gap widens. Most of my time isn't spent writing lines of code; most of my time is spent planning and understanding. Complexity often comes from how a bunch of really simple things interact. The complexity of music is not the literal notes, it is how everything fits together. Personally, I'll take a bit more time to write those lines if it makes me quicker at solving the harder problem. I still write notes on pen and paper even if I never look at them afterwards because the act of writing does a lot to help make those things stick.
Even though calculators and Mathematica exist, a mathematics student should learn to do arithmetic, long division, calculus, etc. by hand.
Even though AI tools (and for that matter, standard libraries) exist that can do it better and faster, a computer science student should still learn to implement a linked list, etc. from scratch.
But a professional software engineer will use AI and standard libraries, just like a professional physicist will use a calculator and Mathematica.
Calculators existed when I was a kid. I still learned arithmetic without cheating. I’m sure educators will figure out how to teach students software engineering (and writing, and thinking) without cheating.
It is true that the high-level languages "won out", and now maybe even scripting languages (Python is incredibly popular), but that also doesn't mean the concerns of the time were unwarranted. Many of those concerns did come true. Honestly, I don't think anyone was arguing that we couldn't continue moving forward with high-level languages, but rather that we'd be able to write code faster at the expense of hardware, and that this can lead to enshittification. Though no one used that exact term. I'm not convinced most professional software isn't extremely resource inefficient, or that there isn't a lot of enshittification.
I mean, look at the article. The reason learning Latin helps with English is that you start to learn underlying structures. It would still benefit your English skills even if people spoke it regularly. These are orthogonal properties. It benefited Graham because it was so powerful. But that power is getting you to think with lambda calculus. I know Lisp is a high-level language, but there is more to all of this than such a simple reduction. I mean, C and C++ are high-level programming languages. Even Python was mentioned in that article! So if your takeaway from it was that high-level languages are the winners, then I think you've missed most of the story.
Though I find his article starts to lose coherence as it moves into The Blub Paradox. There's no real definition of "power" and he weirdly conflates it with how high-level a language is, even though plenty of scripting languages like Python and Perl were available at that time. So if he's saying that Lisp was the most powerful, then it isn't about how "high" level the language is, which is the metaphor he moves into. It's so weird to hear his argument about the Blub programmer looking "up" the power continuum and thinking those languages are just as powerful. I think a lot of people think other languages are more powerful than the ones they usually write in (or even know). Hell, it is usually the driving reason to learn a language.
The real reason no one writes in Lisp was already answered. No one understands it. And people will make compromises to ensure they can understand one another. This also is a self-reinforcing paradigm though...
I'll admit that coding agents seem great if you want to measure a programmer's output by the number of lines of code they generate, but writing lines of code is just a small portion of the work.
If one thinks of the history of programming, think about the IBM 1401. The first programming was done without the advantage of an assembler. Once SPS was introduced, you could use labels for locations instead of tracking addresses by hand. Did one need to learn how to do it the old way to be a good programmer? Certainly not.
When Fortran and Cobol were invented, there were those that said you couldn't be a good programmer unless you first mastered machine language.
Mastering programming requires managing and overcoming complexity. Building the complex systems that we do today involves many levels of abstractions that few have mastered the totality of. Nor is it necessary. Unless you are Donald Knuth.
There was an exchange of opinions between Knuth and Dijkstra about the goto following Dijkstra's letter "Go To Statement Considered Harmful". Knuth suggested that some of Dijkstra's later advice and opinions diverged from how Dijkstra himself had learned and practiced programming. I think this is a very common occurrence: we don't remember specifically how we learned to program, yet we still prescribe how newcomers should learn.
I think there are far more levels of abstractions in programming than are in music.
Wow, it can do the same thing as a derive macro, but only sometimes, and it only takes 10,000x as long and 100,000x as much power :)
I believe these two facts are connected by the financial interests of the companies that promulgate them. It's easier to believe since their financial interests are so directly entangled.
I feel bad for the author of this article. They're being led down a technological dead end. As always, the top reply to any of these posts is a bunch of vague and hand-wavy explanation as to why the OP is "doing it wrong" and how they're "getting actual value!" Never a link to the actual project or output. Probably because we could all search the produced code, find the prior examples, and once again point out that "AI" is just "copyright whitewashing."
This industry is no longer driven by engineering but by gross financial tricks.
Generally, it’s making easy stuff easier. That’s nice, but doesn’t change the game. Personally, I already know how to whip through most of the easy stuff, so the gains aren’t that large.
I like to imagine a world where the front page of HN is clogged with articles about mastering the keyboard shortcuts in your text editor and combining search with basic techniques of reading comprehension. That’s the level of productivity gains we’re talking about here.
This is the way.
I don't understand the people who do it the other way around. I want to control the executable spec and let the ai write whatever code to make it pass.
I’m always left with one weak conclusion: “I guess that guy has low standards.”
Thanks for the very eloquent explanation.
I feel that's where most people get the best value from GPTs. And that's also why Ruby on Rails-like platforms are so popular in the first place.
Avoiding the boilerplate from the start and focusing on what matters doesn't need to go through AI, same way we didn't need to stick to Java's generators and code factories. I kinda fear we lose some of these advancements as people move away from these more elegant stacks, but also hope the pendulum balances back when the hype fades away.
Have you looked at cookiecutter or other template repos? That's my go to for small projects and it works pretty well. I'd worry the LLM would add bugs that a template repo wouldn't, as the latter is usually heavily reviewed human written code.
Don't take this as an insult, but "people who used to be full-time engineers, and are now a bit removed" are exactly the kind of people who are the very worst at evaluating LLM coding tools.
I've gone through multiple phases of LLM usage for development.
GPT3.5 era: wow this is amazing, oh. everything is hallucinated. not actually as useful as I first thought
GPT4 era: very helpful as stackoverflow on steroids.
Claude 3.5 Sonnet: have it open pretty much all the time, constantly asking questions and getting it to generate simple code (in the web UI); when it goes down, actually googling stuff feels very old school. Tried a lot of in-IDE AI "chat" stuff but was hugely underwhelmed.
Now: rarely open the IDE as I can do (nearly) absolutely everything in Claude Code. I do have to refactor stuff every so often "manually", but this is more for my sanity and understanding of the codebase.
To give an example of a task I got Claude code to do today in a few minutes which would take me hours. Had a janky looking old admin panel in bootstrap styles that I wanted to make look nice. Told Claude code to fetch the marketing site for the project. Got it to pull CSS, logos, fonts from there using curl and apply similar styling to the admin panel project. Within 10 mins it was looking far, far better than I would have ever got it looking (at least without a designers help). Then got it to go through the entire project (dozens of screens) and update "explanation" copy - most of which was TODO placeholders to explain what everything did properly. I then got it to add an e2e test suite to the core flows.
This took less than an hour while I was watching TV. I would have almost certainly _never_ got around to this before. I'd been meaning to do all this and I always sigh when I go into this panel at how clunky it all is and hard to explain to people.
But random stuff like "make a web app that automates this thing" or "make an admin panel with autocomplete on these fields and caching data pulled from this table".
It is like infinity times faster on this tedious boilerplate because some of this stuff I'd just have never done before.
Or I'd have needed to get some headcount in some web dev team to do it, but I just don't need to. Not that I'd have ever actually bothered to do that anyway...
When I can't figure out something about a library I have often loaded the git repository into my IDE and checked out the version I was using, then used the IDE to study the code. Now I do that and ask the agent questions about the code like "How do I do X?" (often it sees the part in the documentation that I missed) or "Where is the code that does Y?" It greatly accelerates the process of understanding code.
I certainly could not review all of those changes in an uninterrupted hour. I'd need to test the design changes on multiple browsers, check they respond to zoom and window sizing. I'd have to read through the tests and check that they were not just nonsense returning true to pass. There's no way I could do all that while watching TV in 1 hour.
The problem is this is classic Gell-Mann amnesia. I can have it restyle my website with zero work, even adding StarCraft 2 or NBA Jam themes, but ask it to work on a planning or estimation problem and I'm annoyed by its quality. It's probably bad at both but I don't notice. If we have 10 specializations required on an app, I'm only mad about 10%. If I want to make an app entirely outside my domain, yeah sure, it's the best ever.
It’s constantly wrong about the simplest things. Just now it was hallucinating about CSS grid properties.
I’m convinced that how good AI seems is inversely correlated to someone’s knowledge in that domain, as you also mentioned.
Still super useful in my day to day but I have to hand hold and correct it constantly.
I mean you can easily compare this to trades and construction. Would you want a house that's built in a week by cheap foreign workers that don't know what they're doing? The end result looks great on the outside, and you can always let some other cheap worker fix some issues you're having! The electricity works! Until it doesn't and a fire breaks out.
I get it - the hype is based around the quick gains you can absolutely have. The gains are insane, I have been able to be more productive at boilerplate and repetitive tasks too. But just building software isn't hard. Building software that lasts, building software that is rock solid, efficient and maintainable, that's hard!
It's sickening to me how quickly people want to throw that in the garbage. All because it saves a few $ in development time, quality is suddenly a metric that is completely ignored, rather than weighed in.
It's going to bite people in the ass.
I agree with you we should have quality standard, but I also think it's 100% inevitable that this will all go out the window, in most companies and if that happens, our asses will be bitten.
As a silly example, an LLM will happily add all sorts of animated gifs to the screen for me, even if it makes the page take three times as long to load and more difficult to use.
It's a shame to lose well-crafted experiences in favor of the lowest-common-denominator that LLMs put out, just because they're cheaper.
If I ask it to “add drag & drop”, I already know in my mind what the correct code should look like, because I’ve implemented it many times in the past. LLMs just never deliver the -code- that I want. The end result might look ok and the drag & drop will work, but the code will be atrocious and on first glance I can pick out 10-20 mistakes that I’ll have to ask it to fix. And even if I ask for a fix, I’ll never get the same code quality as hand written code. And how can I push this kind of sub-par code to an employer’s repo when I know that I can (and should) write better quality code myself. This is what I’m being paid for right?
That's a good question. Because developers can sometimes have a bit of an unhealthy love affair with their own code.
Does your employer actually care as much about code quality as you do?
They would probably ask what is the downside of accepting this lower quality code, given the upside you presented:
> The end result might look ok and the drag & drop will work
Which you did quickly, saving them money (in theory).
I do wonder if this is why there is such a gulf in LLM experience. If you're doing hardcore huge scale distributed systems then I can (maybe?) see why you'd think it is useless. However, that is very niche imo and most software dev work is some level (unfortunately) of virtually throwaway code. Of course, not all is - of all the ideas and experiments, some percentage is hopefully very successful and can be polished.
So perhaps LLMs are just entry level devs in general. Who can't learn. Who won't ever gain seniority. It would make sense, after all, they were trained on the Internet and Stack Overflow, which is mostly filled with entry level questions and answers.
It's good at stuff that most competent engineers can get right while also having the sort of knowledge breadth an average engineer would lack. You really need to be a domain expert to accurately judge its output in specific areas.
This to me is one of the real benefits. I can vibe code watching TV. I can vibe code in bed. I can vibe code on the plane waiting for takeoff with GitHub Copilot Agents.
Lots of coding work is interesting, but plenty is just tedious.
My current workflow has reverted to primitive copy paste into web chat (via Kagi Assistant). The friction is enough to make me put a lot of thought into each prompt and how much code context I give it (gathered via files-to-prompt from simonw).
I have little experience with frontend and web apps, so I am trying out a supervised vibe coding flow. I give most of the code base per prompt, ask for a single feature, then read the code output fully and iterate on it a few times to reduce aforementioned bad patterns. Normally I will then type it out myself, or at most copy a few snippets of tens of lines.
What I found doesn't work is asking for the full file with the changes already applied. Not only does it take a long time and waste tokens, it normally breaks/truncates/rewords unrelated code.
So far I’m happy with how this project is going. I am familiar with all the code as I have audited and typed it out nearly entirely myself. I am actually retaining some knowledge and learning new concepts (reactive state with VanJS) and have confidence I can maintain this project even without an LLM in future, which includes handing it over to colleagues :)
EDIT: Just to add context seeing other comments, I almost exclusively work in C++ on GPU drivers.
I'm building an app in my stack with fairly common requirements. There are a few code examples that cover the requirements, but none that cover our specific scenario. After searching the web myself, I asked 3 different AI models. All they did was regurgitate the closest public GitHub example, which lacked the use case I was trying to implement. Solving this problem requires understanding the abstraction well enough to alter the design.
These things can't actually think. And now they're allowed to be agentic.
In some ways they're just glorified search engines but there's a geopolitical sprint to see who can get them to mock "thinking" enough to fool everybody.
Out of ego and greed, everything will be turned over to this machine, and that will be the end of humanity; not humans...humanity.
People think "Oh, it works better when somebody else does it" or "There must be some model that does better than the one I am using" or "If I knew how to prompt better I'd get better results" or "There must be some other agentic IDE which is better than the one I am using."
All those things might be true but they just change the odds, they don't change the fact that it works sometimes and fails other times.
For instance I asked an agent to write me a screen to display some well-typed data. It came up with something great right away that was missing some fields and had some inconsistent formatting but it fixed all those problems when I mentioned them -- all speaking the language of product managers and end users. The code quality was just great, as good as if I wrote it, maybe better.
Plenty of times it doesn't work out like that.
I was working on some code where I didn't really understand the TypeScript types and fed it the crazy error messages I was getting. It made an attempt to understand them and didn't really succeed. I used it as a "rubber duck" over the course of a day or two, and working with it I eventually came to understand what was wrong and how to fix it. I got the code into a place that I like, and now when there is an error I can understand it and it can understand it too.
Sometimes it writes something that doesn't typecheck and I tell it to run tsc and fix the errors and sometimes it does a job I am proud of and other times it adds lame typeguards like
Give it essentially the same problem, say writing tests in Java, and it might take very different approaches. One time it will use the same dependency injection framework used in other tests to inject mocks into private fields; other times it will write a helper method to inject the mocks into private fields with introspection directly.
You might be able to somewhat tame this randomness with better techniques, but sometimes it works and sometimes it doesn't, and if I just told you about the good times or just told you about the bad times it would be a very different story.
I have to wonder if you tried a simple google search and read through some docs if you couldn't have figured this out quicker than trying to coax a result out of the LLM.
I have had cases in which a web search and some good old fashioned thinking have yielded better results than using an LLM, but on average I’m pretty sure the LLM has the edge.
The answers are in (i) the Typescript documentation and (ii) the documentation of libraries that I'm using. I could get lucky with a Google search and it could be worth trying, but I wouldn't expect it to work. Personally my preference is to have a language and libraries with great documentation (Python, Java, Typescript isn't too bad [1]) and really know that documentation like the back of my hand.
If I hadn't had the LLM I would have probably figured it out the same way doing experiments, I might have asked my other "rubber duck"
https://mastodon.social/@UP8/113935901671533690
A tactic I didn't use, which helps in "legacy" systems where I am stuck, is start a fresh project in the IDE and try either reproducing the problem or building a tiny system which is problem free.
I'm hesitant to say what speedup I got out of the "figuring out the types together with the LLM", but emotionally I felt supported, and in the process I wrote a whole lot, like I was keeping track of the process in a notebook. I feel a lot of times when I have good LLM conversations I wind up writing better code than I would otherwise, not necessarily writing it faster -- it's like pair programming.
[1] The typescript docs are great for the typescript stuff, MDN is good for Javascript and Javascript's stdlib
Wow. This sounds so foreign to me. Being emotionally supported by the proverbial "tin can". If that makes you happy, then I'm happy for you, I guess?
"Gambling-like behavior in pigeons: ‘jackpot’ signals promote maladaptive risky choice"
https://www.nature.com/articles/s41598-017-06641-x
The tech industry is actively promoting gambling addiction and the scary thing is that people are willingly walking into that trap.
Take a look at this comment: https://news.ycombinator.com/item?id=44849147
"Most of what I've learned from talking to people about their workflows is counterintuitive and subtle."
Seriously? Are we at the point of doing rain dances for these models and describing the moves as "counterintuitive and subtle"? This is some magical-thinking level of self-delusion.
Downvote all you like, or ignore this. Agency is being taken away from us. No one gets to say we didn't see it coming down the line, because we did, and we said something, and our peers treated us as ignorant and self-interested for pointing out the obvious.
You are not alone: https://rgoldfinger.com/blog/2025-07-26-claude-code-is-a-slo...
> Are we at the point of doing rain dances for these models and describing the moves as "counterintuitive and subtle"
Yes, we are. LLM cultists can downvote as much as they like, but the reality is that with all the incantations and coaxing, we not only don't see a positive effect, we see a net negative effect on the quality of our software. On the ground level of OSS we are drowning in AI generated slop: https://www.theregister.com/2025/07/15/curl_creator_mulls_ni...
On the enterprise level we are creating more and worse code, faster. In the news, we are generating more and worse and factually wrong articles, faster. In our personal lives, guided by our sycophantic LLMs assistants, we offload more and more cognitive chores, becoming dumber and dumber.
LLM-produced slop is like the plastic of text: cheap and easy to use and proliferate, and then 40 years later you realize it's everywhere in the environment and in your brain, and that it's toxic to all life.
Sure, if you want to do with 100 developers what 10 can do better, cheaper and faster, don't spend time on code quality. Just vibe it. More work for the experts to fix that mess later.
Yes, by the same logic arsenic has many pros (and some cons, admittedly) in paint production, thorium is a not bad material for toothpaste (google it), and asbestos is just dandy for fire insulation (with some teeny tiny negatives for your lungs).
The fact that plastic is in the bloodstream of every living being and it's a strong endocrine disruptor and there is literally a garbage patch of it floating in our oceans is a small price to pay for the convenience and pleasure of drinking coca-cola from a lighter-than-glass bottle! Fantastic material indeed! What other fantastic materials are there? Lead for pipes? White phosphorus for matches?
Given my experience, I wonder what tasks the author of this blog post has actually tried to complete, as that might explain why they couldn't get much use out of it. Maybe other posters can chime in on how big of a difference programming language and size of project can make. I did find that it was able to glean how I had architected an app, and it was able to give feedback on potential refactors, although I didn't ask it to go that far.
Prior to trying out Claude Code, I had only used ChatGPT and DeepSeek to post general questions on how to use APIs and frameworks and asking for short snippets of code like functions to do text parsing with regexes, so to be honest I was very surprised at what the state of the art could actually do, at least for my projects.
Having the AI ask me questions and think about the PRD/spec ultimately made me a better system designer.
This all feels like spinning the roulette wheel. I sometimes wonder if AI proponents are just gamblers who had the unfortunate luck of winning the first few prompts.
It seems like an idea worth exploring formally, but I haven't seen that done anywhere. Is this a case of "perception of winning" while one is actually losing? Or is it that the winning is in aggregate and people who like LLM-based coding are just more tolerant of the volatility to get there?
The only study I've seen testing the actual observable impact on velocity showed a modest decrease in output for experienced engineers who were using LLMs for coding.
[1] https://www-cdn.anthropic.com/58284b19e702b49db9302d5b6f135a...
My LLM codegen workflow - https://news.ycombinator.com/item?id=43094006 - Feb 2025 (160 comments)
... the current state-of-the-art won't be what we use, and the prompts people are spending tons of time crafting now will be useless.
so I don't think there's all that much FOMO to F over. either the hype bubble pops or literally everyone in those trades will be starting over with brand new skills based on whatever was developed in the past 6 months. people who rode the wave will have something like 6 months of advantage...
... and their advantage will quickly be put into GPTs and new users won't need to learn that either ("you are a seasoned GPT user writing a prompt..."). unless you worry endlessly about Roko's Basilisk, it's kinda ignorable I think. either way you still need to develop non-GPT skills to be able to judge the output, so you might as well focus on that.
Sounds dubious to me
"All the SSRIs feel a bit different, and part of being good with SSRIs (I suspect) is being able to assess an SSRI before you really start using it, and, learning the nuances in the SSRIs that you will use, for that alone I think it's worth spending time with them"
Hm, that comparison sounds off, but not as much to me as to many other people.
"All the IDEs and text editors feel a bit different to use, and part of being good with IDEs (I suspect) is being able to assess an IDE before you really start using it, and, learning the nuances in the editors that you will use, for that alone I think it's worth spending time with them"
Sounds reasonable.
Substituting the subject back to AI coding agents, I'm struggling to make out your argument. What method of assessment would you recommend other than _starting to use_ a coding assistant model?
I guess you were referring to anecdata and reviews/posts, or were you referring to specific objective properties like context size, RAG capabilities, etc.?
Putting aside the FOMO, the essential time-tested strategy is simply to not care and follow what interests you. And the progress in AI is simply astonishing; it's inherently interesting, this shouldn't be hard. Don't go into it with the expectation of "unless it vibe codes an entire working application for me, it's a failure". Play with it. Poke it, prod it. Then try to resolve the quirks and problems that pop up. Why did it do that? Don't expect an outcome. Just let it happen. The people who do this now will be the ones to come through the hype bubble at the end with actual practical understanding and deployable skills.
Currently working at a FAANG on some very new tech, have access to all the latest and greatest but LLMs / agents really do not seem adequate working on absolutely massive codebases on entirely new platforms.
Maybe I will have to wait a few years for the stuff I'm working on to enter the mass market so the LLMs can be retrained on it.
I do find them very very useful as advanced search / stack overflow assistants.
This is a really interesting signal to me. It's almost indisputable that you can get good results (I get good results pretty consistently) and so there's definitely something there. I don't think that folks who don't get good results are doing something "wrong" so much as not understanding how to work with the model to get good results.
If I was at a company building these tools, the author would be the person I'd want to interview. I doubt it's a skill issue. And it's definitely not user error. You can't sell a tool that is said to do something but the user can't replicate the result.
A tool that works but only after you've invested lots of time working to reverse engineer it in your head isn't a good tool, even if it's extremely powerful. The tool needs to be customizable and personalizable and have safety rails to prevent bad results.
So I cannot take seriously the gods of the art of the prompt claiming that they can watch TV while the code writes itself. But I believe that those who are already good in their domain can do a better job with such powerful tools, once they master them too.
VLC has like 4000 open issues. Why aren't the AI geniuses fixing these? Nobody has ever any actual code to show, and if they do it's "here's an LED that blinks every time my dog farts, I could've never done it on my own!". I'm feeling like Charlie in that episode of It's Always Sunny with his conspiracy dashboard. All these productivity gurus don't actually exist in the real world.
Can anybody show me their coding agent workflow on a 50k LOC C codebase instead of throwaway gimmick examples? As far as I'm concerned these things can't even understand pointers
I find it can be a great time saver for language reference and for rapid fire questions, but it isn't able to, as of yet, operate at an expert level on a complex novel codebase.
Maybe it gets there in another year or two. But it currently makes enough mistakes at that level that it isn't worth it.
Using, like, gpt-4o is extremely not useful for programming. But using Claude Code in your actual repo is insanely useful.
Gotta use the right tool + model.
For this specific thing (LLM-assisted coding), we are still in nerd territory where there are tremendous gains to be had from keeping up and tinkering.
There's a lot of billions dollars being invested to give devs who don't want to do this the right tools. We aren't quite there yet, largely because the frontier is moving so fast.
I made my original comment because it was so far from my experience, and I assumed it was because I am using a totally different set of tools.
If somebody really doesn't want to be left behind, the solution is to do the unreasonable: read hacker news everyday and tinker.
Personally, I enjoy that labor. But it's certainly not for everybody.
I disagree! It can produce great results for well defined tasks. And I love the “I like this idea, now implement it in VSCode” flow ChatGPT desktop provides on macOS.
You’re going to get vastly different responses if you’re using Opus versus 4o.
Frontier models seems remarkably similar in performance.
Yeah some nuances for sure, but the whole article could apply to every model.
I entered a vibe coded game hackathon a few months back and in a little over a week I was at 25k lines of code across both the client and server. It all worked (multiplayer) even though the game sucked.
I did have some great luck producing quite useful and impactful code. But also lost time chasing tiny changes.
People shouldn't worry about getting "left behind" because influencers and bloggers are overindexing on specific tech rather than more generalist skills. At the end of the day the learning curve on these things is not that steep - that's why so many people online can post about it. When the need arises and it makes sense, the IDE/framework/tooling du jour will be there and you can learn it then in a few weeks. And if past is prologue in this industry, the people who have spent all their time fiddling with version N will need to reskill for version N+1 anyways.
Copy pasting code into ChatGPT gives me unreliable results, because I don’t know how to talk to it. I find copilot annoying in Visual Studio. The workflow of cursor/windsurf is not intuitive for me. Gemini context is wicked but I didn’t like how it constructed its outputs. I tried Claude Code and became a believer, not because it’s perfect, but because it fits my cadence and allows me to pivot quickly.
I still use chat for the occasional MSSQL syntax check, or Gemini for writing a technical doc. But there are so many variables and trial and error involved.
I keep reading posts like this thinking that there is a flavour of fear holding people back. Nothing to fear, just level set your expectations, don’t believe the hype and be open to the vibes.
It also takes a while to learn using an LLM and get value from it.
The keys are how to build prompts, ways of working, and guidelines that help the AI stay focused.
You end up spending much more time guiding and coaching rather than coding, that can take a while to get used to.
Eventually though, you will master it and be able to write secure, fast code far beyond what you could have done by yourself.
Note: Also, prep yourself for incoming hate every time you make claims like that! If you write bad code, it's your fault. If your LLM writes bad code, you're a moron! hah
Try
Are LLMs the new Agile/Scrum?
"Once you really learn Scrum, it will solve all world problems and clean your house? Your experience is totally different? Skill issue. Try again."
I get your position and don't want to sound dismissive either, however I want to point out that in the only recent study actually trying to measure the productivity gains of LLMs, it was observed that experienced developers were actually about 19% slower when using an LLM.
https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...
I asked an LLM to tell me why it "thinks" you observe an increase of productivity while studies show that for experienced developers it's a decrease, and it came up with the following "ideas":
"LLMs Fill Gaps in Knowledge Instantly. Junior developers often face friction from:
- Unfamiliar syntax
- Unclear documentation
- Uncertainty about best practices"
Again, I don't want to sound dismissive, but have you considered that instead of people not seeing the gains you are talking about due to a skills issue with how to fine prompt LLMs, that it's you seeing gains you wouldn't otherwise had you been more skillful?
If knowledge and experience in the language is an issue, then LLMs have increased value as they can teach you language notation as well as do the mundane stuff.
If understanding good programming architecture / patterns is an issue, then you have to be more careful with the LLM as you are listening to advice from something that doesn't understand what you really want.
If understanding how to guide an LLM is an issue, then you have to work, test, and design ways of building guidelines and practices that get the outcomes you want.
Using LLMs to code isn't some cheat-code to success, but it does help greatly with the mundane parts of code if you know how to program, and program well.
How much of a large project is truly innovation? Almost every application has boilerplate code wrapped around it, error handling, CRUD endpoints, Web UI flows, all stuff you have to do and not really the fun stuff at the core of your project.
This is where I find LLMs shine, they help you burn through the boring stuff so you can focus more on what really delivers value.
The same way it was modern in some companies to hire a dedicated Scrum master for a full time position within a team, I already can imagine companies having job openings for an "Expert LLM-prompter" (or, to make it sound more legit: "Staff LLM-prompt engineer"). Take it from the funny side: the companies obsessed about productivity and efficiency will most likely be staffed with 100s of scrum masters and 100s of prompt engineers and probably one or two guys that actually do the real work. That's kind of hilarious, you gotta admit.
The emperor has had no clothes for quite some time already, but vibes do be vibing, that's the society we live in. Don't worry, you are not alone in finding the hype hard to bear. I don't think LLMs will become irrelevant as fast or as much like crypto/web3/nfts and Meta's metaverse did, but to me the amount of shouting and clothes tearing for what's essentially a fancy autocomplete (fight me on this) is just such a turn off.
I love it, here come the "you are using it wrong" arguments!
I thought these tools are so great, so awesome, that even without much experience they will help you? Or are they not so great and you actually gotta spend a considerable amount of time of learning them to see some return? Which one is it:
- are LLMs groundbreaking and democratizing development making it so much easier (which doesn't correspond to the results of the study)
- or do they need months of practice to give a modest return (or loss) of productivity back?
"You are using it wrong" is the cheapest cop-out when somebody questions the productivity benefits of LLMs. I'd like the LLM fanbase community to come up with better arguments (or ask their little assistant for assistance).
https://news.ycombinator.com/item?id=44854649
I never made that claim, I don't think it's generally true, but I do think it can be true for some people.
Regarding the study, I pointed out why I thought it was limited in its scope. If someone did a study on the potential productivity gains of using emacs and they based the study on people who had only used it for a week it wouldn't be a particularly useful study. A Piano is not a useful tool for making music for someone who has only used it for a week either.
I do have a customer with no prior programming experience that has created custom software for his own small manufacturing business automating tedious tasks. I think that's pretty amazing and so does he.
You sound exactly like them.
https://xcancel.com/GeoffLewisOrg/status/1945212979173097560
> A Prominent OpenAI Investor Appears to Be Suffering a ChatGPT-Related Mental Health Crisis, His Peers Say "I find it kind of disturbing even to watch it."
From https://futurism.com/openai-investor-chatgpt-mental-health
Yesterday I used Gemini and Claude cli to create a symbolic regression library in C++ with added state of the art features (hobby). Gemini is there to assist when Claude runs out of context. It took me six hours. I cannot imagine how anyone could do this without being an expert in these fields.
The initial code generation was pretty much spot on, with a few subtle bugs here and there and a few inefficient choices.
As I made them add features, there were so many bad decisions that would break the initial interface simplicity, add shared state, constantly switch between SoA and AoS without a plan, introduce circular deps for no reason. I fought back, keeping the code concise and functional, removed randomly introduced OOP features, added low-hanging optimizations. I hand-fixed some of the compilation errors and warnings myself to prevent them from using up unnecessary context, like a taxi driver who takes you the long route.
How would anyone who doesn't know what they are doing accomplish all this? They would have to learn.
Sure, the bar is lower. This is a task that would have taken me a few weekends of hard work. And mind you this is a hobby project. I would've lost a significant amount of hair if I was fighting AI assistance while under pressure at work.
Basically, a lot of people who are experts are being told this story and they think they are the only one who doesn't get it.
There are plenty of gains to be had with AI/LLMs but just not in the way it's typically marketed.
As a simple test, I used Claude under GitHub Copilot to develop a simple program to blink an LED using a task under zeptoforth (my Forth implementation for ARM Cortex-M, which is on GitHub so it should have been in their training set) on a Pimoroni Pico Plus 2... which took a few hours to complete because it was constantly hallucinating my APIs and getting minor points wrong that I had to tell it to fix... for it to only screw something else up the next time around... repeatedly.
Then the next day they had us do an "agentive AI training" where we were supposed to use "agentive AI" to program example stuff... and I found it worked to a point - until it didn't, and I could never actually make it do what I really wanted it to no matter how hard I "vibed". It just hit a stopping point where no amount of prompting would make it work.
And then there's the actual use cases for AI at my day job - I have not seen any that apply to me. I write only limited amounts of code - I do not churn out the boilerplate that AI is sold as obviating - and the code I do write has to be correct, for values of correct where simply looking correct is not sufficient (and an AI won't know the underlying math behind things or how to avoid things like race conditions).
If I actually used AI I would be spending more time trying to understand its output and having to fix it than I actually spend developing code, and I would have a harder time trusting the resulting code. I know how my code works because I wrote it ─ the same would not be true if I had an AI write my code for me.
Sure, there are people at my day job who have gotten fully on-board with the AI thing, and who see it as being useful for things like, say, changing the C++ standard for a whole project at a time. However, I would not be able to trust the project after that point myself. Yes, it would take much more time to manually update the code, but then it would be much easier to trust afterwards, and would require far less fixing to handle all the cases where the AI screwed up.
How can my experience be so totally different?
People i've talked to who are not coders have all found these coding tools to be very useful, enough to be worth paying for (approx. $20 bucks a month).
In my experience every model absolutely sucks at doing any form of software engineering. The thing is though, you don't need to do any software engineering to write software... until you do. Since LLMs got decent at programming we've let them loose among our BI and solar plant engineers and they're building some really useful tools in Python. They have no idea what they are doing, and they are still getting stuff to work. It's obviously absolute horse shit, but a lot of the shit code will be "good enough" that nobody has to touch it for its lifecycle. For the rest, actual software engineers take over.
If you're working in a field where "good enough" is fine, then I imagine that LLMs can do a lot for you. It might sound like I'm being disrespectful, but I really don't mean to be. In my decades of experience with software in enterprise settings, I've seen so many programs built with horrible code just trundle along, creating great value for the business while requiring no to minimal maintenance. Bad from a traditionalist software engineering perspective, but great from a business perspective (which is what matters). Not everything needs to scale, after all. Anyway, I do question whether the author has actually worked with LLMs a lot. I would think that the strengths and failings of LLMs would get obvious rather quickly to anyone who uses them.
A few things I've seen here.
> If you're working in a field where "good enough" is fine, then I imagine that LLM's can do a lot for you.
There's some specific timesheet software used for Gov Contractors that's hands-down the worst web "application" I've ever used. Opening devTools on one of these pages is a riot--one of the "APIs" used returns HTML documents with .js file extensions so the browser fails to parse them, every JS file is 1K lines long with blocks of global variables in various naming conventions, and using 'back' in the browser just sort of fails and returns you to the sign-in page.
Any modern web programmer would have a stroke looking at this thing. Apparently it's the standard software used for this purpose, relied on all over the industry.
At a previous employer, I had to deal with some internal site for financial transactions and billing. Same story, ancient looking and performing web 'app' that had specific incantations to get it to actually work "don't refresh the page until that loads", piles of JS vomiting bizarre errors in the console, just a remarkable piece of junk.
Anyway, short story long, there's LOADS of software that's terrible by any metric that have crucial purposes floating around that are far worse in terms of 'code quality' than any LLM would spit out.
Also, I have more experience than I like to remember dealing with 'body shop' codebases. There is no way that an LLM is even capable of generating such low-quality code. I legitimately think several of the projects I saw were sabotage; there's no way a human writes such inscrutable junk unless it's deliberate. Ever seen random Python runtime errors because the type of something changes depending on which branches were hit?
Nothing we do that is user facing is created by AI (this doesn't include any of our business intelligence as that's another department). We currently have a frontend lead who passionately hates AI for basically all of the reasons you bring up, and as long as they keep delivering, I doubt that will change anytime soon. As far as the BI goes, well, that will probably always be a horrible mess because it's next to impossible to find someone who's interested in working with the development/design side of PowerBI. I'm not sure it's even because we won't pay them enough, there doesn't seem to be anyone around here who wants a job like that. We tell our collective selves that it'll be better when we switch to Microsoft Fabric, but we'll see.
Deltek. Yuck.
0. It's undeniable that LLMs are both amazing and fun. So is a clown at a children's birthday party, but you wouldn't try to ride the balloon horse,
1. If you are producing code in a language you are not familiar with, you need to be aware of the Gell-Mann amnesia effect[0],
2. It excels at language translation. This is what LLMs were originally designed for. So if you can clearly describe an algorithm in English, it will probably be able to translate that to Python, for example. But at that point you have to question why you didn't just write it in Python in the first place,
3. It gamifies things and causes you to work longer and harder. I'm not convinced it's really more efficient, but I am convinced I simply wouldn't have bothered to put in the effort to do certain things before because the perceived effort of writing code from scratch is higher than spinning a roulette wheel that might shit out the answer first time,
4. It stops you thinking for yourself. Ever worked with an incredibly knowledgable colleague who is always eager to help? Or got the feeling you've solved something before so spent time grepping through old codebases and bash history to find what you did last time instead of spending the exact same time doing it again? It's kinda like that except it knows everything and it's always there. I fear that ability will atrophy with extended use,
5. I don't feel like I really own the code it outputs. It feels more like working in a large existing codebase written by a team. When I have to get into things and understand the code it's like I'm reading it for the first time, even though I did check all the diffs etc. It's like having a manic junior run rampant then coming to you when something really needs to be understood. Again, it's the question of is this really saving me any time in the long run?
[0] https://en.wikipedia.org/wiki/Gell-Mann_amnesia_effect
If that's the article writer's experience then they are simply using it incorrectly. The author seems to suggest most of their usage involves pasting code or chatting via the text interface, likely with a non-sota model.
It would be surprising to find that someone using e.g. Claude Code cli with full access to a codebase and carefully considered prompts is always getting useless junk.
Although I believe you, I think this is due to improper use. I know this sounds like a "you're holding the phone wrong" moment, but I've found so many uses for AI that I feel addicted to what it's providing me.
Like you I started with Vim long ago, in an attempt to reduce the thought -> code cycle. But after many years I started seeking something even faster, and LLMs seem to be it for me.
From quicker navigation of the codebase, to quickly writing the boilerplate for me, to adding debugging statements where I need them. Believe me, there is a big jump in productivity if you use it right. You just gotta keep trying until you develop a sense of what you can ask it to do and what it would be good for.
A strong statement like this was a reasonable take six months ago, perhaps. But with Claude Opus 4.1, o3-codex/gpt5, and Gemini Pro 2.5 this statement is pretty simply disproven with half an hour with them.
Just last week I spent an afternoon rewriting an old Android app that was done in butterknife and got something functional and tested in a few hours. It involved an 8,000 line diff. The work had been quoted to me by a contractor as likely to take 3-4 months and cost tens of thousands of dollars.
I designed a new website for my wife's band with a custom player widget, carousel photos, fully adaptive to large and small screens, good accessibility features, built from easy to edit JSON, SEO-optimized with microformats, and fast-loading. The first version I got running in 15 minutes. The final polished version took a few more days of ducking onto my laptop a few minutes here and there during an event. Without AI this would have taken me weeks to pull off and wouldn't have looked nearly as nice.
IMO part of the key to the flow here is to avoid the temptation to one-shot. First establish ground rules for working together in your AGENTS.md/CLAUDE.md that lay out your software engineering best principles (use git semantics, write out your plans, add lint and tests to commit hooks...). Then have it read through existing code and architecture. Then carefully put together an architecture and set of constraints into your README/PRD. Then build a high-level engineering plan divided up into sequenced tasks and write it down (vs just keeping it in context). Only then do you allow any code to start to get written. And yes, you still need to supervise it (and iterate on your AGENTS/CLAUDE to avoid repeating yourself). But yeah, it's about 10-100x faster than me now with a flow like that. (Context on me: been programming 40 years, Stanford CS, ACM international programming competition finalist.)
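To make the ground-rules part concrete, here is a stripped-down, purely illustrative example of the kind of section I mean (not my actual file; the specific rules will differ for every team):

## Ground rules
- Use git properly: feature branches, small commits, meaningful messages.
- Write your plan to a markdown file and get it approved before writing any code.
- Lint and tests run in the commit hook; a failing hook means the task is not done.
- Read the existing architecture notes before proposing changes, and update the README/PRD when the design changes.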
This new flow is extremely fun and addictive. It's also a new thing that uses technical knowledge but isn't exactly the same thing as coding. It's like having a pool of hard working inexpensive idiot savant never-worked-in-industry interns at your beck and call - the more you pour into giving them context and process, the better results you are gonna get.
Seems like their attitude is exactly what you'd expect of someone who wants it to be useless from the outset. A self-fulfilling hypothesis.
That said, I never really doubted that they can help you get more done. I use it occasionally to refactor or generate some boilerplate. But very often I need to rewrite most of the functionality, and I find that's more true the more code it generates.
The agentic stuff I've tried have been terrible. Yeah I personally haven't used the latest models, but I deal with teammates who use them and the frontend it generates looks bland and the shell scripts generated with them are very hard to debug but also just flat out broken.
But most of all, I love the act of programming and cannot imagine myself giving that over to an LLM. "Managing" an LLM is to me a soul crushing - maybe anti-intellectual, but definitely boring - task. And if that is what we as professionals will be required to do in the future, I'll just change profession.
I did find them very useful when writing completely new stuff (things like "write a <insert your favorite API> client" or making test boilerplate), i.e. as a copy & paste replacement tool.
It'd be nice to hear in detail how it's been useful for other devs. There's too much propaganda around on how amazing it is and not nearly enough use cases.
The other thing, it's just another code generation tool, treat it accordingly
All of whom had greater output in terms of volume and impact before LLMs. And it is always the same people who now get voted to the top as opposed to a couple of months ago, when one of them got ratio'd for one of his exuberant blog posts.