Author here. I built this in a few hours after the Claude Code leak.
I've been working on my own coding agent setup for a while. I mostly use pi [0] because it's minimal and easy to extend. When the leak happened, I wanted to study how Anthropic structured things: the tool system, how the agent loop flows, A 500K line codebase is a lot to navigate, so I mapped it visually to give myself a quick reference I could come back to while adapting ideas into my own harness and workflow.
I'm actively updating the site based on feedback from this thread. If anything looks off, or you find something I missed, lmk.
It's it a simple REPL with some tools and integrations, written in a very high level language? How the hell is it so big? Is it because it's vibecoded and LLMs strive for bloat, or is it meaningful complexity?
For the animations specifically, it's using Motion (fka Framer Motion) Javascript library. If you describe some animations from the site to an LLM and ask it to use Framer motion, you get very similar results. The creator likely just prompted for a while until they were happy with the outcome.
Well, I assume this is all just generated with Claude Code, right? Whether there is much back and forth with the LLM is a valid question and nothing wrong with generating websites (I do it too for some side projects). Claude loves generating websites with a particular style of serif font. We also saw this with https://tboteproject.com/timeline/ and I've just generally seen it from various designs that coworkers have spit out over months using Claude defaults.
I guess I just find it weird because all the signals are messed up so whenever I see these sorts of layouts, I feel like I'm looking at the average where I don't think "gorgeous and interesting" at all. Instead, I'm forced to think "I should be skeptical of this based on the presentation because it presents as high quality but this may be hiding someone who is not actually aware of what they're presenting in any depth" as the author may have just shoved in a prompt and let it spin.
There's actually a similarly designed website (font weights, font styles etc) here in New Zealand (https://nzoilwatch.com/) where at a glance, it might seem like some overloaded professional-backed thing but instead it's just some guy who may or may not know anything about oil at all, yet people are linking it around the place like some sort of authoritative resource.
I would have way less of an issue if people just put their names by things and disclosed their LLM usage (which again, is fine) rather than giving the potentially false impression to unequipped people that the information presented is actually as accurate and trustworthy as the polish would suggest.
I was talking to one of the people who works at a big agentic coding tools. If I recall correctly, he was talking about how they use the tool to build the tool. I was complaining that all of the websites/frontends I make look pretty weak, and I'm amazed they get much slicker looking UIs with the same tool. He showed me that one way they do it is by having an extensive UI library of components/graphics/whatever, and also mentioned that the folks build their UIs know how to prompt/use the tool because it's backed by years of UI development knowledge & superior resources. I realized I didn't have any of that, and it actually made me feel better.
Last week we I was struggling to go from vague prompt to a OMG-it's-so-nice-looking web app, I remembered that example above and then decided to create my own component library, which I did in a couple days: https://www.substrateui.dev/. I was actually super happy that I was able to accomplish that, and then I realized I wanted to better understand the content that I had vibe coded into existence. So now I'm recreating that design system step by step w/ Claude code, filling in gaps in my knowledge & learning a bit about colors, typography, CSS, blah blah blah. It's actually a lot of fun because I'm able to explore all of the concepts and learn enough to build a front end that doesn't suck & is good enough for my use case without getting stuck for days on trying to center a stupid div by hand or play whack-mole-fix-something-and-break-something-else when trying to clean up AI slop.
I was referencing https://www.neobrutalism.dev/ and https://www.retroui.dev/ and slopped my way through it. A lot of it was just asking Claude Code "is this a proper design system?", then I kept doing that until it didn't have anything useful to add. Now I'm using my that as the template for understanding such things in more detail.
The people who don’t know how to use an LLM to make them more productive, or are scared it’s going to take their job, are louder than the people who are making good use of them to
make them more productive.
That just seems to be human nature unfortunately - the complainers are always louder.
What? We must have different internets, I agree in general, but the "AI is the second coming" crowd is louder than standing next to a jet on takeoff. I'm in the "AI is making me more productive but a worse developer" crowd, don't know what I count as.
I mean, tools change, but I'd be happy to hear if any tool can create that by just saying create "Claude Code Unpack" with nice graphics. or some other single prompt. It likely was an iterative process and it would be lovely if more people started sharing that, because the process itself is also very interesting.
I've created some chinese characters learning website and I took me typing 1/3 of LoTR to get there[1]. I would have typed like 1% of that writing code directly. It is a different process, but it still needs some direction.
I think it is accurate. Where are the autonomous AI who beat the creator to the punch? When we write "Hello, World!" in C and compile it with `gcc`, do we give credit to every contributor to GNU? AI is a tool that thus far only humans are capable of using with the unique inspiration. Will this change in the future? Certainly. But is it the case now? I think my questions imply some reasonable objections.
Thanks to Claude Code, we got such a beautifully polished and dazzling website that gives a complete introduction to itself the very moment the leak happened :)
I guess they really do eat their own dogfood and vibe code their way through it without care for technical debt? In a way, it’s a good challenge, but it’s fairly painful to watch the current state of the project (which is about a year old now, so it should be in prime shape).
> is about a year old now, so it should be in prime shape
A 1yo project may be in good shape if written by just one dev, maybe a few. But if you have many devs, I can guarantee it will be messy and buggy. If anything, at 1yo it is probably still full of bugs because not enough time has elapsed for people to run into them.
It's only 510k LoC, at ~100 lines of code a day for a year, this code base would take 23 engineers a year to write. That's for 220 working days in somewhere civilized.
And I'm sure we all know that when working on a greenfield project you can produce a lot more LoC per day than maintaining a legacy one.
Given that vibe code is significantly more verbose, you're probably talking about ~15 engineers worth of code?
I know that's all silly numbers, but this is just attempting to give people some context here, this isn't a massive code base. I've not read a lot of it, so maybe it's better than the verbose code I see Claude put out sometimes.
The previous poster was making out that in a year the code base would be a mess if people had done it.
This is a two-pizza team sized project, so it's not a project that the code quality would inevitably spiral out of control due to communication problems.
A single senior architect COULD have kept the code quality under control.
Which makes for an interesting thought / discussion; code is written to be read by humans first, executed by computers second. What would code look like if it was written to be read by LLMs? The way they work now (or, how they're trained) is on human language and code, but there might be a style that's better for LLMs. Whatever metric of "better" you may use.
Just a thought experiment, I very much doubt I'm the first one to think of it. It's probably in the same line of "why doesn't an LLM just write assembly directly"
LLMs read and write human-code because humans have been reading and writing human-code. The sample size of assembly problems is, in my estimate, too small for LLMs to efficiently read and write it for common use cases.
I liken it to the problem of applying machine learning to hard video games (e.g. Starcraft). When trained to mimic human strategies, it can be extremely effective, but machine learning will not discover broadly effective strategies on a reasonable timescale.
If you convert "human strategies" to "human theory, programming languages, and design patterns", perhaps the point will be clear.
But: could the ouroboric cycle of LLM use decay the common strategies and design patterns we use into inexplicable blobs of assembly? Can LLMs improve at programming if humans do not advance the theory or invent new languages, patterns, etc?
But starcraft training is not through mimicking human strategies - it was pure RL with a reward function shaped around winning, which allows it to emerge non-human and eventually super-human strategies (such as the worker oversaturation).
The current training loop for coding is RL as well - so a departure from human coding patterns is not unexpected (even if departure from human coding structure is unexpected, as that would require development of a new coding language).
> It's probably in the same line of "why doesn't an LLM just write assembly directly"
My suspicion is that the "language" part of LLMs means they tend to prefer languages which are closer to human languages than assembly and benefit from much of the same abstractions and tooling (hence the recent acquisition of bun and astral).
The problem with that is that assembly isn't portable, and x86 isn't as dominant as it once was, so then you've got arm and x86(_64). But you could target the LLVM machine if you wanted.
Yes but my point was that they seem to explicitly not care about code quality and/or the insane amount of bloat, and seem to just want the LLM to be able to deal with it.
I've heard somewhere that they have roughly 100% code churn every few months, so yes, they unfortunately don't care about code quality. It's a shame, because it's still the best coding agent, in my experience.
> they unfortunately don't care about code quality.
> It's a shame, because it's still the best coding agent, in my experience.
If it is the best, and if it delivers the value users are asking for, then why would they have an incentive to make further $$$ investments to make it of a "higher" quality if the value this difference could make is not substantial or hurts the ROI?
On many projects I found this "higher quality" not only to be false of delivering more substantial value but actually I found it was hurting the project to deliver the value that matters.
Maybe we are after all entering the era of SWE where all this bike-shedding is gone and only type of engineers who will be able to survive in it will be the ones who are capable of delivering the actual value (IME very few per project).
Yes, but as I said, it’s in a way the ultimate form of dogfooding: ideally they’ll be able to get the LLM smart enough to keep the codebase working well long-term.
Now whether that’s actually possible is a second topic.
Just finished looking at Ink here.. frontend world has no shame. Love the gloating about 40x less RAM as if that amount of memory for a text REPL even approaches defensible. "CC built CC" is not the flex people seem to suggest it is.
There's this weird thing about AI generated content where it has the perfect presentation but conveys very little.
For example the whole animation on this website, what does it say beyond that you make a request to backend and get a response that may have some tool call?
We've moved from "move fast and break things" to "hallucinate fast and patch later." It's the inevitable side effect of using AI to curate AI-written codebases.
That's fair. The site isn't meant to be a deep technical dive, it's more of a visual high-level guide of what I've curated while exploring the codebase while assisted by AI, 500k loc codebase is just too much to sift through in a short amount of time.
I agree with you and I'm generally an AI "defender" when people superficially dismiss AI capabilities, but this is a more subtle point.
If you prompt with little raw material and little actual specification of what you want to see in the end, eg you just say make a detailed breakdown dashboard-like site that analyzes this codebase, the result will have this uncanny character.
I'd describe it as a kind of "fanfic", it (and now I'm not just talking about this website but my overall impression related to this phenomenon) reminds me a bit like how when I was 15 or so, I had an idea about how the world works then things turned out to be less flashy, less movie-like, less clear-cut, less-impressive-to-a-teenage-boy than I had thought.
If you know the concept of "stupid man's idea of a smart man", I'd say AI made stuff (with little iteration) gives this outward appearance of a smart man from the Reddit-midwit-cinematic-universe. It's like how guns in movies sound more like guns than real guns. It's hyperreality.
Again this is less about the capabilities of AI and it's more connected to the people-pleasing nature of it. It's like you prompt it for some epic dinner and it heaps you up some hmmm epic bacon with bacon yeah (referring to the hivemind-meme). Or BigMac on the poster vs the tray, and the poster one is a model made with different components that are more photogenic. It's a simulacrum.
It looks more like your naive currently imagined thing about what you think you need vs what you'd actually need. It's like prompting your ideal girlfriend into AI avatar existence. I'm sure she will fit your ideal thought and imagination much better but your actual life would need the actual thing.
This relates to the Persona thing that Anthropic has been exploring, that each prompt guides the model towards adopting a certain archetypal fiction character as it's persona and there are certain attraction basins that get reinforced with post training. And in the computer world, simulated action can be easily turned into real action with harnesses and tools, so I'm not saying that it doesn't accomplish the task. But it seems that there are more sloppy personas, and it seems that experts can more easily avoid summoning them by giving them context that reflects more mundane reality than a novice or an expert who gives little context. Otherwise the AI persona will be summoned from the Reddit midwit movie.
I'm not fully clear about all this, but I think we have a lot to figure out around how to use and judge the output of AI in a productive workflow. I don't think it will go away ever, but will need some trimming at the edges for sure.
Kairos and auto-dream are more interesting than anything in the agent loop section. Memory consolidation between sessions is the actual unsolved problem. The rest is just plumbing tbh
I think it's good that it's out there, and I wonder why Anthropic have been keeping it closed source; clearly they can't possibly think that the CC source code is a competitive advantage...?
Agents in general are easy to make, and trivial to make for yourself especially, and the result will be much better than what any of the big providers can make for you.
`pi` with whatever commands/extensions you want to make for yourself is better than CC if you really don't want to go through the trouble of making your own thing.
I feel the same way. Given it's AI-written, looking at the code isn't even worth it to me. I would rather read a blog post about how they develop it day to day.
I doubt there is anything special about the transformer code the frontier labs use. The only thing proprietary in it are probably the infrastructure-specific optimizations for very large scale distributed training and some GPU kernel tricks. The real moat is the training data, especially the RLHF/finetuning data and verifiable reward environments, and the GPU clusters of course.
The open source models are quite close, and they'd probably be just as good with the equivalent amount of compute/data the frontier labs have access to.
However, I assume that usage data could be increasingly valuable as well. That will likely help the big commercial cloud models to maintain a head start for general use.
/stickers:
Displays earned achievement stickers for milestones like first commit, 100 tool calls, or marathon sessions. Stickers are stored in the user profile and rendered as ASCII art in the terminal.
That is not what it does at all - it takes you to a stickermule website.
What is the motivation for someone to put out junk like this?
The animated explanation at the top is also way too fast at 1x, almost impossible to follow; that immediately hinted at the author not fully reading/experiencing the result before publishing this.
Really nice visualisation of this, makes understanding the flow at a high levle pretty clear. Also the tool system and command catalog, particularly the gated ones are super interesting.
519K lines of code for something that is using the baseline *nix tools for pretty much everything important, how do they even manage to bloat it this much? I mean I know how technically, but it's still depressing.
Can't they ask CC to make it good, instead of asking it to make it bigger?
I mean, I get it: vibe-coded software deserves vibe-coded coverage. But I would at least appreciate it if the main part of it, the animation, went at a speed that at least makes it possible to follow along and didn't glitch out with elements randomly disappearing in Firefox...
It's on the front page because it looks really cool. You can complain about it being vibe coded, but it still looks good. If you ask Claude to allow the user to slow down the animation, it can do that quite easily, that's just not a problem caused by vibe coding. And I'm on FF and didn't notice anything glitching out.
Thanks, I'll use this for teaching next week (on what not to do). BashTool.ts :D But, in general, I guess it just shows yet again that the emperor has no clothes.
I've been working on my own coding agent setup for a while. I mostly use pi [0] because it's minimal and easy to extend. When the leak happened, I wanted to study how Anthropic structured things: the tool system, how the agent loop flows, A 500K line codebase is a lot to navigate, so I mapped it visually to give myself a quick reference I could come back to while adapting ideas into my own harness and workflow.
I'm actively updating the site based on feedback from this thread. If anything looks off, or you find something I missed, lmk.
[0] https://pi.dev/
It's it a simple REPL with some tools and integrations, written in a very high level language? How the hell is it so big? Is it because it's vibecoded and LLMs strive for bloat, or is it meaningful complexity?
I guess I just find it weird because all the signals are messed up so whenever I see these sorts of layouts, I feel like I'm looking at the average where I don't think "gorgeous and interesting" at all. Instead, I'm forced to think "I should be skeptical of this based on the presentation because it presents as high quality but this may be hiding someone who is not actually aware of what they're presenting in any depth" as the author may have just shoved in a prompt and let it spin.
There's actually a similarly designed website (font weights, font styles etc) here in New Zealand (https://nzoilwatch.com/) where at a glance, it might seem like some overloaded professional-backed thing but instead it's just some guy who may or may not know anything about oil at all, yet people are linking it around the place like some sort of authoritative resource.
I would have way less of an issue if people just put their names by things and disclosed their LLM usage (which again, is fine) rather than giving the potentially false impression to unequipped people that the information presented is actually as accurate and trustworthy as the polish would suggest.
I'm serious. The hype chasing clearly clearly matters. .
things like this: https://github.com/instructkr/claw-code I mean ok, serious people put in years of effort for 100 of those stars ...
it's continually wild how extremely irrelevant hard effortful careful work is.
I think that's the game. Get up, look at the headlines, figure out how you can exploit them with vibe coding, do some hyphy project and repeat.
Personally, I don't think I will be putting any such disclaimers or disclosures on my work, unless I deem it relevant to the functionality.
Content resizing, needing to juggle a speed knob to read, and the overall presentation makes it feel like Edward Tufte flavored nightmare fuel.
Last week we I was struggling to go from vague prompt to a OMG-it's-so-nice-looking web app, I remembered that example above and then decided to create my own component library, which I did in a couple days: https://www.substrateui.dev/. I was actually super happy that I was able to accomplish that, and then I realized I wanted to better understand the content that I had vibe coded into existence. So now I'm recreating that design system step by step w/ Claude code, filling in gaps in my knowledge & learning a bit about colors, typography, CSS, blah blah blah. It's actually a lot of fun because I'm able to explore all of the concepts and learn enough to build a front end that doesn't suck & is good enough for my use case without getting stuck for days on trying to center a stupid div by hand or play whack-mole-fix-something-and-break-something-else when trying to clean up AI slop.
That just seems to be human nature unfortunately - the complainers are always louder.
I've created some chinese characters learning website and I took me typing 1/3 of LoTR to get there[1]. I would have typed like 1% of that writing code directly. It is a different process, but it still needs some direction.
1. https://hanzirama.com/making-of
A 1yo project may be in good shape if written by just one dev, maybe a few. But if you have many devs, I can guarantee it will be messy and buggy. If anything, at 1yo it is probably still full of bugs because not enough time has elapsed for people to run into them.
And I'm sure we all know that when working on a greenfield project you can produce a lot more LoC per day than maintaining a legacy one.
Given that vibe code is significantly more verbose, you're probably talking about ~15 engineers worth of code?
I know that's all silly numbers, but this is just attempting to give people some context here, this isn't a massive code base. I've not read a lot of it, so maybe it's better than the verbose code I see Claude put out sometimes.
This is a two-pizza team sized project, so it's not a project that the code quality would inevitably spiral out of control due to communication problems.
A single senior architect COULD have kept the code quality under control.
Just a thought experiment, I very much doubt I'm the first one to think of it. It's probably in the same line of "why doesn't an LLM just write assembly directly"
I liken it to the problem of applying machine learning to hard video games (e.g. Starcraft). When trained to mimic human strategies, it can be extremely effective, but machine learning will not discover broadly effective strategies on a reasonable timescale.
If you convert "human strategies" to "human theory, programming languages, and design patterns", perhaps the point will be clear.
But: could the ouroboric cycle of LLM use decay the common strategies and design patterns we use into inexplicable blobs of assembly? Can LLMs improve at programming if humans do not advance the theory or invent new languages, patterns, etc?
The current training loop for coding is RL as well - so a departure from human coding patterns is not unexpected (even if departure from human coding structure is unexpected, as that would require development of a new coding language).
My suspicion is that the "language" part of LLMs means they tend to prefer languages which are closer to human languages than assembly and benefit from much of the same abstractions and tooling (hence the recent acquisition of bun and astral).
> It's a shame, because it's still the best coding agent, in my experience.
If it is the best, and if it delivers the value users are asking for, then why would they have an incentive to make further $$$ investments to make it of a "higher" quality if the value this difference could make is not substantial or hurts the ROI?
On many projects I found this "higher quality" not only to be false of delivering more substantial value but actually I found it was hurting the project to deliver the value that matters.
Maybe we are after all entering the era of SWE where all this bike-shedding is gone and only type of engineers who will be able to survive in it will be the ones who are capable of delivering the actual value (IME very few per project).
Now whether that’s actually possible is a second topic.
That's how you get "oh this TUI API wrapper needs 68GB of RAM" https://x.com/jarredsumner/status/2026497606575398987 or "we need 16ms to lay out a few hundred characters on screen that's why it's a small game engine": https://x.com/trq212/status/2014051501786931427
Also I definitely want a Claude Code spirit animal
(Yes, I know I can turn it off. I have.)
“Complete thyself.”
And I want an octopus. Who orchestrates octopuses.
This deployment is temporarily paused
https://web.archive.org/web/20260331105051/https://www.cclea...
BTW, that's why you should use your own infrastructure and not depend on Vercel
i do shift ctrl F
For example the whole animation on this website, what does it say beyond that you make a request to backend and get a response that may have some tool call?
If you prompt with little raw material and little actual specification of what you want to see in the end, eg you just say make a detailed breakdown dashboard-like site that analyzes this codebase, the result will have this uncanny character.
I'd describe it as a kind of "fanfic", it (and now I'm not just talking about this website but my overall impression related to this phenomenon) reminds me a bit like how when I was 15 or so, I had an idea about how the world works then things turned out to be less flashy, less movie-like, less clear-cut, less-impressive-to-a-teenage-boy than I had thought.
If you know the concept of "stupid man's idea of a smart man", I'd say AI made stuff (with little iteration) gives this outward appearance of a smart man from the Reddit-midwit-cinematic-universe. It's like how guns in movies sound more like guns than real guns. It's hyperreality.
Again this is less about the capabilities of AI and it's more connected to the people-pleasing nature of it. It's like you prompt it for some epic dinner and it heaps you up some hmmm epic bacon with bacon yeah (referring to the hivemind-meme). Or BigMac on the poster vs the tray, and the poster one is a model made with different components that are more photogenic. It's a simulacrum.
It looks more like your naive currently imagined thing about what you think you need vs what you'd actually need. It's like prompting your ideal girlfriend into AI avatar existence. I'm sure she will fit your ideal thought and imagination much better but your actual life would need the actual thing.
This relates to the Persona thing that Anthropic has been exploring, that each prompt guides the model towards adopting a certain archetypal fiction character as it's persona and there are certain attraction basins that get reinforced with post training. And in the computer world, simulated action can be easily turned into real action with harnesses and tools, so I'm not saying that it doesn't accomplish the task. But it seems that there are more sloppy personas, and it seems that experts can more easily avoid summoning them by giving them context that reflects more mundane reality than a novice or an expert who gives little context. Otherwise the AI persona will be summoned from the Reddit midwit movie.
I'm not fully clear about all this, but I think we have a lot to figure out around how to use and judge the output of AI in a productive workflow. I don't think it will go away ever, but will need some trimming at the edges for sure.
I use it all day and love it. Don't get me wrong. But it's a terminal-based app that talks to an LLM and calls local functions. Ooookay…
Agents in general are easy to make, and trivial to make for yourself especially, and the result will be much better than what any of the big providers can make for you.
`pi` with whatever commands/extensions you want to make for yourself is better than CC if you really don't want to go through the trouble of making your own thing.
curious as i haven't gotten around to writing my own agent yet
But you can do a lot of interesting things on top of this. I highly recommend writing an agent and hooking it up to a local model.
The open source models are quite close, and they'd probably be just as good with the equivalent amount of compute/data the frontier labs have access to.
However, I assume that usage data could be increasingly valuable as well. That will likely help the big commercial cloud models to maintain a head start for general use.
The utils directory should only contain truly generic, business-agnostic utilities (such as date retrieval, simple string manipulation, etc.).
We can see that the code produced by Vibe is not what a professional engineer would write. This may be due to the engineers using the Vibe tool.
First command I looked at:
That is not what it does at all - it takes you to a stickermule website.What is the motivation for someone to put out junk like this?
Getting something with a link to their GitHub onto the frontpage of HN. Because form matters much more in this world than substance.
The animated explanation at the top is also way too fast at 1x, almost impossible to follow; that immediately hinted at the author not fully reading/experiencing the result before publishing this.
It's inappropriate to label a free side project 'junk' or 'slop' even if it contains major errors.
Particularly when there's a disclaimer about possible inaccuracies on the page.
0 - https://github.com/zackautocracy/claude-code/blob/main/src/u...
it looks really interesting.
How is this on the front page?
- find nothing - still manage to fill entire lages - somehow have a similar structure - are boring as fuck
At least this one is 3/4, the previous one had BINGO.
War flashbacks to genshin
In all seriousness. I think you‘re supposed to run these in some kind of sandbox.
Which emperor, specifically?