Snorting the AGI with Claude Code (kadekillary.work)
299 points by beigebrucewayne 1 day ago | 212 comments
mjrbrennan 13 hours ago [-]
Not trying to be rude here, but that `last_week.md` is horrible to me. I can't imagine having to read that let alone listen to the computer say it to me. It's so much blah blah and fluff that reads like a bad PR piece. I'd much rather scan through commits of the last week.

I've found this generally with AI summaries...usually their writing style is terrible, and I feel like I cannot really trust them to get the facts right, and reading the original text is often faster and better.

never_inline 11 hours ago [-]
Here's a system prompt I tend to use

    ## Instructions
    * Be concise
    * Use simple sentences. But feel free to use technical jargon.
    * Do NOT overexplain basic concepts. Assume the user is technically proficient.
    * AVOID flattering, corporate-ish or marketing language. Maintain a neutral viewpoint.
    * AVOID vague and / or generic claims which may seem correct but are not substantiated by the context.
This cannot completely avoid hallucinations, and it's still good to avoid AI for text that's used for human-to-human communication. But it makes AI answers to coding and technical questions easier to read.
WD-42 12 hours ago [-]
I felt the same thing about the onboarding. Like what future are we trying to build for ourselves here, exactly? The kind where, instead of sitting down with a coworker to learn about a codebase, we get an AI-generated PowerPoint to read alone????

I'm so over this timeline.

JohnMakin 12 hours ago [-]
All of this just reads like the UML zeitgeist that was supposed to transform Java and eliminate development 20 years ago.

If this is all ultimately Java but with even more steps, it's a sign I'm definitely getting old. It's just the same pattern of non-technical people deceiving themselves into believing they don't need to be technical to build tech, ultimately resulting in another 10-20 years of re-learning the painful lessons of that.

Let me off this train too, I'm tired already.

TeMPOraL 6 hours ago [-]
The mistake was going after programmers, instead of going after programming languages, where the actual problem is.

UML may be ugly and in need of streamlining, but the idea of building software by creating and manipulating artifacts at the same conceptual level we are thinking at any given moment is sound. Alas, we long ago hit a wall in how much cross-cutting complexity we can stuff into the same piece of plaintext code, and we've been painfully scraping along the Pareto frontier ever since, vacillating between large and small functions and wasting time debating the merits of sum types in lieu of exception handling, hoping that if we throw more CS PhDs into the category theory blender, they'll eventually come up with some heavy-duty super-mapping super monad that'll save us all.

(I wrote a lot on this in the past here; cf. "pareto frontier" and "plaintext single source of truth codebase".)

Unfortunately, it may be too late to fix it properly. Yes, LLMs are getting good enough to just translate between different perspectives/concerns on the fly, and doing the dirty work on the raw codebase for us. But they're also getting good enough that managers and non-technical people may finally get what they always wanted: building tech without being technical. For the first time ever, that goal is absolutely becoming realistic, and already possible in the small - that's what the whole "vibe coding" thing heralds.

WD-42 30 minutes ago [-]
I've heard this many times before, but I've never heard an argument that rebuts the plain fact that text is extremely expressive, and basically anything else we try to replace it with is less so. And it happens that making a von Neumann machine do precisely what you want requires a high level of precision. Happy to be shown otherwise!
bigiain 11 hours ago [-]
20 years before UML/Java it was "4th Generation Languages" that were going to bring "Application Development Without Programmers" to businesses.

https://en.wikipedia.org/wiki/Fourth-generation_programming_...

rsynnott 4 hours ago [-]
> all of this just reads like the supposed UML zeitgeist that was supposed to transform java and eliminate development 20 years ago

See also 'no-code', 4GLs, 5GLs, etc etc etc. Every decade or so, the marketers find a new thing that will destroy programming forever.

Agentlien 9 hours ago [-]
Of all the things I read at uni, UML is the thing I've felt the least use for - even when designing new systems. I've had more use for things I never thought I'd need, like Rayleigh scattering and processor design.
baq 9 hours ago [-]
UML was a buzzword, but a sequence diagram can sometimes replace a few hundred words of dry text. People think best in 2d.
rsynnott 4 hours ago [-]
Sure, but you're talking "mildly useful", rather than "replaced programmers 30 years ago, programmers don't exist anymore".

(Also, I'm _fairly_ sure that sequence diagrams didn't originate with UML; it just adopted them.)

bryanrasmussen 7 hours ago [-]
>People think best in 2d.

No they don't. Some people do. Some people think best in sentences, paragraphs, and sections of structured text. Diagrams mean next to nothing to me.

Some graphs, as in representations of actual mathematical graphs, do have meaning though, if a graph is really the best data structure to describe a particular problem space.

on edit: added in "representations of" as I worried people might misunderstand.

TeMPOraL 6 hours ago [-]
FWIW, you're likely right here; not everyone is a visual thinker.

Still, what both you and GP should be able to agree on is that code - not pseudocode, simplified code, or draft code, but the actual code of a program - is one of the worst possible representations to be thinking and working in.

It's dumb that we're still stuck with this paradigm; it's a great lead anchor chained to our ankles, preventing us from being able to handle complexity better.

baq 4 hours ago [-]
IMHO there's usually a lot of necessary complexity that is irrelevant to the actual problem: logging, observability, error handling, authn/authz, secret management, adapting data to interfaces for passing to other services, etc.

Diagrams and pseudocode allow us to push those inconveniences into the background and focus on the flows that matter.

TeMPOraL 4 hours ago [-]
Precisely that. As you say, this complexity is both necessary and irrelevant to the actual problem.

Now, I claim that the main thing that's stopping advancement in our field is that we're making a choice up front on what is relevant and what's not.

The "actual problem" changes from programmer to programmer, and from hour to the next. In the morning, I might be tweaking the business logic; at noon, I might be debugging some bug across the abstraction layers; in the afternoon, I might be reworking the error handling across the module, and just as I leave for the day, I might need to spend 30 minutes discussing architecture issue with the team. All those things demand completely different perspectives; for each, different things are relevant and different are just noise. But right now, we're stuck looking at the same artifact (the plaintext code base), and trying to make every possible thing readable simultaneously to at least some degree.

I claim this is a wrong approach that's been keeping us stuck for too long now.

eadmund 5 hours ago [-]
> code - not pseudocode, simplified code, draft code, but actual code of a program - is one of the worst possible representations to be thinking and working in.

It depends on the language. In my experience, well-written Lisp with judicious macros can come close to fitting the way I think of a problem. But some language with tons of boilerplate? No, not at all.

TeMPOraL 4 hours ago [-]
As a die-hard Lisper, I still disagree. Yes, Lisp can go further than anything else to eliminate boilerplate, but you're still locked into a single representation. The moment you switch your task to something else - especially something that actually cares about the boilerplate you hid, and not the logic you exposed - you're fighting an even harder battle.

That's what I mean by Pareto frontier: the choices made by various current-generation languages and coding methodologies (including the choices you as a macro author make, too) all promote readability for some tasks at the expense of readability for other tasks. We're just shifting the difficulty around the day, not actually eliminating it.

To break through that and actually make progress, we need to embrace working in different, problem-specific views, instead of on the underlying shared single-source-of-truth plaintext code directly.

bryanrasmussen 6 hours ago [-]
ok, but my reference to sentences, paragraphs and sections would not indicate code but rather documentation.
bryanrasmussen 7 hours ago [-]
oops, evidently I got downvoted because I don't think best in 2d and that is bad, classy as always HN.
quietbritishjim 7 hours ago [-]
I think most software engineers need to draw a class diagram from time to time. Maybe there are a lot of unnecessary details in the UML spec, but it certainly doesn't hurt to agree that a hollow triangle arrowhead means parent/child (inheritance), a plain arrowhead means a directed association, and a diamond at the owning end means aggregation or composition.

As the sibling comment says, sequence diagrams are often useful too. I've used them a few times for illustrating messages between threads, and for showing the relationship between async tasks in structured concurrency. Again, maybe there are murky corners to UML sequence diagrams that are rarely needed, but the broad idea is very helpful.

fennecfoxy 4 hours ago [-]
True, but I don't bother with a unified system, just a mermaid diagram. I work in web though, so perhaps it would be different if I went back to embedded (which I did for only a short while) or something else where a project is planned in its entirety rather than growing organically/reacting to customers' needs/trends/the whims of management.
quietbritishjim 32 minutes ago [-]
I just looked at Mermaid and it seems as close to UML as what I meant in my previous comment. Just look at this class diagram [1]: triangle-ended arrows for parent/child, the classic UML class box of name/attributes/methods, stereotypes in <<double angle brackets>>, etc. The text even mentions UML. I'm not a JS dev so I tend to use PlantUML instead - which is also UML-based, as the name implies.

I'm not sure what you mean by "unified system". If you mean some sort of giant data store of design/architecture where different diagrams are linked to each other, then I'm certainly NOT advocating that. "Archimate experience" is basically a red flag against both a person and the organisation they work for IMO.

(I once briefly contracted for a large company and bumped into a "software architect" in a kitchenette one day. What's your software development background, I asked him. He said: oh no, I can't code. D-: He spent all day fussing with diagrams that surely would be ignored by anyone doing the actual work.)

[1] https://mermaid.js.org/syntax/classDiagram.html

91bananas 3 hours ago [-]
I've been at this 16 years. I've seen one planned project in that 16 years that stuck anywhere near the initial plan. They always grow with the whims of someone.
KronisLV 5 hours ago [-]
> I think most software engineers need to draw a class diagram from time to time.

Sounds a lot like RegEx to me: if you use something often then obviously learn it but if you need it maybe a dozen or two dozen times per year, then perhaps there’s less need to do a deep dive outside of personal interest.

fennecfoxy 4 hours ago [-]
Lmao I remember uni teaching me UML. Right before I dropped out after a year because fuck all of that. It's a shame because some of the final year content I probably would've liked.

But I just couldn't handle it when I got into like COMP102 and in the first lecture, the lecturer is all "has anybody not used the internet before?"

I spent my childhood doing the stuff so I just had to bail. I'm sure others would find it rewarding (particularly those that were in my classes because 'a computer job is a good job for money').

CuriouslyC 4 hours ago [-]
Naw, the new future (technically the present for orgs that use AI intelligently) is:

The AI has already generated comprehensive README.md files and detailed module/function/variable doc comments (as needed), which you could read, but which end up mostly being consumed by another AI. So you can just tell it what you're trying to do and ask it how you might accomplish that in the codebase - first at a conceptual level, then in code once you feel comfortable enough with the system to be able to validate the work.

All the while you're sitting next to another coworker who's also doing the same thing, while you talk about high level architecture stuff, make jokes, and generally have a good time. Shit, I don't even mind open offices as much as I used to, because you don't need that intense focus to get into a groove to produce code quickly like you did when manually writing it, so you can actually have conversations with an entire table of coworkers and still be super productive.

No comment on the political/climate side of this timeline, but the AI part is pretty good when you master it.

mvieira38 12 minutes ago [-]
What kind of stuff are you building where that is even remotely possible? I get that generating documentation works fine, but building features just isn't there yet for non-trivial apps, and don't even get me started on trying to get the agents to backtrack and change something they did.
mjrbrennan 10 hours ago [-]
Yes, that's what gets me too. I want to engage with my coworkers, you know, other humans? And get their ideas and input and summaries. Not just sit in my office alone having the computer explain everything to me badly, or read through PowerPoints of all things...
TeMPOraL 6 hours ago [-]
> I want to engage with my coworkers, you know other humans?

I.e. the very species we try to limit our contact with, which is why we chose this particular field of work? Or are you from the generation that joined software for easy money? :).

/s, but only partially.

There are aspects of this work where to "engage with my coworkers" is to be doing the exact opposite of productive work.

crucialfelix 9 hours ago [-]
Usually the tricks and problems in a codebase are not in the codebase at all, they are in somebody's head.

It would be helpful if I had a long rambling dialogue with a chat model and it distilled that.

dwringer 8 hours ago [-]
> It would be helpful if I had a long rambling dialogue with a chat model and it distilled that.

IME this can work pretty well with Gemini in the web UI. If it misinterprets you at any stage you can edit your last comment until it gets on the same page, so to speak. Then once you're to a point in the conversation where you're satisfied it seems to "get it", you can drop in some more directly relevant context like example code if needed and ask for what you want.

fennecfoxy 4 hours ago [-]
Yup, you can always tell it's an LLM just from the ridiculous output, most of the time. Like 8-20 sentences minimum, for the most basic thing.

Even Gemini/GPT-4o/etc. are all guilty of this. Maybe they'll tighten things up at some point - if I ask an assistant a simple question like "is it possible to put apples into a pie?" what I want is "Yes, it is possible to put apples into a pie. Would you like to know more?"

But not "Yes, absolutely — putting apples into a pie is not only possible, it's classic! Apple pie is one of the most well-known and traditional fruit pies. Typically, sliced apples are mixed with sugar, cinnamon, nutmeg, and sometimes lemon juice or flour, then baked inside a buttery crust. You can use various types of apples depending on the flavor and texture you want (like Granny Smith for tartness or Honeycrisp for sweetness). Would you like a recipe or tips on which apples work best?" (from gpt4).

TeMPOraL 8 hours ago [-]
If this was meant to be read, I might've agreed, but:

1) This was supposed to be piped through TTS and listened to in the background, and...

2) People like podcasts.

Your typical podcast is much worse than this. It's "blah blah" and "hahaha <interaction>" and "ooh <emoting>" and "<irrelevant anecdote>" and "<turning facts upside down and injecting a lie for humorous effect>", and maybe some of the actual topic mixed in between, and yet for some reason, people love it.

I honestly doubt this specific thing would be useful for me, but I'm not going to assume it's plain dumb, because again, podcasts are worse, and people love them.

xandrius 5 hours ago [-]
What kind of podcast have you listened to, if any?

They aren't all Joe Rogan.

TeMPOraL 4 hours ago [-]
Name one that isn't > 90% fluff and human interaction sounds.
dghlsakjg 6 minutes ago [-]
Conversations with Tyler Cowen, Complex Systems with patio11 are two off the top of my head that concentrate on useful information, and certainly aren't "> 90% fluff and human interaction sounds".

Unless of course people talking in any capacity is human interaction sounds, in which case, yes, every podcast is > 90% human interaction sounds.

block_dagger 12 hours ago [-]
You can specify the desired style in the prompt. The author seems to like PR-sounding fluff while making morning coffee.
fullstackchris 12 hours ago [-]
Yeah I was done at "What happened here was more than just code..." -_-
jsjohnst 12 hours ago [-]
You got past the grey text on gray background? -_-
rcleveng 11 hours ago [-]
I didn't. I open up Chrome's Developer Tools and drop this into the console:

    document.body.style.backgroundColor = "black";

rsynnott 4 hours ago [-]
Yeah, I honestly don't know how anyone can put up with reading this sort of thing, much less have it read to them by a computer(!)

I suppose preferences differ, but really, does anyone _like_ this sort of writing style?

beigebrucewayne 8 hours ago [-]
I agree, it's atrocious!

1. I shouldn't have used a newly created repo that had no real work over the course of the last week.

2. I should have put more time into the prompt to make it sound less nails on chalkboard.

TZubiri 9 hours ago [-]
Remember the sycophant bug? Maybe making the user FEEL GOOD is part of what makes it feel smart or like a good experience. Is the reward function being smart? Is it maximizing interaction? Does it conflict with being accurate?
blahgeek 14 hours ago [-]
Asking it to explain rust borrow checker is one of the worst examples to demonstrate its ability to read code. There are piles of that in its training data.
dundarious 12 hours ago [-]
Agreed, ask it to explain how exceptions are handled in python asyncio tasks, even given all the code, and it will vacillate like the worst intern in the world. What's more, there's no way to "teach" it, and even if there was, it would not last beyond the current context.

A complete waste of time for important but relatively simple tasks.

gilbetron 1 hours ago [-]
"There are piles of that in its training data"

Such a weird complaint. If you were to explain the rust borrow checker to me, should I complain that it doesn't count because you had read explanations of the borrow checker? That it was "in your training data"? I mean, do you think you just understand the borrow checker without being taught about it in some form?

I mean, I get what you're kind of saying - that there isn't much evidence that these tools are able to generate new ideas, and that the sheer amount of knowledge they have obscures the detection of that phenomenon - but practically speaking I don't care, because it is useful and helpful (within its hallucinatory framework).

rbren 14 hours ago [-]
I'm biased [0], but I think we should be scripting around LLM-agnostic open source agents. This technology is changing software development at its foundations - we need to ensure we continue to control how we work.

[0] https://github.com/all-hands-ai/openhands

robotbikes 13 hours ago [-]
This looks like a good resource. There are some pretty powerful models that will run on an Nvidia 4090 with 24 GB of VRAM: Devstral and Qwen 3. Ollama makes it simple to run them on your own hardware, but the cost of the GPU is a significant investment. Then again, if you are paying $250 a month for a proprietary tool, it would pay for itself pretty quickly.
NitpickLawyer 8 hours ago [-]
> There are some pretty powerful models that will run on an Nvidia 4090 with 24 GB of VRAM: Devstral and Qwen 3.

I'd caution against using Devstral on a 24 GB VRAM budget. Heavy quantisation (the only way to make it fit into 24 GB) will affect it a lot. Lots of reports on LocalLLaMA about subpar results, especially from KV cache quantisation.

We've had good experiences running it at fp8 with full cache, but going lower than that will impact the quality a lot.

seanmcdirmid 12 hours ago [-]
An M3 Max with 64 GB works well for a wider range of models, although it fares worse on Stable Diffusion jobs. Plus you can get it as a laptop.
handfuloflight 13 hours ago [-]
But what do we do if the closed models are just better?
bluefirebrand 13 hours ago [-]
Steal from them shamelessly, the same way they stole from everyone else?
dghlsakjg 5 minutes ago [-]
Isn't abusing the OpenAI terms of service part of how DeepSeek did its training?
hsuduebc2 13 hours ago [-]
You are onto something.
datameta 13 hours ago [-]
Seems ethically sound to me.
rkangel 5 hours ago [-]
The agents are separate from the models. Claude Code only allows you to use Claude, but Aider allows you to use any model.
handfuloflight 5 hours ago [-]
How does that solve the problem of closed models being better than open models?
hn8726 4 hours ago [-]
There is no problem. OP said we should be using open _agents_, not open _models_. You can use an open agent with any model, open or closed, while using something like Claude Code locks you in to one model vendor
davidmurdoch 13 hours ago [-]
Wait?
handfuloflight 13 hours ago [-]
And get superseded by competitors willing to spend on those models?
ProofHouse 14 hours ago [-]
This 10000%
jasonthorsness 16 hours ago [-]
The terminal really is sort of the perfect interface for an LLM; I wonder whether this approach will become favored over the custom IDE integrations.
ed_mercer 13 hours ago [-]
Exactly. It has access to literally everything, including any MCP server. It's so awesome having Claude Code check my database using a read-only user, or having it open a Puppeteer browser and check whether its CSS changes look weird or not. It's the perfect interface and Anthropic nailed it.

It can even debug my k8s cluster using kubectl commands and check prometheus over the API, how awesome is this?
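(For the curious: MCP servers are attached with `claude mcp add`. A rough sketch of the kind of setup described above, assuming the reference Puppeteer and Postgres MCP server packages - the server names and connection string are placeholders, not from the comment:)

    # browser automation via the reference Puppeteer MCP server (assumed package)
    claude mcp add puppeteer -- npx -y @modelcontextprotocol/server-puppeteer

    # database access through a read-only user (placeholder connection string)
    claude mcp add db -- npx -y @modelcontextprotocol/server-postgres \
      "postgresql://readonly_user:secret@localhost:5432/appdb"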

leptons 13 hours ago [-]
> or have it open a puppeteer browser and check whether its CSS changes look weird or not.

It's got 7 fingers? Looks fine to me! - AI

paulluuk 9 hours ago [-]
Me laughing as a human non-frontend dev having to do anything related to CSS

The number of times that my manager or coworkers have rejected proposals for technical solutions because I can't make a webpage look halfway decent is too damn high.

drcode 16 hours ago [-]
Sort of, except I think the future of LLMs will be to have the LLM try 5 separate attempts to create a fix in parallel, since LLM time is cheaper than human time... and once you introduce this aspect into the workflow, you'll want to spin up multiple containers, and the benefits of the terminal aren't as strong anymore.
sothatsit 11 hours ago [-]
I feel like the better approach would be to throw away PRs when they're bad, edit your prompt, and then let the agent try again using the new prompt. Throwing lots of wasted compute at a problem seems like a luxury take on coding agents, as these agents can be really expensive.

So the process becomes: Read PR -> Find fundamental issues -> Update prompt to guide agent better -> Re-run agent.

Then your job becomes proof-reading and editing specification documents for changes, reviewing the result of the agent trying to implement that spec, and then iterating on it until it is good enough. This comes from the belief that better, more expensive, agents will usually produce better code than 5 cheaper agents running in parallel with some LLM judge to choose between or combine their outputs.

sally_glance 14 hours ago [-]
Who or what will review the 5 PRs (including their updates to automated tests)? If it's just yet another agent, do we need 5 of these reviews for each PR too?

In the end, you either concede control over 'details' and just trust the output or you spend the effort and validate results manually. Not saying either is bad.

smallnamespace 14 hours ago [-]
If you can define your problem well then you can write tests up front. An ML person would call tests a "verifier". Verifiers let you pump compute into finding solutions.
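(To make that concrete: a minimal sketch of a verifier loop, assuming Claude Code's non-interactive `-p` print mode and a pytest suite acting as the verifier - the prompt and attempt count are made up for illustration, and it runs sequentially rather than in parallel for simplicity:)

    #!/usr/bin/env bash
    # Tests act as the verifier: keep the first attempt that passes.
    # Prompt and retry count are placeholders, not a recommended recipe.
    for i in 1 2 3 4 5; do
      git checkout -- .    # discard the previous attempt's edits
      claude -p "Make the failing tests under tests/ pass" \
        --dangerously-skip-permissions    # unattended edits; use with care
      if pytest -q; then
        echo "attempt $i passed the test suite"
        break
      fi
    done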
bcrosby95 11 hours ago [-]
I'm not sure we can write good tests for this, because we assume some kind of logic is involved here. If you set a human to the task of writing a procedure to send a 'forgot password' email, I can be reasonably sure there's a limited number of things that human would do with the provided email address, because it takes time and effort to do more than you should.

However, with an LLM I'm not so sure. So how will you write a test that validates this is done, but also guarantees it doesn't add the email to a blacklist? A whitelist? A list of admin emails? Or the tens of other things you can do with an email within your system?

djeastm 13 hours ago [-]
Will people be willing to make their full time job writing tests?
TeMPOraL 7 hours ago [-]
They probably won't. But it doesn't matter. Ultimately, we'll all end up doing manual labor, because that is the only thing we can do that the machines aren't already doing better than us, or about to be doing better than us. Such is the natural order of things.

By manual labor I specifically mean the kind where you have to mix precision with power, on the fly, in arbitrary terrain, where each task is effectively one-off. So not even making things - everything made at scale will be done in automated factories/workshops. Think constructing and maintaining those factories, in the "crawling down tight pipes with a screwdriver in your teeth" sense.

And that's only mid-term; robotics may be lagging behind AI now, but it will eventually catch up.

ericrallen 11 hours ago [-]
We’ll just have an LLM write the tests.

Now we can work on our passion projects and everything will just be LLMs talking to LLMs.

therein 6 hours ago [-]
I hope that's sarcasm.
ehnto 11 hours ago [-]
As well, just because it passes a test doesn't mean it doesn't do wonky, non-performant stuff. Or worse, have side effects no one verified. As one example, quite often the LLM output will add new fields I didn't ask it to change.
cwlb 14 hours ago [-]
https://github.com/dagger/container-use
jyounker 16 hours ago [-]
Having command line tools to spin up multiple containers and then to collect their results seems like it would be a pretty natural fit.
peab 14 hours ago [-]
dagger does this: https://www.youtube.com/watch?v=C2g3vdbffOI
mejutoco 8 hours ago [-]
Why would spinning up containers remove the benefits? Presumably there is a terminal interacting with the containers too.
eru 8 hours ago [-]
Nah, if parallelism will help, it'll be abstracted away from the user.
jtms 15 hours ago [-]
Tmux?
mountainriver 14 hours ago [-]
What??? It’s literally the worst interface

Do you not want to edit your code after it’s generated?

bretpiatt 12 hours ago [-]
I'm running the terminal with the AI interaction in one window, and VS Code with the project on the same directories, so I can see updated or new files to review in the IDE via its color coding.

How do you interact with your projects?

never_inline 11 hours ago [-]
I run aider in VSCode terminal so that I can fix smaller lint errors myself without another AI back-and-forth.
aaronbrethorst 14 hours ago [-]
Sure, in VS Code. Or Xcode. Or IntelliJ/GoLand/RubyMine.
handfuloflight 13 hours ago [-]
...if your IDE doesn't have a terminal then it isn't an IDE.
bigiain 11 hours ago [-]
The "old wisdom" on comp/lang.perl.misc, when new people asked what was the best IDE to Perl programming, was "Unix".

You get both editors to choose from, vi _and_ emacs! All the man pages you could possibly want _and_ perldocs! Of _course_ as a Perl newbie you'll be able to fall back on gdb for complicated debugging where print statements no longer cut it.

leptons 12 hours ago [-]
I have a whole other screen for my terminal(s). The IDE already has enough going on in it.
handfuloflight 12 hours ago [-]
Then you are not impeded from editing your code because it was written through a terminal process, which seems to be OP's contention.
ldjkfkdsjnv 14 hours ago [-]
as the models get better, IDEs will be seen as low level
magackame 14 hours ago [-]
Wait you write your code by hand??? ewww...
fragmede 13 hours ago [-]
Aider's supported /voice for a while now.
42lux 13 hours ago [-]
voice is probably the worst human -> compute interface we have.
datameta 12 hours ago [-]
Human speech evolved with biological constraints and through neurological adaptations to emit and understand nonlinear output that has lexically fuzzy areas to the untrained ear. So I think it's a rather "lossy" analog-to-digital conversion, because the computer is simulating understanding of a form of information transfer that it itself is not constrained by (digital systems don't have vocal cords and could transmit anything).
eru 8 hours ago [-]
You could say that about any form of human communication at all.
jumski 9 hours ago [-]
Great article! I have similar observations and techniques, and Claude Code is exceptionally good - most days I'm working on multiple things at once (thanks to git worktrees), each going faster than ever. That's really crazy.

For the "sub agents"thing, I must admit, that Claude Code calling o3 via sigoden/aichat saved me countless of times!

There are just issues that o3 excels at (race conditions, bug hunting - anything that requires a lot of context and really high reasoning abilities).

But I'm using it less since Opus 4 came out. And of course it's not really a sub-agent thing at all.

I use this prompt @included in the main CLAUDE.md: https://github.com/pgflow-dev/pgflow/blob/main/.claude/advan...

sigoden/aichat: https://github.com/sigoden/aichat

myflash13 6 hours ago [-]
wait what? how do you work on multiple things at once with git worktrees?
pjm331 2 hours ago [-]
I never had any reason to use it before Claude Code et al., so I also wasn't aware.

It's a set of commands for working with copies of your entire repo in a new folder, on a new branch:

https://git-scm.com/docs/git-worktree

noiwillnot 2 hours ago [-]
This is amazing, I had no idea about this, I have been cloning my repo locally for years.
jumski 2 hours ago [-]
git worktree uses one repo to lay out multiple branches in separate directories:

    git worktree add new/path/for/worktree branchname

I now refuse to use git checkout to switch branches; I always keep my main branch checked out and updated, and always use worktrees to work on features. Love this workflow!
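(A sketch of the multiple-things-at-once workflow, with made-up branch and directory names - each worktree gets its own Claude Code session in its own terminal:)

    # main stays checked out in the primary clone; each task gets a worktree
    git worktree add -b feature-x ../proj-feature-x   # -b creates the branch
    git worktree add -b bugfix-y ../proj-bugfix-y

    # run a separate agent session per directory, one terminal tab each
    cd ../proj-feature-x && claude

    # after merging, clean up the worktree
    git worktree remove ../proj-feature-x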

Syzygies 10 hours ago [-]
No mention of Opus there or here (so far).

Having tried everything, I settled on a $100/month Anthropic "Max" plan to use Claude Code. Then I learned that Claude Opus 4 is currently their best but most expensive model for my situation (math code and research). I hit the limit of a five-hour session, switched to their API, and burned $20 in an hour. So I upgraded to the $200/month "Max" and haven't hit limits yet.

Models matter. All these stories are like "I met a person who wasn't that smart." Duh!

beigebrucewayne 8 hours ago [-]
All of this was with Opus.
luckystarr 4 hours ago [-]
I recently investigated some problematic behaviour of both Opus 4 and Sonnet 4. When tasked with developing something more complicated (a broker-fed task management system, a staggered execution scheduler), they would inevitably produce thousands of lines of over-engineered, unmaintainable garbage. When Opus was then tasked with simplifying it, it boiled it down to 300 lines in one shot. The result was brilliant. This happened twice.

Moral of the story: I found out that I didn't constrain them enough. I now insist that they keep the core logic to a certain size (e.g. 300 lines) and not produce code objects for each concept but rather "fold them into the code".

This improved the output tremendously.

bionhoward 16 hours ago [-]
Assuming attention to detail is one of the best signs people give a fuck about craftsmanship, isn't the fact that Anthropic's legal terms are logically impossible to satisfy a bad sign for their ability to be trusted as careful stewards of ASI?

Not exactly “three laws safe” if we can’t use the thing for work without violating their competitive use prohibition

alwa 14 hours ago [-]
I can’t speak for their legal department, but their product, Claude Code, bears signs of lavish attention to detail. Right down to running Haiku on the context to come up with cute appropriate verbs for the “working…” indicators.
SamPatt 17 hours ago [-]
>Claude code feels more powerful than cursor, but why? One of the reasons seems to be its ability to be scripted. At the end of the day, cursor is an editor, while claude code is a swiss army knife (on steroids).

Agreed, and I find that I use Claude Code on more than traditional code bases. I run it in my Obsidian vault for all kinds of things. I run it to build local custom keyboard bindings with scripts that publish screenshots to my CDN and give me a markdown link, or to build a program that talks to Ollama to summarize my terminal commands for the last day.

I remember the old days of needing to figure out if the formatting changes I wanted to make to a file were sufficient to build a script or just do them manually - now I just run Claude in the directory and have it done for me. It's useful for so many things.
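(The scripting hook here is Claude Code's non-interactive print mode, `claude -p`, which reads stdin and prints a response, so it composes like any other Unix tool. A small sketch; the prompts and file paths are just examples, not from the comment:)

    # pipe context in, get a one-shot answer out
    tail -n 200 ~/.bash_history | claude -p "Summarize what I worked on today"

    # capture the answer for use elsewhere in a script
    summary=$(git log --oneline -20 | claude -p "One-line summary of this week's work")
    echo "$summary"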

Aeolun 16 hours ago [-]
The thing is, Claude Code only works if you have the plan. It’s impossible to use it on the API, and it makes me wonder if $100/month is truly enough. I use it all day every day now, and I must be consuming a whole lot more than my $100 is worth.
CGamesPlay 15 hours ago [-]
You use it "all day every day", so it makes sense that you would prefer the plan. It's perfectly economical to use it without a plan, if your usage patterns are different. Here's a tool someone else wrote that can help you decide: https://github.com/ryoppippi/ccusage
Aeolun 5 hours ago [-]
Sure, but if your usage pattern is such that you can’t justify the plan, then Cursor is a better option :)
davidw 14 hours ago [-]
One thing that I am not liking about the LLM world is that it seems to be tilting several things back in favor of BigCorps.

The open source world is one where antirez, working on his own off in Sicily, could create a project like Redis and then watch it snowball as people all over got involved.

Needing a subscription to something only a large company can provide makes me unhappy.

We'll see if "can be run locally" models for more specific tasks like coding will become a thing, I guess.

SamPatt 11 hours ago [-]
I share this concern - given the trajectory of improvements I do hope that we'll have something close to this level that can run locally within 18 months or so. And of course the closed source stuff will likely be better by then, but I genuinely believe I would choose an open source version of this right now if I had the choice.

The open source alternatives I've used aren't there yet on my 4090. Fingers crossed we'll get there.

TSiege 13 hours ago [-]
This is some nightmare-fuel vendor lock-in, where the codebase isn't understood by anyone and companies have to fork over more and more, otherwise their business can't grow, adapt, etc.
datameta 12 hours ago [-]
Yikes, you've just perfectly articulated a trajectory that I've been using subconsciously as one of the primary reasons why I want to keep my coding craft sharp.
3rdDeviation 12 hours ago [-]
Visionary, well done. Then comes the claim that AGI can unwind the spaghetti, and then the reality check.

I, for one, welcome our new LLM overlords.

sorcerer-mar 16 hours ago [-]
> It’s impossible to use it on the API

What does this mean?

oxidant 15 hours ago [-]
Not OP but probably just cost.
SV_BubbleTime 14 hours ago [-]
This.

You can EASILY burn $20 a day doing little, and surely could top $50 a day.

It works fine, but the $100 I put in to test it out did not last very long even on Sonnet.

ggsp 16 hours ago [-]
You can definitely use Claude Code via the API
lawrencechen 16 hours ago [-]
I think he means it's not economically sound to use it via API
wahnfrieden 13 hours ago [-]
A well-known iOS dev used Claude Code to build an iOS app and wrote a custom checking tool for how many tokens it consumed on the plan to compare with API pricing.

He uses two max plans ($200/mo + $200/mo) and his API estimate was north of $10,000/mo

practal 16 hours ago [-]
I think it is available on Claude Pro now, so just $20.
razemio 14 hours ago [-]
It is, but it's very limited. I use the API only, since that is the only plan without usage limits, with on-demand pricing. The subscription tiers:

5x Pro usage ($100/month)

20x Pro usage ($200/month)

Source: https://support.anthropic.com/en/articles/11145838-using-cla...

"Pro ($20/month): Average users can send approximately 45 messages with Claude every 5 hours, OR send approximately 10-40 prompts with Claude Code every 5 hours."

"You will have the flexibility to switch to pay-as-you-go usage with an Anthropic Console account for intensive coding sprints."

cpard 9 hours ago [-]
How do you script Claude Code? I've been using it as a CLI but haven't thought of invoking Claude Code through a script. Sounds very interesting.
jjice 16 hours ago [-]
I'm very interested to hear what your use cases are when using it in your Obsidian vault.
SamPatt 12 hours ago [-]
Formatting changes across lots of notes, creating custom plugins, diagnosing problems with community plugins, creating a syncing program that compares my vault (notes with publish:true frontmatter) to my blog repo and, if it sees changes, automatically updates the repo (which is used to build my site), creating a tool that converts inline URLs to markdown footnotes, etc.

Obsidian is my source of truth and Claude is really good at managing text, formatting, markdown, JS, etc. I never let it make changes automatically, I don't trust it that much yet, but it has undoubtedly saved me hours of manual fiddling with plugins and formatting alone.

AstroBen 11 hours ago [-]
I had an LLM sort a crap-tonne of my notes into category folders the other day. My god that was helpful
AstroBen 12 hours ago [-]
Side note but the contrast between background and text here makes this really hard to read
thunkle 11 hours ago [-]
For me it's the blinking cursor at the top... It's hard to focus on the text.
jsjohnst 12 hours ago [-]
You aren’t missing much if you just skip it
abhisheksp1993 16 hours ago [-]
    claude --dangerously-skip-permissions # science mode

This made me chuckle

tinyhouse 16 hours ago [-]
This article is a bit all over the place. First, a slide deck to describe a codebase is not that useful. There's a reason why no one ever uses a slide deck for anything besides supporting an oral presentation.

Most of these things in the post aren't new capabilities. The automation of workflows is indeed valuable and cool. Not sure what AGI has anything to do with it.

sandos 2 hours ago [-]
The number one thing I have found LLMs useful for is producing mermaidjs diagrams of code. Now, I know they are not always perfect, but it has been "good enough" very many times, and I have never seen hallucinations here, only omissions. If I notice something missing, it's super easy to tell it to amend.
bravesoul2 16 hours ago [-]
Also I don't trust it. They touched on that I think (I only skimmed).

Plus you shouldn't need an LLM to understand a codebase. Just make it more understandable! Of course capital likes shortcuts and hacks to get the next feature out in Q3.

imiric 16 hours ago [-]
> Plus you shouldn't need an LLM to understand a codebase. Just make it more understandable!

The kind of person who prefers this setup wants to read (and write) the least amount of code on their own. So their ideal workflow is one where they get to make programs through natural language. Making codebases understandable for this group is mostly a waste of effort.

It's a wild twist of fate that programming languages were intended to make programming friendly to humans, and now humans don't want to read them at all. Code is becoming just an intermediary artifact useless to machines, which can instead write machine code directly.

I wish someone could put this genie back in the bottle.

DougMerritt 15 hours ago [-]
> It's a wild twist of fate that programming languages were intended to make programming friendly to humans, and now humans don't want to read them at all.

Those are two different groups of humans, as you implied yourself.

lelandbatey 16 hours ago [-]
There is no amount of static material that will perfectly conform to the shape and contours of every mind that consumes that static material such that they can learn what they want to learn when they want to learn it.

Having a thing that is interactive and which can answer questions is a very useful thing. A slide deck that sits around for the next person is probably not that great, I agree. But if you desperately want a slide deck, then an agent like Claude which can create it on demand is pretty good. If you want summaries of changes over time, or to know "what's the overall approach at a jargon-filled but still overview level explanation of how feature/behavior X is implemented?", an agent can generate a mediocre (but probably serviceable) answer to any of those by reading the repo. That's an amazing swiss-army knife to have in your pocket.

I really used to be a hater, and I really did not trust it, but just using the thing has left me unable to deny its utility.

bravesoul2 14 hours ago [-]
The problem is, if no one can describe something with words without an LLM to scour through every line of code, it probably means it can't make sense to humans.

Maybe that is the idea (vibe coding ftw!), but if you want something people can understand and refine, it is good to make it modular and decomposable and understandable. Then use AI to help you with the words, for sure, but at some level there is a human that understands the structure.

groby_b 15 hours ago [-]
> Plus you shouldn't need an LLM to understand a codebase. Just make it more understandable!

<laughs in legacy code>

And fundamentally, that isn't a function of "capital". All code bases are shaped by the implicit assumptions of their writers. If there's a fundamental mismatch or gap between reader and writer assumptions, it won't be readable.

LLMs are a way to make (some of) these implicit assumptions more legible. They're not a panacea, but the idea of "just make it more understandable" is not viable. It's on par with "you don't need debuggers, just don't write bugs".

Uehreka 16 hours ago [-]
> Not sure what AGI has anything to do with it.

Judging from the tone of the article, they’re using the term AGI in a jokey way and not taking themselves too seriously, which is refreshing.

I mean like, it wouldn’t be refreshing if the article didn’t also have useful information, but I do actually think a slide deck could be a useful way to understand a codebase. It’s exactly the kind of nice-to-have that I’d never want a junior wasting time on, but if it costs like $5 and gets me something minorly useful, that’s pretty cool.

Part of the mind-expanding transition to using LLMs involves recognizing that there are some things we used to dislike because of how much effort they took relative to their worth. But if you don’t need to do the thing yourself or burn through a team member’s time/sanity doing it, it can make you start to go “yeah fuck it, trawl the codebase and try to write a markdown document describing all of the features and requirements in a tabular format. Maybe it’ll go better than I expect, and if it doesn’t then on to something else.”

dirtbag__dad 14 hours ago [-]
This article is inspiring. I haven't had a moment to get my head out of the Cursor + biz logic water until now. Very cool to think about LLMs automagically creating changelogs, testing packaging when dependencies are bumped, forcing unit tests on features.

Is anyone aware of something like this? Maybe in the GitHub actions or pre-commit world?

pjm331 14 hours ago [-]
https://docs.anthropic.com/en/docs/claude-code/github-action...
citizenpaul 10 hours ago [-]
>automagically creating changelogs, testing packaging when dependencies are bumped, forcing unit tests on features.

Yeah, now companies that paid lip service to those things can still not have them but pretend they do, because the AI did it....

b0a04gl 11 hours ago [-]
Summaries like this are less about helping the dev and more about shaping commit history. When you let a model generate descriptions, tests, and boilerplate, you're also letting it define what counts as acceptable change. Over time that shifts the team's review habits. If the model consistently downplays risky edits or adds vague tests, the bar drops silently. It would be more useful to trace how model-written code affects long-term bug rate and revert patterns.
tom_m 12 hours ago [-]
Well, there will always be a job for programmers, folks.
citizenpaul 10 hours ago [-]
>openai codex (soon to be rewritten in rust)

Lol, I guess their AI is too good for the rewrite. Better have humans do it.

rikschennink 9 hours ago [-]
I tried to read this on mobile but the blinking cursor makes it impossible.
beigebrucewayne 8 hours ago [-]
Removed it! I agree it was distracting.
hoppp 8 hours ago [-]
First time I've heard about Marp - very handy tool.
dweinus 11 hours ago [-]
> Is it Shakespeare? No.

It's at least decent though, right?

> "What emerged over these seven days was more than just code..."

Yeesh, ok, but is it accurate?

> Over time this will likely degrade the performance and truthfulness

Sure, but it's cheap right?

> $250 a month.

Well at least it's not horrible for the environment and built on top of massive copyright violations, right?

Right?

distortionfield 11 hours ago [-]
Unrelated, but I am absolutely in love with this blog theme and color scheme.
fullstackchris 12 hours ago [-]
Gonna be a bit blunt here and ask why hooking up an agentic CLI tool to one or more other software tools is the top post on HN right now... sure, some of these ideas are interesting, but at the end of the day literally all of them have been explored/revisited by various MCP tools (or can be done in more or less scripted/hacked ways, as the author shows here).

I don't know, just feels like a weird community response to something that is the equivalent to me of bash piping...

42lux 16 hours ago [-]
If people were as patient and inventive in teaching junior devs as they are with LLMs, the whole industry would be better off.
sorcerer-mar 16 hours ago [-]
You pay junior devs way way way more money for the privilege of them being bad.

And since they're human, the juniors themselves do not have the patience of an LLM.

I really would not want to be a junior dev right now... Very unfair and undesirable situation they've landed in.

mentos 16 hours ago [-]
At least it’s easier to teach yourself anything now with an LLM? So maybe it balances out.
sorcerer-mar 16 hours ago [-]
I think it's actually even worse: it's easier to trick yourself into thinking you're teaching yourself anything.

Learning comes from grinding and LLMs are the ultimate anti-intellectual-grind machines. Which is great for when you're not trying to learn a skill!

andy99 15 hours ago [-]
Even though I think most people know this deep down, I still don't think we actively realize how optimized LLMs are towards sounding good. It's the ultra-processed food version of information consumption. People are super lazy (economical, if you like) and RLHF et al. have optimized LLM output to be easy to digest.

The consequence is you get a bunch of output that looks really good as long as you don't think about it (and they actively promote not thinking about it), that you don't really understand, and that if you did dig into it you'd realize is empty fluff or actively wrong.

It's worse than not learning, it's actively generating unthinking but palatable garbage that's the opposite of learning.

jyounker 16 hours ago [-]
Yeah, you have to be really careful about how you use LLMs. I've been finding it very useful to use them as teachers, or to use them in the same way that I'd use a coworker: "What's the idiomatic way to write this Python comprehension in JavaScript?" Or, "Hey, do you remember what you call it when..." And when I request these things I'll try to ask in the most generic way possible, so that I then have to retype the relevant code, filling in the blanks with my own values.

That's just one use though. The other is treating it like it's a jr developer, which has its own shift in thinking. Practice in writing detailed specs goes a long way here.

sorcerer-mar 16 hours ago [-]
100% agreed.

> Practice in writing detailed specs goes a long way here.

This is an additional asymmetric advantage to more senior engineers as they use these tools

tnel77 15 hours ago [-]
>>Learning comes from grinding

Says who? While “grinding” is one way to learn something, asking AI for a detailed explanation and actually consuming that knowledge with the intent to learn (rather than just copy and pasting) is another way.

Yes, you should be on guard since a lot of what it says can be false, but it’s still a great tool to help you learn something. It doesn’t completely replace technical blogs, books, and hard earned experience, but let’s not pretend that LLMs, when used appropriately, don’t provide an educational benefit.

sorcerer-mar 15 hours ago [-]
Pretty much all education research ever points to the act of actually applying knowledge, especially against variable cases, as required to learn something.

There is no learning by consumption (unfortunately, given how we mostly attempt to "educate" our youth).

I didn't say they don't or can't provide an educational benefit.

fullstackchris 12 hours ago [-]
Some of the best software learning I ever had when I was starting out was following along with video courses and writing the code line by line along with the instructor... or does this not count as "consumption"?
sorcerer-mar 12 hours ago [-]
> I was... following along and writing the code line by line

That's application. Then presumably you started deviating a little bit from exactly what the instructor was doing. Then you deviated more and more.

If you had the instructor just writing the code for every new deviation you wanted to build and you just had to mash the "Accept Edit" button, you would not have learned very effectively.

djeastm 13 hours ago [-]
Sure, but easy in, easy out. Hard earned experience is worth soo much more than slick summaries of the last twenty years of blog articles.
fallinditch 15 hours ago [-]
Maybe it's the senior devs who should be the ones to worry?

Seniors' attitudes on HN are often quick to dismiss AI-assisted coding as something that can't replace the hard-earned experience and skill they've built up during their careers. Well, maybe, maybe not. Senior devs can get a bit myopic in their specializations. Whereas a junior dev doesn't have so much baggage; maybe the fertile brains of youth are better in times of rapid disruption, where extreme flexibility of thought is the killer skill.

Or maybe the whole senior/junior thing is a red herring and pure coding and tech skills are being deflated all across the board. Perhaps what is needed now is an entirely new skill set that we're only just starting to grasp.

sally_glance 15 hours ago [-]
Wherever you look, the conclusion is the same - balance is required. Too many seniors, you get stuck in one way streets. Too many juniors, you trip over your own feet and diverge into unknown avenues. Mix AI in, I don't see how that changes much at all... Juniors drive into unknown territory faster, Seniors get stuck in their niche just as well. Acceleration yes, fundamental change of how we work - I don't see it yet.
AdieuToLogic 13 hours ago [-]
> Seniors' attitudes on HN are often quick to dismiss AI assisted coding as something that can't replace the hard-earned experience and skill they've built up during their careers.

One definition of experience[0] is:

  direct observation of or participation in events as a basis of knowledge
Since I assume by "AI assisted coding" you are referring to LLM-based offerings, then yes, "hard-earned experience and skill" cannot be replaced with a statistical text generator.

One might as well assert an MS-Word document template can produce a novel Shakespearean play or that a spreadsheet is an IRS auditor.

> Or maybe the whole senior/junior thing is a red herring and pure coding and tech skills are being deflated all across the board. Perhaps what is needed now is an entirely new skill set that we're only just starting to grasp.

For a repudiation of this hypothesis, see this post[1] also currently on HN.

0 - https://www.merriam-webster.com/dictionary/experience

1 - https://blog.miguelgrinberg.com/post/why-generative-ai-codin...

yakz 15 hours ago [-]
Senior devs provide better instructions to the agent, and can recognize more kinds of mistakes and can recognize mistakes more quickly. The feedback loop is more useful to someone with more experience.

I had a feeling today that I should really be managing multiple instances at once, because they’re currently so slow that there’s some “downtime”.

sorcerer-mar 15 hours ago [-]
Maybe! Probably not though.
bakugo 15 hours ago [-]
> Maybe it's the senior devs who should be the ones to worry?

Why would they be worried?

Who else going to maintain the massive piles of badly designed vibe code being churned out at an increasingly alarming pace? The juniors prompting it certainly don't know what any of it does, and the AIs themselves have proven time and again to be incapable of performing basic maintenance on codebases above a very basic level of complexity.

As the ladder gets pulled up on new juniors, and the "fertile brains" of the few who do get a chance are wasted as they are actively encouraged to not learn anything and just let a computer algorithm do the thinking for them, ensuring they will never have a chance to become seniors themselves, who else will be left to fix the mess?

CuriouslyC 4 hours ago [-]
If your seniors aren't analyzing the PRs being vibe-coded by others in the org to make sure they meet quality standards, that is the source of your problem, not the vibe coding.
tonyhart7 15 hours ago [-]
We literally have many no-code solutions like WordPress, etc.

Is webdev still there??? Yes. Just because you can "create" something doesn't mean you're knowledgeable in that area.

We literally have an entire industry created to fix WordPress instances + code. What else do we need to worry about?

jwr 14 hours ago [-]
> You pay junior devs way way way more money for the privilege of them being bad.

Oh, it's worse than that. You do that, and they complain that they are underpaid and should earn much, much more. They also think they are great, it's just you, the old-timer, that "doesn't get it". You invest lots of time to work with them, train them, and teach them how to work with your codebase.

And then they quit because the company next door offered them slightly more money and the job was easier, too.

leptons 12 hours ago [-]
>You pay junior devs way way way more money for the privilege of them being bad.

I hope you don't think that what you're paying for an LLM today is what it actually costs to run the LLM. You're paying a small fraction.

So much investment money is being pumped into AI that it's going to make the 2000 dot-com bubble burst look tiny in comparison, if LLMs don't start actually returning on the massive investments. People are waking up to the realities of what an LLM can and can't do, and it's turning out to not be the genie in the bottle that a lot of hype was suggesting. Same as crypto.

The tech world needs a hype machine and "AI" is the current darling. Movie streaming was once in the spotlight too. "AI" will get old pretty soon if it can't stop "hallucinating". Trust me, I would know if a junior dev is hallucinating, and if they actually are, then I can choose another one who won't, and who will actually become a great software developer. I have no such hope for LLMs, based on my experiences with them so far.

TeMPOraL 4 hours ago [-]
> I hope you don't think that what you're paying for an LLM today is what it actually costs to run the LLM. You're paying a small fraction.

Depends, right? Claude Code on a Max plan is obviously unsustainable if the API costs are any indication; people can burn through the subscription price in API credits in a day or less.

But otherwise? I don't feel like API pricing is that unrealistic. Compute is cheap, and LLMs aren't as energy-intensive in inference as some would have you believe (especially when they conveniently mix up training and inference). And LLMs beat juniors at API prices already.

E.g. a month ago, a few hours of playing with Gemini or Claude 3.5 / 3.7 Sonnet had me at maybe $5 for a completed little MVP of an embedded side project; it would've taken me days to do it myself, even more if I hired some random fresh grad as a junior, and $5 wouldn't fund even an hour of their work. API costs would have to be underpriced by at least two orders of magnitude for juniors to compete.
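
A rough sanity check on that "two orders of magnitude" claim (all numbers below are my assumptions for illustration, not measured figures):

    # Back-of-envelope: how underpriced would the API have to be for a
    # junior to compete? All inputs are assumed for illustration.
    llm_cost_usd = 5.00        # API credits spent on the MVP
    junior_rate_usd = 50.00    # assumed fully-loaded junior hourly rate
    junior_hours = 2 * 8       # assume the same MVP takes a junior two days

    junior_cost_usd = junior_rate_usd * junior_hours   # 800.0
    print(junior_cost_usd / llm_cost_usd)              # 160.0, i.e. ~2 orders of magnitude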

sorcerer-mar 12 hours ago [-]
Yeah, all fair, but I think there's enough capital to keep the gravy train rolling until the cost-per-performance actually gets way, way, way below human junior engineers.

A lot of the application layer will disappear when it fails to show ROI, but the foundation models will continue to have obscene amounts of money dumped into them, and the coding use case will come along with that.

beefnugs 15 hours ago [-]
See, if the promise were real (LLMs are great skill multipliers!), then we'd be seeing a new renaissance of one-developer businesses popping up left and right every day. Ain't nobody got time for corporate coercion hierarchy nonsense.

Hmm, no news about that really

yieldcrv 15 hours ago [-]
> I really would not want to be a junior dev right now... Very unfair and undesirable situation they've landed in.

I don't really get this; at the beginning of my career I masqueraded as a senior dev with experience as fast as I could, until the masquerade was laundered into actual experience

Form the LLC and that's your prior professional experience: working for it

I felt I needed to do that, and that was way before generative AI, at least a decade earlier

drewlesueur 16 hours ago [-]
I think it would be great to be a junior dev now and be able to learn quickly with LLMs.
lelanthran 15 hours ago [-]
> I think it would be great to be a junior dev now and be able to learn quickly with llms.

I'm not so sure; I get great results (learning) with them because I can nitpick what they give me, attempt to explain how I understand it, and I pretty much always preface my prompts with "be critical and show me where I am wrong".

I've seen a junior use it to "learn", which was basically "How do I do $FOO in $LANGUAGE".

For that junior to turn into a senior who prompts the way I do, they need a critical view of their questions, not just answers.
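
Concretely, the kind of preface I mean looks something like this (illustrative wording, not an exact prompt):

    Be critical and show me where I am wrong.
    Here is my current understanding: <my explanation>.
    Point out flaws in my reasoning before you answer.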

jml78 14 hours ago [-]
If you actually want to learn………

I have experienced multiple instances of junior devs using llm outputs without any understanding.

When I look at the PR, it is immediately obvious.

I use these tools everyday to help accelerate. But I know the limitations and can look at the output to throw certain junk away.

I feel junior devs are using it not to learn but to try to just complete shit faster. Which doesn’t actually happen because their prompts suck and their understanding of the results is bad.

qsort 16 hours ago [-]
The vilification of juniors and the abandonment of the idea that teaching and mentoring are worthwhile are single-handedly making me speedrun burnout. May a hundred years of Microsoft Visio befall anybody who thinks that way.
empireofdust 13 hours ago [-]
What’s the best implementation of junior training or teaching/mentoring in general within tech that you’ve seen?
qsort 9 hours ago [-]
Unless you're running a police state environment where every minute of company time is tracked, enough opportunities for it to happen organically exist that it's not a matter of how you organize it, it's a matter of culture. Give them as much responsibility as they can handle and they'll be the ones reaching out to you.
godelski 16 hours ago [-]
A constant reminder: you can't have wizards without having noobs.

Every wizard was once a noob. No one is born that way; they were forged. It's in everybody's interest to train them. If they leave, you still benefit from the other companies who trained theirs, so the cost evens out. And if they leave, there are probably better ways to make them stay that you haven't considered (e.g. have you considered not paying new juniors more than a current junior who has been with the company for a few years? They should be able to get a pay bump without leaving).

lunarboy 16 hours ago [-]
I'm sure people (esp engineers) know this. But imagine you're starting a company: would you try to deploy N agents (even if shitty), or take a financial/time/legal/social risk with a new hire? When you consider short-term costs, the math just never works out in favor of real humans.
geraneum 16 hours ago [-]
Well, in the beginning, the math doesn’t work out in favor of building the software (or the thing you want to sell) either.
QuercusMax 15 hours ago [-]
What about the financial / legal / social risk of your AI agent doing something bad? You're only looking at cost savings, without seeing the potentially major downsides.
shinycode 15 hours ago [-]
To follow up my previous comment: I worked on a project where someone fixed an old bug. That bug had become a feature for clients who had built their systems around the API endpoint. The consequence was hundreds of thousands of duplicate users, with automations randomly attaching new resources and actions to the duplicates. Massive consequences for the customers. If it were an AI doing the fixing with no human intervention, good luck understanding the mess, cleaning it up, and holding anyone accountable. People seem to lightly assume that if the agent does something bad, it's just a risk to take. But when a codebase with massive amounts of LOC and logic is built and no human knows it, how do you deal with the consequences for people's businesses? I can't help but think it's crappy software with a « Google closed your Gmail account, no one knows why and we can't do anything about it, sorry » attached. Except instead of a mail account, it's part of your business.
tonyhart7 15 hours ago [-]
"What about the financial / legal / social risk of your AI agent doing something bad?"

the same way we treat a human making a mistake??? AI can't code by itself; someone commands it to create something

shinycode 15 hours ago [-]
I can't stop thinking that this way of thinking is either plain wrong and completely misses what software development is really about, or very true, and in X years people will just ask the trending AI « I need a billing/CRM/X system with these constraints ». The AI will ask questions and refine the need, work for 30 minutes (the time it takes to pull in libs and code the whole thing), pass it through systems to test and deploy, and voilà: a custom feature on demand. No CEO, no sales, nobody. You just deploy your own SaaS feature. Then good luck scaling properly, migrating data, and adding features and complexity. If agents hold onto their promise, then the future is custom-built: you deploy what you need, the SaaS platform is dead, and everyone in between is useless.
QuantumGood 16 hours ago [-]
I think too many see it more as "every stem cell has the potential to be any [something]", but it's generally better to let them self-differentiate until survivors with more potential emerge.
TuringNYC 15 hours ago [-]
>> A constant reminder: you can't have wizards without having noobs.

Try telling that to companies with quarterly earnings. Very few resist the urge to optimize for the short term.

jayofdoom 15 hours ago [-]
I spent a lot of time in my career, honestly some of the most impactful stuff I've done, mentoring college students and junior developers. I think you are dead on about the skills being very similar. Being verbose, not making assumptions about existing context, and giving general warnings against pitfalls for the sort of thing you're asking it to do go a long, long way.

Just make sure you talk to Claude in addition to the humans and not instead of.

handfuloflight 16 hours ago [-]
[flagged]
noman-land 15 hours ago [-]
It sounds like this person doesn't deserve to be under your wing. Time to let him fly for himself, or crash.
QuercusMax 16 hours ago [-]
Damn, that sucks. My experience has been the exact opposite; maybe you need to adjust your approach and set expectations up-front, or get management involved? (I've had a similar experience to you with my teenage kids, but that's a whole other situation.)

My M.S. advisor gave me this advice on when to ask for help, which I've passed on to lots of junior engineers: it's good to spend time struggling to understand something, and depending on the project it's probably good to exert yourself on your own for somewhere between an hour and a day. If you give up after 5 minutes, you won't learn, but if you spend a week with no progress, that's also not good.

dwohnitmok 17 hours ago [-]
On the one hand very cool.

On the other hand, every time people are just spinning off sub-agents I am reminded of this: https://www.lesswrong.com/posts/kpPnReyBC54KESiSn/optimality...

It's simultaneously the obvious next step and portends a potentially very dangerous future.

TeMPOraL 17 hours ago [-]
> It's simultaneously the obvious next step

As it was over three years ago, when that was originally published.

I'm continuously surprised both by how fast the models themselves evolve, and how slow their use patterns are. We're still barely playing with the patterns that were obvious and thoroughly discussed back before GPT-4 was a thing.

Right now, the whole industry is obsessed with "agents", aka giving LLMs function calls and limited control over the loop they're running under. How many years before the industry gets to the point of giving LLMs proper control over the top-level loop and over managing the context, plus the ability to "shell out" to "subagents" as a matter of course?
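
A minimal sketch of the distinction, with a scripted stand-in for the model call (everything here is illustrative, not any real framework's API): the point is that the model, not the harness, decides when to compact context or spawn a subagent.

    # Hypothetical sketch: the LLM owns the top-level loop, manages its
    # own context, and can "shell out" to subagents with fresh contexts.
    def llm(history):
        # Scripted stand-in for a chat-completion call, so the sketch
        # runs end to end; a real model would choose the action itself.
        last = history[-1]["content"]
        if last == "build the feature":
            return {"action": "spawn_subagent", "task": "research the API"}
        if last == "research the API":
            return {"action": "respond", "content": "notes: two endpoints, token auth"}
        return {"action": "respond", "content": "plan based on: " + last}

    def run_subagent(task):
        # Fresh context: the subagent sees only its task,
        # not the parent's full history.
        return llm([{"role": "user", "content": task}])["content"]

    def top_level_loop(goal):
        history = [{"role": "user", "content": goal}]
        while True:
            step = llm(history)
            if step["action"] == "spawn_subagent":
                history.append({"role": "tool", "content": run_subagent(step["task"])})
            elif step["action"] == "compact":
                # The model itself chose to summarize and drop old context.
                history = [{"role": "system", "content": step["summary"]}]
            else:  # "respond": the model decided it's done
                return step["content"]

    print(top_level_loop("build the feature"))
    # -> plan based on: notes: two endpoints, token auth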

qsort 16 hours ago [-]
> How many years before the industry will get to the point

When/if the underlying model gets good enough to support that pattern. As an extreme example, you aren't ever going to make even a basic agent with GPT-3 as the base model, the juice isn't worth the squeeze.

Models have gotten way better and I'm now convinced (new data -> new opinion) that they are a major win for coding, but they still need a lot, a lot of handholding; left to their own devices they just make a mess.

The underlying capabilities of the model are the entire ballgame, the "use patterns" aren't exactly rocket science.

benlivengood 16 hours ago [-]
We haven't hit the RSI (recursive self-improvement) threshold yet, so evolution is slow enough that a run is usually terminated as not useful, or it solves a concrete problem and is terminated by itself or a human. Earlier model+framework combinations merely petered out almost immediately. I'm guessing it's roughly correlated with the progress on METR.
lubujackson 15 hours ago [-]
Am I the only one who saw in the prompt:

> ${SUGESTION}

And recognized it wouldn't do anything because of a typo? Alas, my kind is not long for this world...
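
To make the failure concrete (an illustration, not the article's actual code): a misspelled placeholder never matches the substitution, so it's silently left unfilled, or in a shell, expands to nothing.

    from string import Template

    prompt = Template("Consider this suggestion: ${SUGESTION}")  # typo: one G
    print(prompt.safe_substitute(SUGGESTION="use a worker pool"))
    # -> Consider this suggestion: ${SUGESTION}   (placeholder left as-is)
    # A POSIX shell would instead expand the unset ${SUGESTION} to an
    # empty string, so the model would see no suggestion at all.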

floren 13 hours ago [-]
I noticed it and then scrolled through looking for the place where they called it out... sadly disappointed but I don't know what I expected from lesswrong
intralogic 16 hours ago [-]
[flagged]
CGamesPlay 15 hours ago [-]
In general, "reader mode". I don't use Chrome but Google suggests that it's in a menu <https://support.google.com/chrome/answer/14218344?hl=en>. Many Chrome-alikes provide it built-in (Brave calls it Speedreader), and many extensions can add it for you (Readability was the OG one).
konexis007 14 hours ago [-]
.
jilles 14 hours ago [-]
How does this compare with Apples or Orange?
brcmthrowaway 14 hours ago [-]
How does this compare with Code::Blocks?
johnwheeler 13 hours ago [-]
I've actually stumbled upon a novel new way of using Claude code that I don't think anybody else is doing that's insanely better. I'll release it soon.
aussieguy1234 13 hours ago [-]
I played around with agents yesterday, now I'm hooked.

I got Claude Code (with Cline and VSCode) to do a task for a personal project. It did it about 5x faster than I'd have been able to do manually, including running bash commands, e.g. to install dependencies for new npm packages.

These things can do real work. If you have things in plain text formats like markdown, CSV spreadsheets, etc., a lot of what normal human employees do today could be somewhat automated.

You currently still need a human to supervise the agent and what it's doing, but that won't be needed anymore in the not-so-distant future.

jvanderbot 11 hours ago [-]
I can't wait until Section 174 changes are repealed and nobody is financially invested in software from AI anymore.
tra3 9 hours ago [-]
Thank you, finally a realistic take.
jvanderbot 3 hours ago [-]
It seems I'm in the vast minority. Post hoc ergo propter hoc indeed.
eru 8 hours ago [-]
America = world?
jvanderbot 3 hours ago [-]
Is this meant to say "I don't care because I'm not in the USA"? Or "it's not a problem because it's only the USA"? Or "don't speak of US-specific situations on this forum because it contains people of many nationalities"?

It's entirely possible for a world-changing tech to be created and steered to match a unique problem inside one country, and for that to change job markets everywhere.