Vibe Coding
As a software developer, I have watched the developments in AI and LLMs have a large impact on the perception of my field.
On the one hand, the maximalist argument promises us that in the implausibly near future, LLMs will outperform humans at writing code, that there will therefore be no need for human software developers, and that my profession will become extinct.
It must be said that the majority of the proponents of this belief have a vested interest in it being true - a lot of them are making the tools that would bring about this change, and thus they have to believe it to convince their investors that a revolution is coming.
The argument on the other hand is a lot of software developers burying their heads in the sand and claiming that nothing will change, in voices quavering with the fear that they will soon have no career prospects.
Even though the maximalist argument is fairly extreme and clearly biased, it's already having an impact on a jobs market that is reeling from the wider economy. The market can stay irrational longer than we can remain solvent, so we need to take the argument seriously. However unlikely, an existential threat must be analysed.
I do not, therefore, believe that we can dismiss LLM maximalism as overly optimistic hyperbole, no matter how implausible it may feel. So I have spent a fair amount of time using the new AI tools to test the veracity of the claims.
Initially I found AI coding tools very underwhelming - the first UIs used LLMs as a 'better' autocomplete - a feature that I've never really benefited from. My bottleneck while coding isn't typing, it's the speed of my own thought. Breaking my flow to inject a code review of a snippet has always slowed me down and stressed me out, no matter how good the content of the autocompletion. Indeed, being slowed to typing speed helps me build an accurate mental model of a codebase.
This is of course personal preference - there are certainly a lot of developers who prefer a heavy IDE that prompts their code out line by line. I don't worry about my productivity versus this style of development, because I have long experience of out-producing it thanks to the insights a robust mental model brings. That said, my skepticism about autocompletion led me to put off using LLMs inline for quite some time.
The next iteration in coding LLM support was the ability to converse with an LLM in an editor pane, and then apply its suggestions to the files. I tried this approach with Cursor and Zed, but it felt clunky, and there was additional complexity imposed by the need to manually curate a context window for the discussion.
Again, for me, the disadvantages didn't seem to justify the small incremental benefit of generating code this way, and I was often left underwhelmed by the suggestions themselves. Too much time was spent fixing crappy LLM-generated code - it made the development cycle feel like wrestling with a legacy codebase.
Thus my skepticism persisted.
Then I tried Claude Code, the latest tool from Anthropic. The tool runs in a repository and applies changes to your files automatically from a conversation. No editor is required; you simply review the diffs that the tool makes. It can also run the code, and introspect and iterate based on the response. The experience goes from pair-programming with an uncooperative robot to micro-managing a precocious junior developer.
The $5 of credit that I initially loaded as an experiment quickly evaporated, and then so did another $25. I had, in essence, started employing this junior developer as a coworker. The UI had improved to the point where my main hurdle wasn't the experience of using the tool, but rather the quality of the LLM. And the quality was good.
I tested the tool on three projects of varying complexity to see how it would perform.
The easiest project was a static website for my consulting business - I wanted to create some landing pages, and I didn't really care about code quality. This is pretty much an ideal scenario for 'vibe coding' - the term du jour for simply delegating all control to the LLM and blindly accepting its suggestions.
As expected, the tool worked great - I had a small conversation about what I was after, and let it write the copy, design the site, and tweak my deploy Makefile.
It struggled with some of the more esoteric S3/CloudFront quirks (so I still have a .html suffix on the URLs, which hasn't bothered me enough to fix manually). The experience was so good that I doubt I will write static HTML by hand any more.
The second project was a web service I am developing to auto-generate an API atop a dataset. I had hacked together the beginnings of a Rust server, but at the point I started using Claude, I didn't have a working implementation.
For tasks such as adding a database table or gluing together libraries, the suggested edits seemed good, but the LLM struggled to get the Rust code to compile and often got stuck in debugging loops that ate my API credit without making progress. I had to be fairly diligent in watching it to avoid these expensive rabbit holes.
My progress had stagnated and the server still didn't work. Rather than going back to fixing it manually, I thought about it like the manager of a junior employee - it seemed my language choice was just too tricky for Claude.
So I had it rewrite the server in Python, and progress quickly returned. Without the strong types and guarantees of Rust, I had less confidence in the server, and it was harder to know if the solution was correct, but the tool was able to manually test the code until it fulfilled my requirements.
Nonetheless, as a completely subjective guess, it didn't feel like the tool was speeding me up that much. The fight to fulfill my requirements in a bug-free way took away a lot of my motivation. I suspect there are more productive ways to use the tool, but for this project, I haven't yet found them.
The third project was the most complex, and I fully expected Claude to struggle; I tried it purely as an experiment to find the edges of its capabilities.
I have a ray-tracer implemented in Rust, and in it a sky-sphere that uses Rayleigh and Mie scattering to simulate the sky. None of the code is particularly novel - indeed, it's mostly just a port of shader code into Rust - but some of the calculations are a little tricky, and I had a suspicion that I had some bugs.
I had been meaning to rewrite it, and asked Claude to have a go at refactoring and documenting the code.
And honestly the first attempt was rather good - it extracted some functions, added some decent documentation, and wrote some unit tests.
But in doing this, it added a subtle bug that I only noticed by inspecting the output images - the sunsets looked far less red than they should. I asked Claude why this was, and it gave me a very decent explanation of the issue - my initial implementation had accumulated the optical depth as it sampled along the rays, whereas the refactored version did not.
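To make the difference concrete, here is a toy sketch - not the actual renderer code, and with the constants invented purely for illustration - of the two patterns. With a uniform extinction coefficient, the accumulated version dims the ray the further it travels, while the non-accumulated version barely dims it at all:

    // Toy sketch (not my renderer): why the optical depth must be accumulated
    // along the ray as it is sampled. Constants are invented for illustration.
    fn main() {
        let sigma = 0.5_f64; // extinction coefficient (uniform, for simplicity)
        let step = 0.1_f64;  // ray-march step length
        let steps = 100;     // 10 units of travel in total

        // Correct: the optical depth grows with every sample the ray has
        // already passed through, so distant samples are strongly attenuated.
        let mut accumulated_depth = 0.0_f64;
        for _ in 0..steps {
            accumulated_depth += sigma * step;
        }
        let correct = (-accumulated_depth).exp();

        // Buggy refactor: extinction is computed from the current sample alone,
        // so a distant sample is attenuated as if the ray had only just entered
        // the atmosphere.
        let non_accumulated = (-sigma * step).exp();

        // Prints roughly 0.0067 vs 0.9512 - a long path through the atmosphere
        // should be strongly attenuated, not almost transparent.
        println!("accumulated: {correct:.4}, non-accumulated: {non_accumulated:.4}");
    }

In the real sky-sphere the extinction is wavelength-dependent, so under-attenuating the long path towards the sun leaves too much blue in the image - which would explain why the sunsets lost their red.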
I asked it to fix the issue and document what went wrong. Up until this point I had been very impressed, but then something very interesting happened - it documented the two approaches as though they were both valid, and my implementation was just one of two good options.
This forced me to go away and do some reading - and after poring over explanations of the technique, I could only conclude that the claim that the non-accumulated implementation was correct was a hallucination. It was simply wrong.
I asked the tool about this, and it agreed with me - in fact, the second version would violate the Beer-Lambert law, a physics law that I was previously unaware of. So I asked it to fix the documentation, to which it readily agreed - the updated comment, however, noted that its implementation was 'a common bug'. This irked me - I had no way of knowing whether this bug really was common or whether I was being gaslit by the LLM, and the defensiveness in the comment felt wrong.
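For the record, the law simply says that transmittance decays exponentially with the optical depth integrated along the whole path:

    T(s) = exp( -∫₀ˢ σ(x) dx ) ≈ exp( -Σᵢ σᵢ Δs )

so in a discretised ray march the sum has to keep growing as the ray advances; computing the extinction from the current sample alone throws the integral away.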
In general I had been very impressed by the tool's ability to debug complicated code, but pretty scared by the fact that it could hallucinate justifications for errors and then get defensive about them.
Therefore I had three anecdotes about coding with LLMs. For simple static webpages, it is a no-brainer, and the results are impressive. For more complex tasks, language choice has a big impact, the tool can get itself stuck in bad debug cycles, and while it doesn't feel groundbreaking, it still has potential. And for complex debugging, it is genuinely impressive, but it demonstrates that you cannot trust the output not to be hiding really nasty surprises - or not to get defensive when asked about them.
The million-dollar question therefore remains - will LLMs continue to advance in capability until they outperform any human software engineer, or will their abilities plateau? It's impossible to predict. In either case, being able to work with these tools, to understand their output, and to evaluate how to improve their use remains a valuable and tangible skill.
If the former is true, then learning to code is like learning to play chess - it has intrinsic benefits, even if machines can do it better.
If instead the latter is true (which remains my suspicion), the tools are still genuinely useful in their current incarnation. But you need a good engineer to manage them.
So we software developers aren't extinct, or even endangered yet. But perhaps we are at risk...