Martin Hähnel

Those Recent Puzzmo Articles about Claude Code

Zach Gage has made really cool puzzle games that I like a lot; some favorites include Flipflop Solitaire and Good Sudoku. Anyway, he and some other people have started a puzzle game website called Puzzmo. Puzzmo has a blog, and in it they describe all kinds of fascinating technical stuff, from game and UI design questions to programming topics. One of those topics happens to be "coding with AI".

Somewhat recently, Orta Therox, their tech lead (I think), published two articles about using Claude Code to work on Puzzmo.

I want to comment on what I found interesting about those two articles, because I think they are fascinating artifacts of this point in time. But before we start, a short™ statement on my perspective:

My Perspective On LLMs

To get back to the question of whether LLMs are a positive for society: the answer depends entirely on what we use LLMs for. Right now, it seems to me that LLMs are mostly helpful in a programming context (and maybe a handful of other use cases, though I have no first-hand experience of those), and they are regrettably useful - or at least used - to scam people, cost workers their jobs, threaten the environment and drown out the beauty and uniqueness of everyone's voice.

Depending on our own use and what we expect others to use them for, LLMs can be at least a neutral thing. Or a somewhat-negative-but-accepted-because-of-the-upside thing. Like a small, reasonable car.

This Twitter post made the rounds recently:

"You know what the biggest problem with pushing all-things-AI is? Wrong direction. I want AI to do my laundry and dishes so that I can do art and writing, not for AI to do my art and writing so that I can do my laundry and dishes." — x.com/AuthorJMa...

And I think it beautifully encapsulates my sentiment: use AI for the right things and there isn't really a big problem with the technology.

It's important to note that my article tried to figure out a framework that - all else being equal - takes a sanity-based approach to judging the use of a technology/product that exists right now. Given that I, as an individual, can't really change how the current crop of "AI" was made, I can at least find a way of interacting with these systems that makes sense and isn't "purity-based".

I think this "anti-purity framework of judgment approach" is still a good idea. Normal people - including you - will use LLMs, sometimes you'll overstep and use them for frivolous things. Within reason, that's fine.

What the first of the two Puzzmo articles made clear to me (again), however, by mentioning the Overton window, is that things are, of course, in flux. Society as a whole will shift, shrink and expand what counts as acceptable behavior. It will do so in response to all kinds of factors, technology being an important one among them.

And the technology/product "AI" - flawed[1] as it may be - will of course get better. Maybe "better" here doesn't denote a certain march towards AGI - it certainly doesn't look like it - but the UIs, reliability, user experience and other things have definitely gotten better over time and will continue to do so. All of this means that LLMs can be used in contexts where they weren't applicable before, and can be used more productively in contexts where their usage already made sense. More usage equals more influence on society (and on certain industries, like programming), naturally.

Back to my old article, though:

For the societal viewpoint, on the other hand, it's important to keep in mind that society is not a person, and it doesn't make decisions. It doesn't have intentions, and you can't actually interact with it directly. So making demands on society doesn't make a lot of sense. It may make some sense to demand change from politicians, but politicians are not society.

This text is not about changing society through political action, though. It is about exploring what could be a way to live within our current situation that neither loses sight of the complexities of life, by proclaiming a set of maxims, nor throws out the baby with the bath water, by being a cynical, egotistical jerk.[…]

As far as I'm concerned, "AI" - flawed, yes, of course, I'll stop saying it now - is sticking around and is going to make a huge difference going forward. "Huge difference" here is meant as a matter of fact: I don't ascribe positive or negative value to huge differences as such, and neither should you. Nor am I saying I like or dislike the fact that these systems are here to stay. I don't have a crystal ball either, to tell me what "sticking around" means. Does it mean forever? 20 years? 5 years? I'm pretty sure they'll still be here in a couple of years, though. Their importance might rise and then sharply drop (e.g. because nobody can pay for this stuff), or it might plateau, or it might continue to eat the world. I don't know. And none of these options is in itself a value judgement. But we will have to contend with them and their influence for somewhere between, maybe, 5 years and forever.

In an unpublished article about LLMs[2] I wrote:

[H]ere's my main point: If LLMs are going to stick around then people capable of critical thought and engaging in empathy and seeing the bigger picture ought to engage with LLMs in all dimensions of the concept/tech.

It is similar to the car industry, the fossil fuel industry and many others. Leaving these jobs to unthinking, uncaring people is indeed worse than engaging with those industries. I don't know how far this argument holds exactly, but my assumption is that, apart from actual societal taboos, most things would benefit from people with strong progressive values who care. Don't you think?

Will this radically change anything? No. Will performatively writing purity-based arguments against LLMs do anything, though? Also no. But there is an important difference: being open to the idea that LLMs could be changed, ever so slightly, into something better could at least do something.

The unpublished article I quote here was supposed to be called "We Need People With Hearts Working on AI". And that means on all sides of the matter: inside companies using "AI" (like mine and probably yours), and by workers inside companies like Anthropic, too. This is how I envision we actually change a technology/product and its influence on society for the better: from the "inside", and gradually.

Anyway.

What interested me

I am not going to close-read all of this!

But it's still a lot. Just FYI.

Let's read the first article:

Over the last three months, I have been trying to really push myself out of that comfort zone and to start exploring the tools that you see the ‘vibe coding’ folks are using and get a sense for what these systems are like when you are someone who takes this stuff very seriously, and considers programming + systemic design to be the craft I plan to do for the rest of my life.

This speaks to me, as I also take programming, code quality and architecture seriously. I can't claim that I am at all on the same level as this person - I, for one, have not worked on the TypeScript compiler or on something as interactive and pretty as Puzzmo - but when it comes to what I value in my work as a programmer, it's pretty similar to this sentiment.

Before we start

That said, allow me to self flagellate before we dig in. I don’t think LLMs are a good thing for the world. I think they concentrate power to those with capital, I think they will “increase throughput” for folks in ways that will give fewer people jobs and will force higher inter-class competition in culturally unhealthy ways like a concentrated version of the gig economy. If you want to see the epitome of “worst person you know” meme, Tucker Carlson’s point about what the social cost of driverless trucking looks like (0m-3m, perhaps open in a private browser to avoid poisoning your algorithms) is something which is always in the back of my mind. I believe in the dead internet theory and think that using LLMs to make slop which looks like it was authored by a human is unethical. I think LLMs are assisting to push the Overton window towards authoritarianism by making it easier to spread disinformation, to create false narratives and to reduce agency in favour of algorithms. Some of this stuff could be OK if we had wide-scale social safety nets and a healthy labour class, but I don’t think something like UBI is coming, and I think LLMs are going to continue to push economic growth over planet-wide health. This technology will help contribute to a worse world for most people. […] Yet still, LLMs are here. They’ve already come for artists, and voice actors. Now they are here for programmers. I can’t stop that. I dig that for some people absolute embargo is the only option, but I want to understand what they are and they are going to affect culture. […] I’m going to lose some internet points and try talk through my usage and experiences.

I think this whole part is very interesting. Orta Therox thinks that LLMs are not a good thing for this world. He is pretty hopeless regarding the future of society. And he thinks there is nothing you can really do about it. The way this is written makes me think that this person is not without heart, just fatalist (as I tend to be, too). There are worse fates. I also understand the wish to understand LLMs better by using them. I do not at all subscribe to the notion that you're somehow "turned" by using LLMs, as some critics suggest. So I get it: this thing will have massive - mostly negative - consequences; I can acknowledge but can't change that, and as a technologist I still want to know what these things can do in practice. Kudos for writing about it in public.

Some smaller parts of this I wanted to dissect:

allow me to self flagellate

This devalues necessary and real criticism of a powerful technology that is changing society - mostly, or so the author thinks, for the worse. Calling it "self-flagellation" misses the point of expressing the criticism in the first place. It's not sadomasochism or penance. You're not absolved of any bad feelings you might have just because you add a paragraph or two of "self-flagellation" prose. And I kind of hope you don't feel pleasure from writing down how these things are bad for the planet.

I don’t think LLMs are a good thing for the world. […] This technology will help contribute to a worse world for most people.

Alright, let's go through this:

I think they concentrate power to those with capital, I think they will “increase throughput” for folks in ways that will give fewer people jobs and will force higher inter-class competition in culturally unhealthy ways like a concentrated version of the gig economy. If you want to see the epitome of “worst person you know” meme, Tucker Carlson’s point about what the social cost of driverless trucking looks like (0m-3m, perhaps open in a private browser to avoid poisoning your algorithms) is something which is always in the back of my mind.

Maybe it's just me, but actually getting into the nitty-gritty of all these examples and what they suggest about societal dynamics is very interesting to me. This sketches a picture of a society driven by technology (and therefore by technologists), which steers the economy, which in turn influences culture. There is a pattern here that gets repeated with the driverless-trucks example: the tech will ultimately cost society more than it saves (or earns). I'll also add that technology itself seems to me to have a hidden driver, which is the economy.

I believe in the dead internet theory and think that using LLMs to make slop which looks like it was authored by a human is unethical.

Somewhat disconnected from this is the point about the dead internet. The production of cultural artifacts should never be substituted with AI-generated slop, and hiding the fact that something was generated by LLMs is unethical. I imagine he was thinking about social media bots and similar things here.

I think LLMs are assisting to push the Overton window towards authoritarianism by making it easier to spread disinformation, to create false narratives and to reduce agency in favour of algorithms.

The point about the Overton window is, as noted above, very interesting to me. That LLMs make it easier to spread misinformation is in a way baked into the technology: it's much easier to produce plausible-sounding things than to produce actual facts[3] that hold up to scrutiny - if there is even anyone left doing the legwork. I am hesitant to insinuate that right-wingers "don't care"; I think they care about different things. There are idiots in their ranks, and some of them wield power, sure. But not all of them are idiots, and some who wield power aren't either. Indeed, public discourse seems to be but a chessboard to them. Credibility within the established system does not matter to them the way it does to progressives. I wonder about the end goal sometimes. If I had to guess: remaking society definitely seems to be it. And if that is the case, subverting and degrading all democratic institutions, from science to journalism to politics itself, just makes sense. To that end, prompting an AI to drown the web in slop and using AI to create chatbots that multiply their voice changes society. Intellectuals on the right will know that this is (part of) the playbook and will find actual discourse and real credibility elsewhere, I imagine.

I think LLMs are assisting [...] to reduce agency in favour of algorithms.

The point about agency is interesting but separate, I think. This is more about the owners of the algorithms and not so much about users of the tech for nefarious reasons. This point therefore cuts deeper. It posits: Whoever owns the algorithm - I will fill in here: for search, recommendations, suggestions for what to prompt the AI with, how the AI is trained and functions, etc. - basically controls some of the most important interfaces through which we make sense of the world around us today. LLMs in particular could be used to elegantly hide certain search results, by effectively "training out" certain viewpoints as relevant to a query. More generally though: How stuff is weighted and what is chosen to be displayed to the user is at least in part the result of the rules that make up "the algorithm" of the big technologies we all use today and rely on daily. And since it's often easier and more fun to ask an LLM than, say, a traditional search engine (or a library catalog before that), LLMs do indeed assist in furthering our conception of the world through (privately owned) algorithms that do some of the filtering - maybe even some of the filtering we wouldn't want them to do - and decision making for us.[4]

Some of this stuff could be OK if we had wide-scale social safety nets and a healthy labour class, but I don’t think something like UBI is coming, and I think LLMs are going to continue to push economic growth over planet-wide health.

This is a somewhat perplexing point. I think it mostly relates to "AI"-induced job loss, or at least I would hope so. LLMs will certainly be used to try to push economic growth, I agree, and it is maybe not completely wrong to say that LLMs themselves will do that, but the driver here is clearly the industry's greed and not the tech in itself. The tech is persuasive and conjures up visions (delusions) of omnipotence in some people, so some will push for economic growth to finance furthering the technology. But in the end the main driver is greed, and LLMs are just the money-producing vehicle du jour.

[…] For a pull request which is really touching all of the internal guts of an engine, being able to trivially generate a useful and well described test suite whenever you want is worth the price of Claude Code’s most expensive offering to me. Easy.

What I think is important to keep in mind is that this is a tool, and not a replacement for critical thinking and taking responsibility. If you’re writing production code, regardless of its source you need to be treating every line as though you wrote it and present it as code which you fully understand and can handle feedback on when it comes to code review. I’m on-call 12 hours a day every day for this code.

Emphasis mine. This is a part from the guts of the post, which is quite technical, but it signals - to me at least - that Claude Code is pretty good at this stuff.[5] For some reason - maybe because this person talks about maintainability, testing and refactoring - this post seems more credible to me than many others.

I think it’s likely that Claude Code will be one tool of many which I will find a place where it fits instead of being a broad replacement for writing code on a day-to-day.

Personally, I am intrigued to find out whether Claude Code could work as well for me - I work mostly on legacy PHP codebases[6] - as it did for the author here. I expect it would be harder. I don't think adding it to the toolbox is a bad thing per se, though - although getting used to LLMs and using them more also means more environmental impact.[7]

On a final note, there is an interesting tension in both “it’s likely that Claude Code will be one tool of many…” and “I don’t think LLMs are a good thing for the world.” I think the core difference here is scale, it’s like ensuring you are always turning off your house lights to save power, only to look out of the window at a bunch of commercial properties which never bother, individual choice can matter but it’s a drop in the bucket compared to say making a law encouraging self-driving cars.

I like how the first article ends, because it talks about individual use, which is a good framework to judge these technologies by. On a grand scale there is really not a lot you can do to further or prevent LLM adoption. You can, however, use these tools smartly and with scrutiny. You are not free to not give a shit just because you know the state of the world.

Okay. On to the second article.

Claude Code has considerably changed my relationship to writing and maintaining code at scale. […] Claude Code has decoupled myself from writing every line of code, I still consider myself fully responsible for everything I ship to Puzzmo, but the ability to instantly create a whole scene instead of going line by line, word by word is incredibly powerful. […] I believe with Claude Code, we are at the “introduction of photography” period of programming. Painting by hand just doesn’t have the same appeal anymore when a single concept can just appear and you shape it into the thing you want with your code review and editing skills.

The main point here is the "introduction of photography" comment. It brings to mind Walter Benjamin and his incredibly deep essay "The Work of Art in the Age of Mechanical Reproduction". Programming is an uncomfortable medium in which to claim hand-madeness, because it's hard to posit an "aura" emanating from "hand-written code" - what does that even mean, exactly? - that may be lost if code is now merely generated. Still, I could see people making this argument. And indeed, in more private spaces, critics will say that they like the practice of programming and that LLMs devalue the craft. Similar arguments have been made against photography and video cameras and all the other society-shifting technologies.

Sorry. I didn’t make it so nearly all of culture’s changes are bad, and I think that LLMs are already doing social damage and will do much worse in the future - but this genie is fully out of the bottle and it is substantially going to change what we think of as programming.

This is said with some of the same energy as the self-flagellation line. The author seems to think you can be absolved by pointing this out. If this is the future of programming, then I think we ought to at least start thinking about what it means for the next generations of programming.[8]

[a long list of impressive sounding improvements...] For clarity in the back because this is shocking to me, while I was still working on the existing roadmap I had prior to Claude Code over the last 6 weeks, I accomplished all of these things on my own. Mostly in the background (and then with a polish pass day for some of the larger ones). I didn’t go from working ~10 hour days to working ~16 hours or anything like that either.

This was years of “tech debt” / “tech innovation” backlog for me! Done in just over a month and a half.

If you understand what you are doing, the capacity for building and handling the breadth of tasks which typically live within the remit of “technical debt” do not need to be treated as debt and you can just do it as you are working on other things.

If it is true that using Claude Code to work on tech debt "on the side" is basically free, then this would indeed be great. I have a hard time believing it scales/generalizes to other stacks, and an even harder time imagining what it looks like when dealing with a team. While I appreciate that this is a very capable dev, it is still just one dev with an opinion about how code should look. As the lead, he surely finds it easier to do this on the side. But this completely loses sight of the fact that technical debt is always, at least in part, also political.

[…] I think it’s worth noting here that we offered Claude Code to everyone on the team from the moment it because[sic] obvious how powerful of a tool it was for me personally.

I would say from our team, the sort of people who have used and found value the most are people with both product, technical skills and agency to feel like they can try things.

I think this is an important caveat. If you're not empowered to try things, then Claude Code - or any other agent, for that matter - may actually not help you as much. This raises the question of who didn't feel empowered, and why, doesn't it?

A monorepo is perfect for working with an LLM[...] This isn’t novel work. Most of the stuff we’re doing on a day to day basis is pretty normal down-to-earth CRUD style apps.

I think this is more and more what I notice: the stack and setup of your system, and what it tries to do for the user, have a lot to do with how successful using LLMs (especially agents) to code will be.
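To make the "setup" point concrete: as I understand it, Claude Code pulls a CLAUDE.md file from the repository root into its context at the start of a session, so a monorepo means a single file (and a single grep-able tree) can orient the agent about the whole system. Here is a minimal, entirely hypothetical sketch - I haven't used Claude Code myself, and none of this is Puzzmo's actual configuration:

```sh
# Hypothetical sketch: seed a monorepo root with project notes for the agent.
# The layout and conventions below are invented for illustration only.
cat > CLAUDE.md <<'EOF'
# Project notes for Claude Code

## Layout (one monorepo, mostly down-to-earth CRUD)
- apps/web   - frontend
- apps/api   - CRUD-style API server
- packages/* - shared libraries

## Conventions
- Run the test suite before declaring a task done.
- Prefer extending existing modules over adding new top-level ones.
- Never touch files under migrations/ without asking first.
EOF
```

The intuition being: an agent that can see everything from one root has to guess much less than one hopping between repositories.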

There was a recent paper (which is from the pre-Claude Code days) which says that developers with AI over-estimate its impact and maybe I am.

Doesn’t feel it though. Did you see that list at the top? I feel like I’m constantly trashing my usual estimation timings to the point where it’s hard for me to gauge how long a task will take.

The recent paper in question. It will stay very difficult to quantify the gains of using LLMs at work. Measuring knowledge workers' productivity is already hard in general; throwing LLMs into the mix doesn't make it any easier. As long as they are a subjectively helpful tool, it doesn't really matter too much. They don't even need to make the work quicker, imho: making it easier or safer to change code, or more enjoyable, could also be metrics (that would be just as hard to measure).

Parallel Construction for Juniors

One thing I’ve talked with folks earlier in their careers who want to still be doing a lot of the grunt work themselves is to consider writing their work, then comparing their results to the same requested by Claude Code.

Parallel construction is a way to have your cake and eat it. You’re still learning and growing, but your results can be honed by seeing how the aggregate of training data in Claude Code has also decided to do it. Claude Code likely has a deeper ecosystem understanding, and may read more source code in the current codebase as well as knowing abstractions you’ve still not learned.

IMO treating it as a competitor which you can learn from is a much healthier alternative to either giving up and just accepting you don’t need to know stuff anymore, or putting your head in the sand and assuming somehow this change won’t affect you.

I thought this was a very thoughtful part of the article. Doing it yourself and then comparing results is what I have been doing to learn, too. And it's not that dissimilar from something like Codewars.
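For what it's worth, this workflow can be made quite mechanical. Below is a sketch of how I imagine parallel construction looking in practice - the branch names are made up, and since I haven't used Claude Code myself, treat the `claude` invocation as an assumption and check its help output first:

```sh
# Parallel construction sketch: solve the task by hand on one branch,
# have the agent solve the same task on another, then study the diff.
git switch -c attempt-mine
# ...implement the feature yourself, commit as usual...
git commit -am "feature: my own attempt"

git switch main
git switch -c attempt-claude
claude   # interactive session: give it the exact same feature spec
git commit -am "feature: Claude Code's attempt"

# The learning step: read the two solutions against each other.
git diff attempt-mine attempt-claude
```

The diff is where the learning happens: abstractions you didn't know, codebase conventions you missed, or - just as instructive - places where your version is plainly better.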

Going further: I think that the result of a well-working software development environment is a well-working software development team.[9] It's not output or throughput or whatever. I don't know if anyone has thought this through in this way: if Claude Code and similar agents and other LLM tools are going to be part of the toolkit, how do you onboard juniors, or new colleagues in general? If the skill of evaluating output is bounded by your ability to do the work by hand, how do you grow workers who know the craft well enough to even use these tools safely? How do you empower them?

However, pragmatically, as a pairing partner with an experienced engineer constantly reviewing, amending and understanding the output - you can really treat Claude like a Pair Programming buddy with infinite time and patience, a bit too much sycophancy, and the ability to ship reasonable code given reasonable constraints in a speed I’ve not seen before.

And that is like a new way to build things.

When I first came across these posts, they were presented to me as just another disappointing example of how things are going to shit - another app lost to vibe coding. I obviously don't think everything written in these two posts is completely thought through, but I also didn't get the feeling that it's just bs. It's at least much less bs than another post that made the rounds a while ago, which I also found interesting, if less carefully written.[10] To me it seemed that Orta Therox knows a lot and cares a lot - maybe less about doing every aspect of software development by hand, but definitely about the craft in general. And say what you will, but having these perspectives around is valuable.

I appreciated that he shared chat transcripts and also full-blown PRs.[11] This is the kind of vulnerable writing that I would like to see more of. I have seen a lot of sceptics just dismissing the whole topic, and a lot of boosters who don't care about the craft. But reading this actually made me feel like you can be invested in the craft and still find LLMs, and how they are going to influence the thing you care about, interesting - without sounding like you're crazy. The end.

P.S.: This commentary article has been going on for about 5,000 words now. There would be so much more to say, but I think I got most of what I wanted to say out. These topics are important and demand care and openness from all of us - be it as people living in a society or as developers who are part of a community of craftspeople. We owe it to the world we co-create to care.


  1. Apologies for substack. ↩︎

  2. I lost the nerve to finish it, because I felt vulnerable around it. At least I'll quote my main point from it here... ↩︎

  3. As a trained historian of science with a Latourian bent, I always flinch a little at that word. But almost nobody knows what a matter of concern even is. Just know that "Matters of fact are only very partial and, I would argue, very polemical, very political renderings of matters of concern and only a subset of what could also be called states of affairs." (taken from the linked PDF, p. 232) ↩︎

  4. From an ANT perspective: seeing more of the world often means having more means of probing and understanding it. More tools, or more ways to perceive, is a good thing. Adding LLMs to the mix of tools is, from that perspective, not a bad thing in itself. It becomes one when ChatGPT or what-have-you is not only your only means of engaging with the world, but becomes your world, full stop. ↩︎

  5. I personally have never used Claude Code. Codex and Copilot Agent Mode are the agents that I have used and was sorely disappointed by (so far?). ↩︎

  6. Mostly our in-house CMS/Shop hybrid based on an older Symfony version. ↩︎

  7. I have yet to see a good rebuttal of this article and other articles by the same person on the relative harmlessness of LLM-related water and electricity usage. The person writing these, Andy Masley, is strongly associated with the effective altruism movement, which I have mixed feelings about; also, this is a substack. Apologies for that. ↩︎

  8. In fairness: He does say something about this later. I'll comment on it more there. ↩︎

  9. I think this can be transposed into: good software development should lead to better developers, not quicker delivery - if you're a solo dev or don't work in a team, for example. Same spirit. ↩︎

  10. Though: As an anthropologist of the moderns, I don't need to agree with every piece of writing to deem it interesting. ↩︎

  11. Making it obvious that no one actually reviewed the PR, but that's besides the point. ↩︎