A Month of Chat-Oriented Programming - CheckEagle (checkeagle.com)

submitted 1 week ago by codeinabox@programming.dev to c/programming@programming.dev


TL;DR: I spent a solid month “pair programming” with Claude Code, trying to suspend disbelief and adopt a this-will-be-productive mindset. More specifically, I got Claude to write well over 99% of the code produced during the month. I found the experience infuriating, unpleasant, and stressful before even worrying about its energy impact. Ideally, I would prefer not to do it again for at least a year or two. The only problem with that is that it “worked”. It’s hard to know exactly how well, but I (“we”) definitely produced far more than I would have been able to do unassisted, probably at higher quality, and with a fair number of pretty good tests (about 1500). Against my expectation going in, I have changed my mind. I now believe chat-oriented programming (“CHOP”) can work today, if your tolerance for pain is high enough.


[–] TehPers@beehaw.org 14 points 6 days ago (1 children)

1500 tests is a lot. That doesn't mean anything if the tests aren't testing the right thing.

My experience was that it generates tests for the sake of generating them. Some are good. Many are useless. Without a good understanding of what it's generating, you have no way of knowing which are good and which are useless.

It ended up being faster for me to just learn the testing libraries and write my own tests. That way I was sure every test served a purpose and tested the right thing.

[–] kamstrup@programming.dev 6 points 6 days ago (1 children)

Yeah. Totally agree on this. I spend maybe 3-4 hours a day reviewing code, and these are my thoughts:

The LLM-generated tests I see are generally of very low quality. They perfectly fit the bill of looking like a test without actually being a good test.

They often don't test the precise expected value. As an overly simplistic example: They rarely check 2+2==4. But just assert 2+2>0, or often just that 2+2 doesn't cause an error.
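To make the 2+2 example concrete, here is a minimal Python sketch. The functions `add`, `buggy_add`, `weak_check` and `precise_check` are hypothetical names made up for illustration, not taken from any real test suite. It shows that a "result is positive" assertion still passes against a deliberately broken implementation, while an exact-value assertion catches it.

```python
def add(a, b):
    """Hypothetical function under test."""
    return a + b


def buggy_add(a, b):
    """Deliberately wrong implementation (off by one), used to see which checks notice."""
    return a + b + 1


def weak_check(fn):
    """Weak assertion: only requires a positive result."""
    return fn(2, 2) > 0


def precise_check(fn):
    """Precise assertion: requires the exact expected value."""
    return fn(2, 2) == 4


if __name__ == "__main__":
    for fn in (add, buggy_add):
        print(fn.__name__, "weak:", weak_check(fn), "precise:", precise_check(fn))
```

The weak check cannot tell the two implementations apart, so a suite built from such checks can grow large without protecting much.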

The tests often contain mountains of redundancy. Again, an oversimplified example: They have a test for 2+2, and another for 2+3.

There is never any attempt to make the tests nice to read for humans. It is always just heaps of boilerplate code. No helpers introduced, or affordances to simplify test setup.

Coupling the proclivity for boilerplate with subtly redundant tests makes for some very poor programming. Worse than I'd expect from a junior, tbh.

And 1500 tests... That is not necessarily a lot! If that is the output of 1 month of pumping out code, I would say it is the bare minimum.
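One possible way to address both the redundancy and the boilerplate, sketched with Python's standard `unittest` module: keep a small table with one row per distinct behaviour and loop over it with `subTest`. The function `add`, the class `AddTests` and the helper `run_suite` are hypothetical names for illustration, and the choice of rows is an assumption about which behaviours matter.

```python
import unittest


def add(a, b):
    """Hypothetical function under test."""
    return a + b


class AddTests(unittest.TestCase):
    # One row per distinct behaviour, rather than near-duplicates
    # such as 2+2 and 2+3 that exercise the same code path.
    CASES = [
        ("both positive", 2, 2, 4),
        ("both negative", -2, -3, -5),
        ("zero is identity", 7, 0, 7),
        ("mixed signs", -2, 5, 3),
    ]

    def test_add(self):
        for name, a, b, expected in self.CASES:
            # subTest reports each failing row separately instead of
            # stopping at the first failure.
            with self.subTest(name):
                self.assertEqual(add(a, b), expected)


def run_suite():
    """Load and run AddTests, returning the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AddTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_suite()
```

Adding a new case is then a one-line change to the table, and a reviewer can see at a glance which behaviours are covered.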

[–] majster@lemmy.zip 4 points 6 days ago (2 children)

30×50=1500, and 50 tests per day is a lot. That is a lot of tests to read while understanding all the edge cases, let alone to write.

[–] kamstrup@programming.dev 1 points 2 days ago

That depends on what you count as a "test". In some langs/frameworks it is a lot, indeed.

[–] TehPers@beehaw.org 1 points 6 days ago* (last edited 6 days ago)

30 is assuming you write code for all 30 days. In practice, it's closer to 20, so 75 tests per day. It's doable on some days for sure (if we include parameterized tests), but I don't strictly write code every day either.

Still, I agree with them that you generally want to write a lot of tests, but volume is less important than quality and thoroughness. The author using the volume alone as a meaningful metric is nonsense.