this post was submitted on 11 Sep 2023

92 points (68.3% liked)

AI chatbots were tasked to run a tech company. They built software in under seven minutes — for less than $1. (www.businessinsider.com)

submitted 2 years ago by shish_mish@lemmy.world to c/technology@lemmy.world

[–] BombOmOm@lemmy.world 178 points 2 years ago (4 children)

The difficult part of software development has always been the continuing support. Did the chatbot set up a versioning system, a build system, a backup system, a ticketing system, unit tests, and help docs for users? Did it get a conflicting request from two different customers and intelligently resolve them? Was it given a vague problem description, so that it had to get on a call with the customer and hunt down what the customer actually wanted before devising and implementing a solution?

This is the expensive part of software development. Hiring an outsourced, low-tier programmer for almost nothing has always been possible; the low-tier programmer being slightly cheaper doesn't change the game in any meaningful way.

[–] Knusper@feddit.de 12 points 2 years ago (3 children)

Yeah, I'm already quite content, if I know upfront that our customer's goal does not violate the laws of physics.

Obviously, there's also devs who code more run-of-the-mill stuff, like yet another business webpage, but those are still coded anew (and not just copy-pasted), because customers have different and complex requirements. So, even those are still quite a bit more complex than designing just any Gomoku game.

[–] NoRodent@lemmy.world 8 points 2 years ago

I’m already quite content, if I know upfront that our customer’s goal does not violate the laws of physics.

Haha, this is so true and I don't even work in IT. For me there's bonus points if the customer's initial idea is solvable within Euclidean geometry.

[–] Puzzle_Sluts_4Ever@lemmy.world 8 points 2 years ago* (last edited 2 years ago) (1 children)

While I do agree that management is genuinely important in software dev:

If you can rewrite the codebase quickly enough, versioning matters a lot less. It's the idea of "is it faster to just rewrite this function/package than to debug it?" but at a much larger scale. And while I would be concerned about regressions from full rewrites of the code... have you ever used software? Regressions happen near constantly even with proper version control and testing...

As for testing and documentation: This is actually what AI-enhanced tools are good for today. These are the simple tasks you give to junior staff.

Conflicting requests and iterating on descriptions: Have you ever futzed around with chatgpt? That is what it lives off of. Ask a question, then ask a follow up question, and so forth.

I am still skeptical of having no humans in the loop. But all of this is very plausible even with today's technology and training sets.

Just to add a bit more to that. I don't think having an AI operated company is a good idea. Even ignoring the legal aspects of it, there is a lot of value to having a human who can make irrational decisions because one customer will pay more in the long run and so forth.

But I can definitely see entire departments being a node in a rack. Customers talk to humans (or a different LLM) which then talk to the "Network Stack" node and the "UI/UX" node and so forth.

[–] Vlyn@lemmy.zip 10 points 2 years ago (11 children)

If you just let it do a full rewrite again and again, what protects against breaking changes in the API? Software doesn't exist in a vacuum, there might be other businesses or people using a certain API and relying on it. A breaking change could be as simple as the same endpoint now being named slightly differently.

So if you now start to mark every API method as "please no breaking changes for this" at what point do you need a full software developer again to take care of the AI?
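
To make the renamed-endpoint concern concrete, here is a minimal sketch of a check that compares the endpoints a rewrite exposes against the ones clients already rely on. The endpoint names and the `find_breaking_changes` helper are hypothetical and not taken from any real API.

```python
# Hypothetical example: detect endpoints that vanished or were renamed
# between two versions of an API. All names are invented for illustration.

def find_breaking_changes(old_endpoints, new_endpoints):
    """Return the endpoints clients relied on that no longer exist."""
    return sorted(set(old_endpoints) - set(new_endpoints))


# Endpoints published by the original code base (assumed).
old_api = {"/users", "/users/{id}", "/orders"}

# Endpoints after a full rewrite: "/orders" was renamed slightly.
new_api = {"/users", "/users/{id}", "/order"}

missing = find_breaking_changes(old_api, new_api)
print(missing)  # ['/orders']
```

A check like this only catches missing names. It says nothing about changed parameters or response shapes, which would need their own contract tests.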

I've also never seen AI modify an existing code base, it's always new code getting spit out (80% correct or so, it likes to hallucinate functions that don't even exist). Sure, for run of the mill templates you can use it, but even a developer who told me on here they rely heavily on ChatGPT said they need to verify all the code it spits out, because sometimes it's garbage.

In the end it's a damn language model that uses probability on what the next word should be. It's fantastic for what it does, but it has no consistent internal logic and the way it works it never will.
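
As a toy illustration of "probability on what the next word should be", the sketch below samples the next word from a hand-written table. The words and probabilities are invented. A real LLM computes such a distribution over tokens with a neural network rather than a lookup table, so this only shows the sampling step.

```python
import random

# Toy next-word table. The words and probabilities are invented for
# illustration and do not come from any real model.
NEXT_WORD_PROBS = {
    "hello": {"world": 0.9, "there": 0.1},
    "world": {"peace": 0.5, "wide": 0.5},
}


def next_word(current, rng):
    """Sample the next word according to the toy probability table."""
    candidates = NEXT_WORD_PROBS[current]
    words = list(candidates)
    weights = [candidates[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so repeated runs give the same result
print(next_word("hello", rng))
```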

[–] doublejay1999@lemmy.world 4 points 2 years ago

Which is why plenty of companies merely pay lip service to it, or don’t do it at all and outsource it to ‘communities’

[+] Melco@lemmy.world 96 points 2 years ago* (last edited 2 years ago) (4 children)

[deleted]

[–] Nougat@kbin.social 49 points 2 years ago (4 children)

I've tried to have ChatGPT help me out with some PowerShell, and it consistently wanted me to use cmdlets which do not exist for on-premises Exchange. I told it as much, it apologized, and wanted me to use cmdlets that don't exist at all.

Large Language Models are not Artificial Intelligence.

[–] dojan@lemmy.world 9 points 2 years ago (1 children)

I had a weird XAML error I didn’t quite get, and the LLM gave me BS solutions before giving me back my original code.

[–] lilShalom@lemmy.basedcount.com 16 points 2 years ago

I've had Google Bard supply me code to use with a Google API URL that doesn't exist.

[–] thorbot@lemmy.world 4 points 2 years ago

This also completely glosses over the fact that an AI capable of writing this had huge R&D costs to get to that point and also has ongoing costs associated with running it. This whole article is a fucking joke, probably written by AI.

[–] aard@kyu.de 3 points 2 years ago

You meant to say "a competent human", which a lot of programmers are not.

While I'd expect this to be of rather low quality I'd bet money on having seen worse projects done by actual humans in the last 25 years.

[–] flamekhan@lemmy.world 83 points 2 years ago

"We asked a Chat Bot to solve a problem that already has a solution and it did ok."

[–] doublejay1999@lemmy.world 65 points 2 years ago (1 children)

Plot twist - the AI just cut and paste from stack overflow like real devs.

[–] breadsmasher@lemmy.world 59 points 2 years ago (1 children)

It cost less than a dollar to run all those chatbots?

Doubt

[–] igorlogius@lemmy.world 54 points 2 years ago* (last edited 2 years ago) (2 children)

Do management next and let's see who's gonna be replaced first

[–] thanks_shakey_snake@lemmy.ca 5 points 2 years ago (1 children)

They did do management-- They modeled the whole company as individual "staff" communicating with each other: CEO-bot communicates a product direction to the CTO-bot who communicates technical requirements to the developer-bot who asks for a "beautiful user interface" (lol) from the "art designer" (lol).

It's all super rudimentary and goofy, but management was definitely part of the experiment.
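
A rough sketch of that hand-off structure, with plain functions standing in for the LLM-backed roles. The role names follow the description above. The handler functions and message formats are hypothetical and are not ChatDev's actual code.

```python
# Toy sketch of a role chain. Each handler is a hypothetical stand-in for
# an LLM call. None of this is taken from the ChatDev implementation.

def ceo(task):
    return f"Product direction for: {task}"


def cto(direction):
    return f"Technical requirements based on [{direction}]"


def developer(requirements):
    return f"Code implementing [{requirements}]"


PIPELINE = [("CEO", ceo), ("CTO", cto), ("Developer", developer)]


def run_company(task):
    """Pass the message through each role in order and log every hand-off."""
    message = task
    log = []
    for role, handler in PIPELINE:
        message = handler(message)
        log.append((role, message))
    return log


for role, message in run_company("a basic Gomoku game"):
    print(f"{role}: {message}")
```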

[–] igorlogius@lemmy.world 3 points 2 years ago (1 children)

Sorry, my mistake, I kind of misunderstood ... but now I wonder which part of the "company" was easiest to replace, and which parts had the highest and lowest failure rates or needed the most and least processing.

[–] thanks_shakey_snake@lemmy.ca 3 points 2 years ago

It was testing that the code worked, of course :) That was the only place that had human intervention, other than a) providing the initial prompt, and b) providing icons and stuff for the GUI, instead of using generated ones. That was the "get out of jail free" card:

In cases where an interpreter struggles with identifying fine-grained logical issues, the involvement of a human client in software testing becomes optional. CHATDEV enables the human client to provide feedback and suggestions in natural language, similar to a reviewer or tester, using black-box testing or other strategies.

[–] scarabic@lemmy.world 46 points 2 years ago

A test that doesn’t include a real commercial trial or A/B test with real human customers means nothing. Put their game in the App Store and tell us how it performs. We don’t care that it shat out code that compiled successfully. Did it produce something real and usable or just gibberish that passed 86% of its own internal unit tests, which were also gibberish?

[–] Pistcow@lemm.ee 37 points 2 years ago (2 children)

But did it work?

[–] ArbiterXero@lemmy.world 56 points 2 years ago (4 children)

As someone that uses ChatGPT daily for boilerplate code because it’s super helpful…

I call complete bullshite

The program here will be “hello world” or something like that.

[–] LazaroFilm@lemmy.world 26 points 2 years ago* (last edited 2 years ago) (1 children)

Absolutely I can create a code for your app.

void myApp(void) {
  // add the code for your app here
  return true;
}

You may need to change the code above to fit your needs. Make sure you replace the comment with the proper code for your app to work.

[–] whileloop@lemmy.world 18 points 2 years ago (1 children)

Couldn't even write a void method right, return true!

[–] LazaroFilm@lemmy.world 3 points 2 years ago

LMAO. At least it didn’t sudo void… (:

[–] ipha@lemm.ee 18 points 2 years ago (1 children)

"hello world" as a service?

[–] SpaceNoodle@lemmy.world 4 points 2 years ago

https://github.com/salvatorecordiano/hello-world-as-a-service

[–] Semi-Hemi-Demigod@kbin.social 6 points 2 years ago (2 children)

It's great for things like "How do I write this kind of loop in this language" but when I ask it for something more complex, like a class or a big-ish function, it hallucinates. But it makes for a very fast way to get up to speed in a new language.

[–] SpaceNoodle@lemmy.world 3 points 2 years ago (10 children)

So just a little more time-consuming than just reading the online documentation.

[–] scarabic@lemmy.world 11 points 2 years ago

And how long did it take to compose the “assignments?” Humans can work with less precise instructions than machines, usually, and improvise or solve problems along the way or at least sense when a problem should be flagged for escalation and review.

[–] kitonthenet@kbin.social 17 points 2 years ago* (last edited 2 years ago) (2 children)

At the designing stage, the CEO asked the CTO to "propose a concrete programming language" that would "satisfy the new user's demand," to which the CTO responded with Python. In turn, the CEO said, "Great!" and explained that the programming language's "simplicity and readability make it a popular choice for beginners and experienced developers alike."

I find it extremely funny that project managers are the ones chatbots have learned to imitate perfectly. They were already doing the robot’s work: saying impressive-sounding things that are actually borderline gibberish.

[–] thanks_shakey_snake@lemmy.ca 7 points 2 years ago

What does it even mean for a programming language to "satisfy the new user's demand?" Like when has the user ever cared whether your app is built in Python or Ruby or Common Lisp?

It's like "what notebook do I need to buy to pass my exams," or "what kind of car do I need to make sure I get to work on time?"

Yet I'm 100% certain that real human executives have had equivalent conversations.

[–] Knusper@feddit.de 15 points 2 years ago

the CTO responded with Python. In turn, the CEO said, "Great!" and explained that the programming language's "simplicity and readability make it a popular choice for beginners and experienced developers alike."

Yep, that does sound like my CEO.

[–] theluddite@lemmy.ml 11 points 2 years ago (3 children)

"I gave an LLM a wildly oversimplified version of a complex human task and it did pretty well"

For how long will we be forced to endure different versions of the same article?

The study said 86.66% of the generated software systems were "executed flawlessly."

Like I said yesterday, in a post celebrating how ChatGPT can do medical questions with less than 80% accuracy, that is trash. A company with absolute shit code still has virtually all of it "execute flawlessly." Whether or not code executes is not the bar by which we judge it.
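
A small sketch of why "it executed" is a weak bar: the function below runs without raising and still returns the wrong answer. The `average` and `runs_without_error` names are made up for this example and are not from the study.

```python
# Hypothetical example: code that "executes flawlessly" yet is wrong.

def average(values):
    """Intended to return the arithmetic mean, but divides by the wrong count."""
    return sum(values) / (len(values) + 1)  # bug: off-by-one denominator


def runs_without_error(func, *args):
    """The weak bar: did the call finish without raising?"""
    try:
        func(*args)
        return True
    except Exception:
        return False


data = [2, 4, 6]
print(runs_without_error(average, data))  # True: it executes
print(average(data) == 4)                 # False: the answer is wrong
```

Only the second check, which compares against a known correct result, catches the bug.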

Even if it were to hit 100%, which it does not, there's so much more to making things than this obviously oversimplified simulation of a tech company. Real engineering involves getting people in a room, managing stakeholders, navigating conflicting desires from different stakeholders, getting to know the human beings who need a problem solved, and so on.

LLMs are not capable of this kind of meaningful collaboration, despite all this hype.

[–] gencha@feddit.de 9 points 2 years ago (1 children)

What a load of bullshit. If you have a group of researchers provide "minimal human input" to a bunch of LLMs to produce a laughable program like tic-tac-toe, then please just STFU or at least don't tell us it cost $1. This doesn't even have the efficiency of a Google search. This AI hype needs to die quick

[–] blazera@kbin.social 9 points 2 years ago (1 children)

Researchers, for example, tasked ChatDev to "design a basic Gomoku game," an abstract strategy board game also known as "Five in a Row."

What tech company is making Connect Four as their business model?

[–] atzanteol@sh.itjust.works 8 points 2 years ago (1 children)

This research seems to be more focused on whether the bots would interoperate in different roles to coordinate on a task than on creating the actual software. The idea is to reduce "hallucinations" by giving each bot a more specific task.

The paper goes into more about this:

Similar to hallucinations encountered when using LLMs for natural language querying, directly generating entire software systems using LLMs can result in severe code hallucinations, such as incomplete implementation, missing dependencies, and undiscovered bugs. These hallucinations may stem from the lack of specificity in the task and the absence of cross-examination in decision-making. To address these limitations, as Figure 1 shows, we establish a virtual chat-powered software technology company – CHATDEV, which comprises of recruited agents from diverse social identities, such as chief officers, professional programmers, test engineers, and art designers. When presented with a task, the diverse agents at CHATDEV collaborate to develop a required software, including an executable system, environmental guidelines, and user manuals. This paradigm revolves around leveraging large language models as the core thinking component, enabling the agents to simulate the entire software development process, circumventing the need for additional model training and mitigating undesirable code hallucinations to some extent.

[–] turmacar@kbin.social 4 points 2 years ago

I assume the endgame of this is the boardroom suggestion ~~guy~~ bot asking "is this based on real facts? / does this actually function?"

[–] autotldr@lemmings.world 7 points 2 years ago

This is the best summary I could come up with:

AI chatbots like OpenAI's ChatGPT can operate a software company in a quick, cost-effective manner with minimal human intervention, a new study has found.

Based on the waterfall model — a sequential approach to creating software — the company was broken down into four different stages, in chronological order: designing, coding, testing, and documenting.

After assigning ChatDev 70 different tasks, the study found that the AI-powered company was able to complete the full software development process "in under seven minutes at a cost of less than one dollar," on average — all while identifying and troubleshooting "potential vulnerabilities" through its "memory" and "self-reflection" capabilities.

"Our experimental results demonstrate the efficiency and cost-effectiveness of the automated software development process driven by CHATDEV," the researchers wrote in the paper.

The study's findings highlight one of the many ways powerful generative AI technologies like ChatGPT can perform specific job functions.

Nevertheless, the study isn't perfect: Researchers identified limitations, such as errors and biases in the language models, that could cause issues in the creation of software.

The original article contains 639 words, the summary contains 172 words. Saved 73%. I'm a bot and I'm open source!
