riskable

joined 2 years ago
[–] riskable@programming.dev 4 points 1 month ago (1 children)

I work on (some) AI stuff professionally, and the open source models are the only models that let you fully control the system prompt. Basically, that means that only open source models are acceptable for a whole lot of business logic.

Another thing to consider: there are models designed for processing rather than chat. It's hard to explain, but stuff like Qwen 3 "embedding" is made for input/output usage in automation pipelines:

https://huggingface.co/Qwen/Qwen3-Embedding-8B

You can't do that effectively with the big AI models (as much as Anthropic would argue otherwise... It's too expensive and risky to send all your data to a cloud provider in most automation situations).
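To make the "in/out automation" idea concrete, here's a minimal sketch of what you do with an embedding model's output: compare vectors directly in code, no chat involved. The vectors below are hand-made stand-ins; in a real pipeline they'd come from the model (e.g. something like `model.encode(text)` with Qwen3-Embedding).

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Compare two embedding vectors; 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in vectors -- a real embedding model would output ones with
# thousands of dimensions, but the comparison logic is identical.
doc_vec = np.array([0.20, 0.90, 0.10])
query_vec = np.array([0.25, 0.85, 0.05])
unrelated_vec = np.array([-0.90, 0.10, 0.30])

# The "related" pair scores higher -- that's the whole automation trick.
assert cosine_similarity(doc_vec, query_vec) > cosine_similarity(doc_vec, unrelated_vec)
```

That comparison runs entirely on your own hardware against your own data, which is the whole point of not shipping everything to a cloud provider.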

[–] riskable@programming.dev 11 points 1 month ago (9 children)

This doesn't make sense when you look at it from the perspective of open source models. They exist and they're fantastic. They also get better just as quickly as the big AI company services.

IMHO, the open source models will ultimately be what pops the big AI bubble.

[–] riskable@programming.dev 13 points 1 month ago (1 children)

Stick Enthusiasts!

[–] riskable@programming.dev 2 points 1 month ago* (last edited 1 month ago)

No, a .safetensors file is not a database. You can't query a .safetensors file and there's nothing like ACID compliance (it's read-only).

Imagine a JSON file that has only keys and values in it, where the keys are tensor names and the values are enormous arrays of floating point numbers. It's basically gibberish until you go through an inference process and start feeding random numbers through it (over and over again, whittling it all down until you get a result that matches the prompt to a specified degree).
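That whittling-down loop can be sketched as a toy (heavily simplified; `predict_noise` stands in for the actual neural network, and no real model is anywhere near this simple):

```python
import random

def predict_noise(x, step):
    # Stand-in for the real network: in this toy world the "correct"
    # result is all zeros, so half of whatever is left counts as noise.
    return [v * 0.5 for v in x]

random.seed(0)
x = [random.gauss(0, 1) for _ in range(4)]  # start from pure random numbers

for step in range(20):                       # whittle it down, step by step
    noise = predict_noise(x, step)
    x = [v - n for v, n in zip(x, noise)]

# After enough steps the values have converged toward the "result".
assert all(abs(v) < 1e-3 for v in x)
```

Real diffusion models do the same shape of loop (predict noise, subtract it, repeat), just with a giant network and image-sized tensors instead of four floats.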

How do the "turbo" models work to get a great result after one step? I have no idea. That's like black magic to me haha.

[–] riskable@programming.dev 4 points 1 month ago (3 children)

> Or, with AI image gen, it knows that when someone asks it for an image of a hand holding a pencil, it looks at all the artwork in its training database and says, "this collection of pixels is probably what they want".

This is incorrect. Generative image models don't contain databases of artwork. If they did, they would be the most amazing fucking compression technology, ever.

As an example model, FLUX.dev is 23.8GB:

https://huggingface.co/black-forest-labs/FLUX.1-dev/tree/main

It's a general-use model that can generate basically anything you want. It's not perfect and it's not the latest & greatest AI image generation model, but it's a great example because anyone can download it and run it locally on their own PC (and get vastly superior results than ChatGPT's DALL-E model).

If you examine the data inside the model, you'll see a bunch of metadata headers and then an enormous array of arrays of floating point values. Stuff like, [0.01645, 0.67235, ...]. That is what a generative image AI model uses to make images. There's no database to speak of.
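You can actually verify this yourself. The `.safetensors` layout is just an 8-byte little-endian header length, a JSON header mapping tensor names to dtype/shape/offsets, then raw bytes. Here's a sketch that builds a tiny in-memory blob in that format and parses it back (the tensor name and values are made up for illustration):

```python
import json
import struct

# Build a minimal .safetensors-style blob: 8-byte header length,
# JSON metadata header, then the raw float data.
weights = struct.pack("<4f", 0.01645, 0.67235, -0.1, 0.5)
header = {"some.layer.weight": {"dtype": "F32", "shape": [4],
                                "data_offsets": [0, len(weights)]}}
header_bytes = json.dumps(header).encode()
blob = struct.pack("<Q", len(header_bytes)) + header_bytes + weights

# Parse it back: metadata first, then the flat array of floats.
(n,) = struct.unpack("<Q", blob[:8])
meta = json.loads(blob[8:8 + n])
start, end = meta["some.layer.weight"]["data_offsets"]
values = struct.unpack("<4f", blob[8 + n + start:8 + n + end])
```

Do the same on FLUX.dev's actual files and all you'll find is metadata headers and gigabytes of floats. No images, no database.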

When training an image model, you need to download millions upon millions of public images from the Internet and pair them with metadata from an actual database like ImageNet. ImageNet contains lots of metadata about millions of images, such as their URLs, bounding boxes around parts of each image, and keywords associated with those bounding boxes.

The training is mostly a linear process, so the images never really get loaded into a database. They just get read, along with their metadata, into a GPU that performs some machine learning math to produce arrays of floating point values. Those values ultimately end up in the model file.

It's actually a lot more complicated than that (there's pretraining steps and classifiers and verification/safety stuff and more) but that's the gist of it.

I see soooo many people who think image AI generation is literally pulling pixels out of existing images but that's not how it works at all. It's not even remotely how it works.

When an image model is being trained, any given image might modify one of those floating point values by like ±0.01. That's it. That's all it does when it trains on a specific image.
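A toy version of that "±0.01 nudge" (with made-up weights and a stand-in for backprop; a real network has billions of weights, not three):

```python
# Three stand-in model weights; a real model has billions of these.
weights = [0.40, -0.20, 0.75]

def gradient_for_image(weights, image):
    # Stand-in for backpropagation on a real network: returns one
    # gradient value per weight for this particular training image.
    return [0.8, -0.3, 0.5]

learning_rate = 0.01
grads = gradient_for_image(weights, image="some_training_image")
weights = [w - learning_rate * g for w, g in zip(weights, grads)]
# Each weight moved by at most +/-0.01. The image itself is never stored;
# only these tiny nudges survive into the model file.
```

Multiply that across millions of images and every weight ends up as a blur of millions of tiny nudges, which is why no single image can be "pulled back out".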

I often rant about where this process goes wrong and how it can result in images that look way too much like some specific images in training data but that's a flaw, not a feature. It's something that every image model has to deal with and will improve over time.

At the heart of every AI image generation is a random number generator. Sometimes you'll get something similar to an original work. Especially if you generate thousands and thousands of images. That doesn't mean the model itself was engineered to do that. Also: A lot of that kind of problem happens in the inference step but that's a really complicated topic...

[–] riskable@programming.dev 5 points 1 month ago

I'm ok with rich people getting charged more. But anyone who isn't making like $1 million/year should get the normal price.

[–] riskable@programming.dev 60 points 1 month ago (7 children)

This will definitely encourage more people to have kids.

[–] riskable@programming.dev -3 points 1 month ago (1 children)

It's close enough. The key is that it's not something that was just regurgitated based on a single keyword. It's unique.

I could've generated hundreds and I bet a few would look a lot more like a banana.

[–] riskable@programming.dev -3 points 1 month ago* (last edited 1 month ago) (3 children)

Hard disagree. You just have to describe the shape and colors of the banana and maybe give it some dimensions. Here's an example:

A hyper-realistic studio photograph of a single, elongated organic object resting on a wooden surface. The object is curved into a gentle crescent arc and features a smooth, waxy, vibrant yellow skin. It has distinct longitudinal ridges running its length, giving it a soft-edged pentagonal cross-section. The bottom end tapers to a small, dark, organic nub, while the top end extends into a thick, fibrous, greenish-brown stalk that appears to have been cut from a larger cluster. The yellow surface has minute brown speckles indicating ripeness.

It's a lot of description but you've got 4096 tokens to play with so why not?

Remember: AI is just a method for giving instructions to a computer. If you give it enough details, it can do the thing at least some of the time (also remember that at the heart of every gen AI model is a RNG).
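The RNG bit is literal: every generation starts from a sampled noise seed, which is why the same seed plus the same prompt reproduces the same image. A trivial sketch of that determinism (using Python's stdlib RNG as a stand-in for the noise sampler):

```python
import random

# Same seed -> same starting noise -> same "image" out the other end.
random.seed(42)
a = [random.random() for _ in range(3)]

random.seed(42)
b = [random.random() for _ in range(3)]

assert a == b  # identical noise, so the downstream result is identical too
```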

[Image: a terrible image of a banana, generated by AI using a prompt that did not use the word banana]

Note: That was the first try and I didn't even use the word "banana".

[–] riskable@programming.dev -2 points 1 month ago

It's more like this: If you give a machine instructions to construct or do something, is the end result a creative work?

If I design a vase (using nothing but code) that's meant to be 3D printed, does that count as a creative work?

https://imgur.com/bdxnr27

That vase was made using code (literally just text) I wrote in OpenSCAD. The model file is the result of the code I wrote and the physical object is the output of the 3D printer that I built. The pretty filament was store-bought, however.

If giving a machine instructions doesn't count as a creative process then programming doesn't count either. Because that's all you're doing when you feed a prompt to an AI: Giving it instructions. It's just the latest tech for giving instructions to machines.

[–] riskable@programming.dev 1 points 1 month ago

Like I said initially, how do we legally define "cloning"? I don't think it's possible to write a law that prevents it without also creating vastly more unintended consequences (and problems).

Let's take a step back for a moment and think about a more fundamental question: do people even have the right to NOT have their voice cloned? To me, that is impersonation, which is perfectly legal (in the US) as long as you don't claim that it's the actual person. If you impersonate someone and claim it really is them, that would be fraud.

In the US—as far as I know—it's perfectly legal to clone someone's voice and use it however TF you want. What you can't do is claim that it's actually that person because that would be akin to a false endorsement.

Realistically—from what I know about human voices—this is probably fine. Voice clones aren't that good. The most effective method is to clone a voice and use it in a voice changer, with a voice actor who can mimic the original person's accent and inflection. But even that has flaws that a trained ear will pick up.

Ethically speaking, there's really nothing wrong with cloning a voice, because from an ethics standpoint there's no impact. It's meaningless; just a different way of speaking or singing.

It feels like it might be bad to sing a song using something like Taylor Swift's voice but in reality it'll have no impact on her or her music-related business.
