overview for dqUu3QlS

An Alternative Approach to Building Generative AI Models in c/localllama@poweruser.forum

[–] dqUu3QlS@alien.top 1 points 2 years ago

A large technical disadvantage: I think we need a new type of precision cutting tool to extract and recognize shapes inside tensor weight images

Why do you think we need this? To me, it just indicates that the structure of Stable Diffusion is designed for real-world photos, artwork, and diagrams, and ill-suited for predicting the weights of an LLM.

the poc shows today’s models can predict new weights without training and without entity extraction/ml, and within 13-30 seconds the output is not dramatically horrible vs the original source weights.

Are you sure the output isn't dramatically horrible? To me the predicted weight images look nothing like the original weight images. The fine detail is completely different.

But it doesn't even matter how it looks to human eyes. What matters is, when a new model is constructed from the predicted weights, whether that model makes mostly-correct predictions.
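
To make that concrete, here's a rough sketch of the kind of check I mean. It is not the OP's pipeline: the toy linear classifier, the `perturb` helper, and the noise scale are all made up for illustration, and a real test would load the actual LLM and run a proper benchmark. The idea is just to measure how often a model built from the predicted weights agrees with the original model on the same inputs.

```python
import random

def predict(weights, x):
    # weights: one row of coefficients per class; returns the index of the top-scoring class
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)

def agreement(original, predicted, inputs):
    # Fraction of inputs on which both weight sets give the same prediction
    same = sum(predict(original, x) == predict(predicted, x) for x in inputs)
    return same / len(inputs)

def perturb(weights, scale, rng):
    # Hypothetical stand-in for "predicted weights": the originals plus Gaussian noise
    return [[w + rng.gauss(0.0, scale) for w in row] for row in weights]

rng = random.Random(0)
original = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(4)]
inputs = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(500)]

close = perturb(original, 0.01, rng)  # nearly identical to the original weights
unrelated = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(4)]  # same statistics, different detail

print("close:    ", agreement(original, close, inputs))
print("unrelated:", agreement(original, unrelated, inputs))
```

Weights that merely share the overall statistics of the originals (the `unrelated` case) should agree with the original model about as often as chance, which is why the fine detail matters more than whether the images look similar at a glance.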

An Alternative Approach to Building Generative AI Models in c/localllama@poweruser.forum

[–] dqUu3QlS@alien.top 1 points 2 years ago (3 children)

If I understand this correctly, you're using a smaller NN to predict the weights of a larger one? Have you tested to make sure this approach preserves the performance of the larger model? What advantage does your approach have compared to existing approaches: distillation, quantization, pruning, or just training smaller models directly?

I can think of some clear disadvantages for performance.

Questions on Attention Sinks and Their Usage in LLM Models in c/localllama@poweruser.forum

[–] dqUu3QlS@alien.top 1 points 2 years ago (1 children)

What's the question?