Hello fellow llamas!!!
Here is what I am hacking on…
I am exploring new ways to build generative AI foundational models without the traditional, math-heavy training costs and resources. I am trying to lower the barrier for anyone looking to build and share models that are:
- task-trained - models are trained to do very specific task(s) using only the required datasets (explicitly overfitting for known use case(s) instead of generalizing/underfitting and having to search the entire internet's worth of knowledge before responding)
- modular - because the models only know about these smaller, task-trained dataset(s), they should hopefully respond faster than today's general-purpose models
- device-native - models are targeted at constrained environments that do not have GPU clusters or excess RAM/CPU/storage/connectivity
- open source - since the weights are public domain, the derived intelligence should be public domain
- type of foundational model: weight-derived (blog: https://matlok.ai/ docs: https://bampe-weights.readthedocs.io/en/latest/)
I believe there may be some math/stats proofs that are still missing (see the smooth-brain), but I want to push this modular, Lego-block-like approach in hopes of reaching parity with a new generation of foundational models. One of my fundamental assumptions is that if I substantially reduce the training corpus, a smaller, overfit model will hopefully respond faster than a traditionally-trained large language model. The slimmer initial model-building process should also hopefully run on IoT devices and plug into existing distributed architectures (device-native).
What am I doing next - initial use case?
I need help finding a good initial use case (please let me know if you have better ones!). Current best idea of the week/last 3 days: I believe this approach and knowledge system for assembling weight-derived models should be shared so we can ensure concepts like an "ethical watermark" for Asimov's Laws of Robotics are always present in all pre-trained AI model weights, using cosine similarity searches. As this approach matures, we should be able to audit and report on what these models know, and I think we need a community-driven project to tackle it.
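To make the watermark idea a bit more concrete, here is a rough sketch of what a cosine-similarity scan over at-rest weights could look like, assuming the safetensors + numpy stack. The watermark vector, file path, and threshold are made up for illustration and are not part of the bampe-weights repo.

```python
# Sketch: scan at-rest model weights for a reference "watermark" vector
# using cosine similarity. The watermark vector, file path, and threshold
# below are illustrative assumptions only.
import numpy as np
from safetensors.numpy import load_file


def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def scan_weights_for_watermark(path: str, watermark: np.ndarray, threshold: float = 0.9):
    """Return (tensor name, offset) pairs whose chunks resemble the watermark."""
    hits = []
    tensors = load_file(path)  # dict of name -> numpy array
    for name, tensor in tensors.items():
        flat = tensor.reshape(-1)
        # compare watermark-sized chunks of the flattened tensor
        for start in range(0, flat.size - watermark.size + 1, watermark.size):
            chunk = flat[start:start + watermark.size]
            if cosine_sim(chunk, watermark) >= threshold:
                hits.append((name, start))
                break
    return hits


# example usage with a made-up watermark vector
watermark = np.random.default_rng(0).standard_normal(256).astype(np.float32)
print(scan_weights_for_watermark("model.safetensors", watermark))
```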
tl;dr
It's early days, but I believe we can reuse existing AI tensor weights, complemented with smaller "fine-tuning"-sized datasets, to build small, fast, high-quality generative models.
PoC repository:
https://github.com/matlok-ai/bampe-weights
Inputs
Extracted tensor weight from a GPT2 model.safetensors file:
Outputs
Predicted weight-derived file for use in a new type of foundational generative AI model
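Roughly, that input/output flow could look like the sketch below, assuming the safetensors/numpy stack. The tensor name and the down-projection used to "derive" the smaller weight are placeholders; the actual prediction step lives in the bampe-weights repo and may work very differently.

```python
# Sketch of the input/output flow only; the down-projection is a toy
# stand-in for the real weight-prediction step.
import numpy as np
from safetensors.numpy import load_file, save_file

# Input: a tensor extracted from a GPT-2 model.safetensors file
tensors = load_file("model.safetensors")
source = tensors["h.0.attn.c_attn.weight"]  # example GPT-2 tensor name

# Placeholder "weight derivation": keep every 4th row/column
derived = source[::4, ::4].copy()

# Output: a predicted weight-derived file for a new foundational model
save_file({"derived.h.0.attn.c_attn.weight": derived}, "weight-derived.safetensors")
```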
Thanks for the help, guidance, and assistance keeping up with the insane speed of this ecosystem!
Reach out if you want more info - my email is in the profile
Hey @OP. Really interesting initiative. There seem to be some parallels with something I'm working on, and I'd love your opinion on it if you have a moment: https://github.com/arthurwolf/llmi/blob/main/README.md
Wow. This project is off to a great start and is reusing today's generation of AI models/techniques to explore alternative models for a new generation.
I am excited to see I’m not the only one fired up about addressing today’s model limitations like context size/window (https://github.com/arthurwolf/llmi/blob/main/README.md#recursive-redaction). Once we pop the weights out, we can reuse the weights in a new model configuration that has a larger context size (hopefully haha!).
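For what it's worth, here's a hand-wavy sketch of what I mean by reusing popped-out weights in a larger-context configuration, assuming the Hugging Face transformers API. Only shape-compatible tensors get copied; the larger position-embedding table stays freshly initialized, which is exactly the open question.

```python
# Sketch: reuse GPT-2 weights in a config with a larger context window.
# Only shape-compatible tensors are copied over.
import torch
from transformers import GPT2LMHeadModel, GPT2Config

donor = GPT2LMHeadModel.from_pretrained("gpt2")            # 1024-token context
config = GPT2Config.from_pretrained("gpt2", n_positions=2048)
target = GPT2LMHeadModel(config)                           # 2048-token context

donor_state = donor.state_dict()
target_state = target.state_dict()
copied = {
    name: tensor
    for name, tensor in donor_state.items()
    if name in target_state and target_state[name].shape == tensor.shape
}
# strict=False leaves the new, larger position embeddings randomly initialized
target.load_state_dict(copied, strict=False)
print(f"reused {len(copied)} of {len(target_state)} tensors")
```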
Are you thinking about using a multimodal transformer for the "Thinking with code" section, or something new and exciting I've never heard of (https://github.com/arthurwolf/llmi/blob/main/README.md#thinking-with-code)? I like the "Checking for Accuracy" section too (https://github.com/arthurwolf/llmi/blob/main/README.md#checking-for-accuracy); it's what I'm thinking of as a watermark for verifying a model's at-rest weights contain "trained knowledge" - kind of like security scanning container images at rest in the CI/CD space, versus verifying the model answered the question(s) correctly while running/in-memory.
I could keep going, but what do you think are the next steps for your project?