this post was submitted on 29 Nov 2023

LocalLLaMA

Community to discuss Llama, the family of large language models created by Meta AI.

AlphaGo's premise is that, instead of relying on human feedback for reinforcement learning, you have the model play games against itself with a simple reward mechanism, so that it can learn from its own mistakes. This makes the training data scalable, allowing the model to discover new Go moves and eventually exceed the quality of its initial training data.
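Roughly, the loop I have in mind looks like this (with rock-paper-scissors standing in for Go and a plain weight table standing in for the network; this is a made-up toy, not AlphaGo's actual algorithm):

```python
import random

ACTIONS = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def sample(weights):
    # Pick a move in proportion to the current policy weights.
    return random.choices(ACTIONS, weights=[weights[a] for a in ACTIONS])[0]

# A single policy plays both sides of every game against itself.
weights = {a: 1.0 for a in ACTIONS}

for _ in range(10_000):
    a, b = sample(weights), sample(weights)
    if a == b:
        continue  # draw: no reward signal
    winner, loser = (a, b) if BEATS[a] == b else (b, a)
    # Simple reward: nudge the policy toward the move that won
    # and away from the move that lost.
    weights[winner] += 0.01
    weights[loser] = max(0.01, weights[loser] - 0.01)

print(weights)
```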

From an engineering point of view, how do you see this being applied to other areas like software development, where there is no opponent to play against? Do you connect the model to a compiler and have it learn by trial and error from the compiler output? Do you define the desired software outcomes and have another AI score how much closer to or further from them the output gets on each iteration? How would such a closed feedback loop work to get an AI to become a world expert in a specific programming language or framework?
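To make the question concrete, here is the kind of loop I'm imagining, with Python's built-in compile() and exec() standing in for a real compiler and test suite, and a hard-coded list of candidates standing in for samples from the model; everything here is illustrative, not a real training setup:

```python
def reward(source: str) -> float:
    """Score a candidate program: 0.0 if it doesn't compile,
    0.5 if it compiles but fails the spec, 1.0 if it passes."""
    try:
        code = compile(source, "<candidate>", "exec")  # the "compiler output" signal
    except SyntaxError:
        return 0.0
    namespace = {}
    try:
        exec(code, namespace)
        # The desired software outcome, written as an executable check.
        assert namespace["add"](2, 3) == 5
    except Exception:
        return 0.5
    return 1.0

# Stand-ins for what the model might generate across iterations.
candidate_programs = [
    "def add(a, b) return a + b",        # syntax error -> reward 0.0
    "def add(a, b):\n    return a - b",  # compiles, wrong -> reward 0.5
    "def add(a, b):\n    return a + b",  # passes the spec -> reward 1.0
]

for step, source in enumerate(candidate_programs):
    print(f"iteration {step}: reward={reward(source)}")
    # In a real loop, this reward would feed back into the model
    # (an RL update or filtered fine-tuning) so later samples improve.
```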

no comments (yet)