I was thinking of distributed MoEs as well. The question I have is: how do you route queries? I don't know how to do that when all the experts are in the same cluster, let alone distributed.
Yeah, it's a work in progress. It's not trivial to set up. It's easy to imagine a way it could be done, but it all has to be built, tested, and refined.
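For what it's worth, here is a rough sketch of one commonly described routing scheme, top-k gating: a small gate scores every expert for a query, and only the k best-scoring experts get the query, with their outputs weighted by the renormalised gate probabilities. This is not how any particular project does it. The names `gate_scores`, `route_top_k`, and the gate weight matrix are made up for illustration, and in a distributed setup the returned indices would map to whichever machines host those experts.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gate_scores(query_vec, gate_weights):
    """Hypothetical linear gate: one score per expert (one row per expert)."""
    return [sum(w * x for w, x in zip(row, query_vec)) for row in gate_weights]

def route_top_k(scores, k=2):
    """Pick the k highest-scoring experts.

    Returns a list of (expert_index, weight) pairs, where the weights
    are the softmax probabilities renormalised to sum to 1.
    """
    probs = softmax(scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

# Assumed toy setup: 4 experts, 3-dimensional query embedding.
gate_weights = [
    [0.1, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [-1.0, 0.0, 0.0],
    [0.5, 0.5, 0.5],
]
query = [1.0, 1.0, 1.0]
routes = route_top_k(gate_scores(query, gate_weights), k=2)
```

Only the selected experts need to be contacted, which is the part that matters when they live on different machines. Training the gate and balancing load across experts are the hard parts, and this sketch doesn't touch either.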
llama.cpp is out there. I'm a C++ person, but I don't have deep experience with LLMs generally (how to fine-tune, etc.) and have other projects in progress. But if you look around in the usual places with some search terms, you'll find the attempts in progress, and they could probably use volunteers.
My aspirations are more toward the vision side. I'm a graphics person and need to get on with producing synthetic data or something.
Some people want to train on procedural generators (e.g. a game engine), which would be in C++. Being able to have the whole codebase in one language would smooth this out. (In my case I have a Rust 3D engine codebase that I'd like to use to drive AI.)
ggml is a great idea IMO.