TheCrazyAcademic

TheCrazyAcademic@alien.top · 1 point · 11 months ago

It'd be interesting to see how an MoE framework of multiple Orca 2s, each trained on a different subset of data, would fare, basically routing your prompt to different Orca 2 experts. I feel like that could come extraordinarily close to GPT-4 on performance metrics, but it would take decent computing power to test the hypothesis. If each Orca 2 expert is 10 billion parameters and you wanted to run a 100-billion-parameter sparse Orca 2 MoE, that's gonna require hundreds of gigabytes of VRAM, easily 500 GB+ at full precision.
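A rough way to sanity-check that VRAM figure: in a sparse MoE only some experts do work on each prompt or token, but every expert still has to sit in memory, so VRAM scales with the total parameter count, not the active one. The sketch below assumes ten hypothetical 10B-parameter experts with one expert active per prompt, counts weights only (no KV cache, activations, or framework overhead), and uses 1 GB = 10^9 bytes. The function names are made up for illustration.

```python
# Back-of-the-envelope VRAM estimate for a hypothetical sparse MoE built from
# ten 10B-parameter experts. Assumption: weights only; KV cache, activations,
# and framework overhead are NOT included.

BYTES_PER_PARAM = {
    "fp32": 4,
    "fp16/bf16": 2,
    "int8": 1,
    "int4": 0.5,
}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def moe_params(num_experts: int, params_per_expert: float, experts_active: int):
    """Return (total, active) parameter counts for a simple MoE.

    Simplification: each expert is a full standalone model, so there are no
    shared layers, and the router's own parameters are ignored.
    """
    total = num_experts * params_per_expert
    active = experts_active * params_per_expert
    return total, active


if __name__ == "__main__":
    # Hypothetical setup: 10 experts x 10B params, one expert picked per prompt.
    total, active = moe_params(num_experts=10, params_per_expert=10e9, experts_active=1)
    print(f"total params: {total / 1e9:.0f}B, active per prompt: {active / 1e9:.0f}B")
    for name, nbytes in BYTES_PER_PARAM.items():
        # All experts must stay resident, so memory follows the TOTAL count,
        # even though only `active` parameters run on any given prompt.
        print(f"{name:>9}: {weight_memory_gb(total, nbytes):6.0f} GB of weights")
```

Under these assumptions the weights alone come to about 400 GB at fp32 and about 200 GB at fp16/bf16, so a 500 GB+ figure lines up with full precision plus runtime overhead; 8-bit or 4-bit quantization would cut it to roughly 100 GB or 50 GB, at some cost in quality. Compute per prompt, on the other hand, tracks the active count (10B here), which is the whole appeal of going sparse.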

Well, people with conditions like megalencephaly (an abnormally enlarged brain) aren't any smarter, and they often end up worse off cognitively because it messes with neuronal density, so brain size alone clearly doesn't determine intelligence. Animals with bigger brains, and in some cases more neurons, than humans aren't smarter either, at least as far as we can tell; scientists could just be using bad benchmarks.