this post was submitted on 28 Nov 2023
LocalLLaMA
Community to discuss Llama, the family of large language models created by Meta AI.
Although memory bandwidth is the most important factor for inference, FLOPS still matter. APUs are just too slow at compute, so the bottleneck would shift to calculating all those matrix operations (and that's assuming the APU even gets Apple-style high memory bandwidth, which I doubt).
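To put rough numbers on that trade-off, here's a back-of-envelope roofline sketch: single-stream decode speed is capped by min(memory bandwidth / weight bytes, compute throughput / FLOPs per token). All hardware figures below, including the "hypothetical high-bandwidth APU" row, are illustrative assumptions for the sake of the argument, not benchmarks.

```python
# Back-of-envelope roofline for single-stream LLM decoding.
# All hardware numbers are illustrative assumptions, not measurements.

def decode_rate(weight_bytes, flops_per_token, mem_bw_gbs, tflops):
    """Tokens/s is capped by the slower of two limits:
    memory-bound: each token streams every weight from RAM once;
    compute-bound: each token costs ~2 FLOPs per parameter."""
    bw_limit = (mem_bw_gbs * 1e9) / weight_bytes
    compute_limit = (tflops * 1e12) / flops_per_token
    return min(bw_limit, compute_limit), bw_limit, compute_limit

# Assumed workload: 7B params quantized to 4-bit (~3.5 GB of weights),
# ~2 FLOPs per parameter per generated token.
weight_bytes = 3.5e9
flops_per_token = 2 * 7e9

systems = [
    # (label, memory bandwidth GB/s, sustained matmul TFLOPS) -- assumed values
    ("typical APU, dual-channel DDR5", 80, 10),
    ("hypothetical high-bandwidth APU", 400, 1),
    ("Apple M2 Ultra-class SoC", 800, 27),
]

for name, bw, tf in systems:
    rate, bw_lim, c_lim = decode_rate(weight_bytes, flops_per_token, bw, tf)
    bound = "memory" if bw_lim <= c_lim else "compute"
    print(f"{name}: ~{rate:.0f} tok/s ({bound}-bound)")
```

The second row is the scenario in the comment: once bandwidth stops being the wall, a slow iGPU's matmul throughput takes over as the limit.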