[–] Someone13574@alien.top 1 points 1 year ago

A small model like Zephyr plus some few-shot prompting should work pretty well. Is it always asking the same question, or is it looking for something different each time?
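
For reference, a minimal sketch of what that could look like with Zephyr through the Hugging Face `transformers` pipeline. The model ID is the public `HuggingFaceH4/zephyr-7b-beta` checkpoint, but the system prompt and the Q/A pairs are just placeholder examples, not anything specific to your task:

```python
# Few-shot prompting sketch: a couple of worked Q/A examples teach the
# model the expected format before it sees the real query.
from transformers import pipeline

generator = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta")

messages = [
    {"role": "system", "content": "Answer each question in one short sentence."},
    # Placeholder few-shot examples; swap in pairs from your own task.
    {"role": "user", "content": "Q: What is the capital of France?"},
    {"role": "assistant", "content": "A: Paris."},
    {"role": "user", "content": "Q: What is 2 + 2?"},
    {"role": "assistant", "content": "A: 4."},
    # The actual query goes last, with no answer.
    {"role": "user", "content": "Q: Who wrote Hamlet?"},
]

# Render the messages with Zephyr's chat template, then generate.
prompt = generator.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
output = generator(prompt, max_new_tokens=50, return_full_text=False)
print(output[0]["generated_text"])
```

If it's always the same question, you can bake the examples into a fixed prompt; if the question varies, you'd want to pick the few-shot examples per query.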

[–] Someone13574@alien.top 1 points 1 year ago

While there are some techniques for sub-quadratic transformer scaling (Longformer, LongNet, Retentive Network, and others), there haven't really been any state-of-the-art models trained with them, and they can't be retrofitted onto existing models, so they aren't really being used. As for why new models don't adopt them, it's either because they hurt quality too much, or because they're considered too risky and more proven architectures get used instead. If there were strong models trained with these techniques, people would use them, but for now there just aren't any, so we're stuck with O(n^2) compute and memory.
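
To make the O(n^2) point concrete, here's a toy NumPy sketch of standard attention (the dimensions are arbitrary): the score matrix has one entry per query/key pair, so both the work and the memory grow quadratically with sequence length n.

```python
# Toy single-head attention showing where the quadratic cost comes from.
import numpy as np

n, d = 1024, 64                  # sequence length, head dimension
Q = np.random.randn(n, d)
K = np.random.randn(n, d)
V = np.random.randn(n, d)

# The score matrix is (n, n): n^2 entries to compute and store.
scores = Q @ K.T / np.sqrt(d)

# Row-wise softmax over the keys.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

out = weights @ V                # O(n^2 * d) work overall
print(scores.shape)              # (1024, 1024); doubling n quadruples this
```

The sub-quadratic methods mentioned above all attack that (n, n) matrix in some way, e.g. by sparsifying it or replacing it with a recurrent formulation, which is also why they can't just be bolted onto a model that was trained with full attention.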