Some of the content also seems to allude to what Q* might be…
GPT-4 Turbo only speeds things up by 3x…
This isn’t comparing against the 13B version of LLaVA. I’d be curious to see that comparison.
In-context learning allows the model to learn new skills to a limited degree.
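Something like the sketch below is what I mean (purely illustrative; the task and vocabulary are made up) — the "learning" happens entirely inside the prompt, with no weight updates:

```python
# A minimal sketch of in-context learning: the model is never fine-tuned;
# it picks up the task (a made-up slang-to-English mapping) purely from the
# examples placed in the prompt. Any chat/completions API could consume the
# resulting prompt string.

FEW_SHOT_EXAMPLES = [
    ("blorf", "hello"),       # invented vocabulary the model has never seen
    ("zint", "goodbye"),
    ("blorf zint", "hello goodbye"),
]

def build_prompt(query: str) -> str:
    """Assemble a few-shot prompt; the 'learning' lives entirely in context."""
    lines = ["Translate the made-up language into English."]
    for source, target in FEW_SHOT_EXAMPLES:
        lines.append(f"Input: {source}\nOutput: {target}")
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)

if __name__ == "__main__":
    # In practice this string would be sent to an LLM; a capable model will
    # often answer "goodbye hello" despite never being trained on the mapping.
    print(build_prompt("zint blorf"))
```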
GPT-3.5 Turbo apparently has 20 billion parameters, significantly fewer than the previous best Phind models. Given how bad GPT-3.5 is, I think it’s more likely they just fine-tuned some other base model on GPT-3.5 outputs.
The original LLMZip paper mainly focused on text compression. A later work (I forget the name) used an LLM trained on byte tokens. This allowed it to compress not just text, but any file format. I think it may have been Google who published that particular paper… Very impressive though.
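Roughly, the trick is that an entropy coder only needs good next-byte probabilities, so any predictor over raw bytes can compress any file format. Here’s a toy sketch of the idea — the adaptive byte-frequency model is just a stand-in for the LLM, and it only computes the ideal code length rather than producing an actual bitstream:

```python
# An entropy coder (e.g. arithmetic coding) spends about -log2 p(byte | context)
# bits per byte, so a model that predicts raw bytes well compresses any file
# format, not just text. The tiny adaptive frequency "model" below is a
# placeholder for an LLM over byte tokens.

import math
from collections import Counter

def ideal_compressed_bits(data: bytes) -> float:
    """Sum of -log2 p(next byte) under a simple adaptive byte model."""
    counts = Counter({b: 1 for b in range(256)})  # Laplace-smoothed start
    total = 256
    total_bits = 0.0
    for byte in data:
        p = counts[byte] / total
        total_bits += -math.log2(p)   # bits an arithmetic coder would spend
        counts[byte] += 1             # update the "model" after seeing the byte
        total += 1
    return total_bits

if __name__ == "__main__":
    payload = b"any file format works here, not just text " * 50
    bits = ideal_compressed_bits(payload)
    print(f"{len(payload)} bytes -> ~{bits / 8:.0f} bytes "
          f"({bits / (8 * len(payload)):.2f}x of the original size)")
    # Swapping the frequency model for an LLM's next-byte distribution is
    # what pushes the ratio far below classical compressors.
```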
LLMZip achieves SOTA text compression by a large margin.
Ever hear the term “might”?