I run 7B models on my 1070. ollama run llama2 produces between 20 and 30 tokens per second on Ubuntu.
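If anyone wants to reproduce that number, ollama has a --verbose flag that prints eval timing after each response. A minimal sketch (the prompt and the figures in the comments are just illustrative, and the exact output labels may vary by version):

    # run a single prompt non-interactively and show timing stats
    ollama run llama2 --verbose "Explain TCP slow start in one paragraph."
    # the summary at the end includes lines like:
    #   eval count:    256 token(s)
    #   eval rate:     25.3 tokens/s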
Does anyone know the largest model that will fit on the new M3 Pro with 36GB of RAM? I'm looking to run some 23GB models with long context.
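Rough sizing logic, in case it helps frame the question: the weights aren't the whole story, since long context adds a per-token KV cache on top, and macOS by default only lets the GPU use a portion of unified memory. A back-of-the-envelope sketch where every number is an assumption (~23 GiB of quantized weights, llama-style dims of 48 layers / 8 KV heads / head_dim 128 with an fp16 cache, 16k context, and roughly 75% of RAM usable by the GPU):

    # KV cache bytes per token = 2 (K and V) * layers * kv_heads * head_dim * bytes
    KV_GIB=$(( 2 * 48 * 8 * 128 * 2 * 16384 / 1024 / 1024 / 1024 ))
    echo "KV cache at 16k ctx: ${KV_GIB} GiB"                  # -> 3 GiB
    echo "weights + KV: $(( 23 + KV_GIB )) GiB vs ~27 GiB usable"

So under those assumptions a 23GB model plus 16k of context lands around 26 GiB, which is right at the edge of what 36GB of unified memory leaves for the GPU.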
Did the AI coordinate his sacking?