this post was submitted on 24 Nov 2023
1 points (100.0% liked)
LocalLLaMA
3 readers
1 users here now
Community to discuss about Llama, the family of large language models created by Meta AI.
founded 1 year ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
What do you want to happen when the total chat reaches 8k? Because there the server has to make a choice it can keep adding more context so it slows down, it can simply cut off the first messages but then it will for example forget its own name, or it could for example (this is a method I use but it costs interference time as you ask a 2nd question behind the scenes) ask the model to summarize the first 4K of the context so it will retain some context and still retain speed.