This post was submitted on 01 Nov 2023

LocalLLaMA


Community to discuss Llama, the family of large language models created by Meta AI.


I'm exploring techniques to improve memory handling in LLMs without resorting to vector databases like Pinecone. In an ongoing conversation lasting days or weeks, earlier chats roll off the context window. The idea would be for a conversation manager (which could be the LLM prompting itself as space fills up) to reserve a pre-set fraction of the context window for storing memories.
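Here's a rough sketch of what I mean by the manager. Everything in it is hypothetical: the 25% ratio is just an example, the word-split token counter stands in for the model's real tokenizer, and the `_compress` placeholder is where an actual summarization call (the LLM prompting itself) would go.

```python
MEMORY_RATIO = 0.25       # example pre-set fraction of context reserved for memories
CONTEXT_TOKENS = 8192     # assumed model context window

def n_tokens(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer

class ConversationManager:
    def __init__(self, context_tokens=CONTEXT_TOKENS, memory_ratio=MEMORY_RATIO):
        self.live_budget = int(context_tokens * (1 - memory_ratio))
        self.memory_budget = int(context_tokens * memory_ratio)
        self.turns: list[str] = []      # recent chat, kept verbatim
        self.memories: list[str] = []   # compressed older turns

    def add_turn(self, turn: str) -> None:
        self.turns.append(turn)
        # Evict oldest turns into the memory region once the live region overflows.
        while sum(n_tokens(t) for t in self.turns) > self.live_budget:
            oldest = self.turns.pop(0)
            self.memories.append(self._compress(oldest))
        # Keep the memory region within its own budget, too.
        while sum(n_tokens(m) for m in self.memories) > self.memory_budget:
            self.memories.pop(0)  # naive: drop oldest; scoring could decide instead

    def _compress(self, text: str) -> str:
        # Placeholder: in practice a summarization call, not naive truncation.
        words = text.split()
        return " ".join(words[: max(1, len(words) // 4)])

    def build_prompt(self) -> str:
        return ("MEMORIES:\n" + "\n".join(self.memories)
                + "\n\nRECENT CHAT:\n" + "\n".join(self.turns))
```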

Two techniques I've thought about (rough sketch after the list):

- Memory hierarchization based on keywords, timestamps, or subjective importance scores

- Text compression via techniques such as syntactic/semantic shrinking, tokenization, or substitution
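For the hierarchization side, a toy scoring function might blend recency, keyword overlap, and an importance score assigned when the memory was stored. The weights, half-life, and keyword set below are made-up tuning knobs, not anything established; the compression side would plug in where the first sketch calls its summarizer.

```python
import time
from dataclasses import dataclass, field

KEYWORDS = {"deadline", "budget", "password", "name"}  # example salient terms
HALF_LIFE_S = 7 * 24 * 3600                            # recency halves each week

@dataclass
class Memory:
    text: str
    created: float = field(default_factory=time.time)
    importance: float = 0.5   # 0..1, assigned at store time (e.g. by the LLM)

def score(mem: Memory, query: str, now: float | None = None) -> float:
    now = now or time.time()
    recency = 0.5 ** ((now - mem.created) / HALF_LIFE_S)   # exponential decay
    words = set(mem.text.lower().split())
    keyword_hits = len(words & KEYWORDS) / len(KEYWORDS)
    query_overlap = len(words & set(query.lower().split())) / (len(words) or 1)
    # Made-up weights; they'd need tuning for a real conversation manager.
    return 0.3 * recency + 0.2 * keyword_hits + 0.2 * query_overlap + 0.3 * mem.importance

def select_memories(memories: list[Memory], query: str, token_budget: int) -> list[Memory]:
    # Greedily keep the highest-scoring memories that fit the memory budget.
    kept, used = [], 0
    for mem in sorted(memories, key=lambda m: score(m, query), reverse=True):
        cost = len(mem.text.split())   # crude token estimate
        if used + cost <= token_budget:
            kept.append(mem)
            used += cost
    return kept
```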

Surely this has been done before. Any experience with it?

1 comment
AsliReddington@alien.top 1 point 1 year ago

All you need is a 32K-context LLM. Everything beyond that needs a tool invocation that can pull from the archived texts. You'll have to make your orchestrator smart enough to know there is content beyond the window that needs to be pulled in.
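Roughly something like this, I'd guess; `archive_search`, the `[NEED_ARCHIVE]` marker, and the keyword lookup are placeholder conventions I'm assuming, not a real API.

```python
ARCHIVE: list[str] = []   # turns that have rolled out of the 32K window

def archive_search(query: str, k: int = 3) -> list[str]:
    # Toy keyword lookup; a real version might use BM25 or grep over chat logs.
    q = set(query.lower().split())
    scored = sorted(ARCHIVE, key=lambda t: len(q & set(t.lower().split())), reverse=True)
    return scored[:k]

def orchestrate(model_reply: str, user_query: str) -> str | None:
    # The model is prompted to emit a marker (assumed convention) when it
    # suspects the answer lives outside the current context window.
    if "[NEED_ARCHIVE]" in model_reply:
        hits = archive_search(user_query)
        return "Relevant archived context:\n" + "\n---\n".join(hits)
    return None  # nothing to pull; the reply stands as-is
```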