I'm exploring techniques to improve memory handling in LLMs without resorting to vector databases like Pinecone. In an ongoing conversation that runs for days or weeks, earlier chat turns roll off the context window. The idea would be for a conversation manager (which could be the LLM prompting itself as space fills up) to reserve a pre-set fraction of the context window for storing memories.
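To make the ratio idea concrete, here's a rough sketch of what I have in mind. Everything is illustrative: `count_tokens` and `summarize_to_memory` are placeholders for a real tokenizer and an LLM summarization call, and the budget numbers are arbitrary.

```python
# Rough sketch: reserve a fixed fraction of the context window for memories,
# and fold the oldest chat turns into the memory block when the live chat
# overflows its share. count_tokens() and summarize_to_memory() are stand-ins
# for a real tokenizer and an LLM summarization call.

CONTEXT_WINDOW = 8192          # total token budget (arbitrary)
MEMORY_RATIO = 0.25            # pre-set share reserved for memories
MEMORY_BUDGET = int(CONTEXT_WINDOW * MEMORY_RATIO)
CHAT_BUDGET = CONTEXT_WINDOW - MEMORY_BUDGET

def count_tokens(text: str) -> int:
    # placeholder: swap in tiktoken or the model's own tokenizer
    return len(text.split())

def summarize_to_memory(turns: list[str]) -> str:
    # placeholder: really an LLM call, e.g. "summarize these turns into a terse memory note"
    return " / ".join(t[:80] for t in turns)

def manage_context(chat_turns: list[str], memories: list[str]) -> tuple[list[str], list[str]]:
    """Fold the oldest turns into the memory block until the live chat fits its budget."""
    while sum(count_tokens(t) for t in chat_turns) > CHAT_BUDGET and len(chat_turns) > 2:
        evicted, chat_turns = chat_turns[:2], chat_turns[2:]   # oldest user/assistant pair
        memories.append(summarize_to_memory(evicted))
    # if the memory block itself overflows MEMORY_BUDGET, it would get
    # re-summarized / compressed here
    return chat_turns, memories
```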
Two techniques I've thought about:
- Memory hierarchization based on keyword, timestamp, or subjective importance scores (first sketch after this list)
- Text compression via techniques such as syntactic/semantic shrinking, tokenization, and substitution (second sketch after this list)
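For the first bullet, I imagine scoring each memory on recency, keyword overlap with the current topic, and an importance value the model assigns when it writes the memory, then evicting from the bottom when the reserved budget fills up. A toy sketch; the weights and scoring formula are made up, just to show the shape:

```python
import time
from dataclasses import dataclass, field

@dataclass
class Memory:
    text: str
    keywords: set[str]
    importance: float                  # 0..1, assigned by the LLM when it writes the memory
    timestamp: float = field(default_factory=time.time)

def score(mem: Memory, current_keywords: set[str], now: float,
          w_recency: float = 0.3, w_keyword: float = 0.4, w_importance: float = 0.3) -> float:
    """Blend recency, keyword overlap with the live topic, and stored importance."""
    age_hours = (now - mem.timestamp) / 3600
    recency = 1.0 / (1.0 + age_hours)                          # decays toward 0 with age
    overlap = len(mem.keywords & current_keywords) / max(len(mem.keywords), 1)
    return w_recency * recency + w_keyword * overlap + w_importance * mem.importance

def evict_to_budget(memories: list[Memory], current_keywords: set[str],
                    budget_tokens: int) -> list[Memory]:
    """Keep the highest-scoring memories that still fit in the reserved token budget."""
    now = time.time()
    ranked = sorted(memories, key=lambda m: score(m, current_keywords, now), reverse=True)
    kept, used = [], 0
    for mem in ranked:
        cost = len(mem.text.split())                           # crude token-count stand-in
        if used + cost <= budget_tokens:
            kept.append(mem)
            used += cost
    return kept
```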
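For the second bullet, even a cheap lossy pass before storing a memory stretches the budget; a heavier version would be an LLM "rewrite this as tersely as possible" call or token-level tricks. A trivial illustration of the substitution/shrinking idea (the word lists are arbitrary):

```python
import re

# Arbitrary phrase substitutions and filler words, purely to illustrate lossy shrinking
SUBSTITUTIONS = {"approximately": "~", "for example": "e.g.", "that is": "i.e."}
FILLER = {"the", "a", "an", "really", "very", "just", "basically"}

def shrink(text: str) -> str:
    """Cheap lossy compression: phrase substitution plus filler-word removal."""
    for phrase, abbrev in SUBSTITUTIONS.items():
        text = re.sub(re.escape(phrase), abbrev, text, flags=re.IGNORECASE)
    return " ".join(w for w in text.split() if w.lower() not in FILLER)

print(shrink("The meeting is approximately at noon, for example around 12:15"))
# -> "meeting is ~ at noon, e.g. around 12:15"
```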
Surely this has been done before. Does anyone have experience with it?