That's really interesting, thanks for sharing!!
How does the querying process work for this "knowledge graph"?
That's great work!
Just a question... Has anyone tried to fine-tune one of those "Frankenstein" models? Even on a small dataset...
Some time ago (when one of the first experimental "Frankensteins" came out, it was a ~20B model) I read here on Reddit that lots of users agreed a fine-tune on those merged models would give "better" results, since it would help to "smooth" and adapt the merged layers. I probably lack the technical knowledge needed to understand, so I'm asking...
Still really curious about a full fine-tune on one of those Frankenstein models... What are the VRAM requirements?
I tried to look into the LangChain repo; honestly, I couldn't understand anything in there :/
A classic...
Is there a way to implement self-querying retrieval?
Can you explain what you mean by self-query? Do you mean using an LLM to generate the query for retrieval?
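If it's the latter, here is roughly what I have in mind (just a sketch, not the OP's implementation; `chat` stands in for whatever LLM call you use, and the JSON keys are my own convention):

```python
import json

def self_query(chat, user_question: str) -> dict:
    """Ask the LLM to turn the user question into a search query plus optional metadata filters."""
    prompt = (
        "Rewrite the question as a JSON object with keys 'query' "
        "(keywords for the retriever) and 'filters' (metadata constraints, or {} if none).\n"
        f"Question: {user_question}"
    )
    return json.loads(chat(prompt))  # e.g. {"query": "...", "filters": {"year": 2023}}

# The returned 'query' (and 'filters') then go to the retriever instead of the raw question.
```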
Is that a LoRA or a full fine-tune?
Do you have an alternative version for chain-of-thought?
How is it possible that Llama 2 13B and 7B have a lower hallucination rate than Claude?
Where is the LLM hosted?
Has anyone compared that with Claude 2 100K?
Also, does GPT-4 32K keep the same 100% accuracy across its whole context? Is that 64 out of 180 "absolute" or relative?
I'm wondering what that approach could generate if applied to CodeLlama 34B.
A Frankenstein 2x 34B model might be easier to test, and we have a 70B model for reference... Also, IMO code generation is a good way to test the behavior of the models and to weed out lucky results that just "sound right".
Thank you for your answer! I've worked hard to improve my personal RAG implementation, searching (and asking here) ad nauseam for ways to enhance the performance of the retrieval process...
I will study the approach linked in the OP, and your answer really helped me take everything to a more "practical / tangible" level.
I'll try to integrate that into my experimental pipeline (currently I'm stable on RAG fusion using "query expansion" and hybrid search with a transformer embedder, SPLADE and BM25; a rough sketch of the fusion step is below).
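For clarity, this is roughly how I fuse the rankings today; the reciprocal rank fusion merge and the three retriever callables are my own placeholders, not anything from the OP's repo:

```python
from collections import defaultdict

def rrf_merge(ranked_lists, k: int = 60) -> list:
    """Combine several ranked lists of doc ids into one ranking (reciprocal rank fusion)."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)  # highest fused score first

def hybrid_search(queries, dense_search, splade_search, bm25_search, top_k: int = 10):
    """Run every expanded query through each retriever, then fuse all the rankings."""
    rankings = []
    for q in queries:                      # original question + LLM query expansions
        rankings.append(dense_search(q))   # transformer embeddings (dense)
        rankings.append(splade_search(q))  # SPLADE (learned sparse)
        rankings.append(bm25_search(q))    # BM25 (lexical)
    return rrf_merge(rankings)[:top_k]
```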
I already tried an approach that needs an LLM to iterate over every chunk before generating the embeddings, mainly to resolve pronouns and cross-references between chunks... Good results... but not good enough when weighed against the resources needed to run the LLM over every item. Maybe integrating this "knowledge nodes/edges generation" into my LLM pre-processing will change the pros/cons balance, since, from a quick test, the model seems able to do both text pre-processing and concept extraction in the same run (sketch of that pass below).
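And this is the per-chunk pass I mean, with pre-processing and concept extraction merged into a single call; again just a sketch, where `chat` stands in for whatever LLM call you use and the JSON schema is my own convention:

```python
import json

def preprocess_chunk(chat, chunk: str, previous_chunk: str = "") -> dict:
    """One LLM call per chunk: resolve pronouns/cross-references, then list key concepts."""
    prompt = (
        "Using the previous chunk as context, rewrite the current chunk so that "
        "pronouns and cross-references are resolved. Then list the key concepts it "
        "contains (candidate nodes for a knowledge graph).\n"
        'Answer as JSON: {"rewritten": "...", "concepts": ["...", "..."]}\n\n'
        f"Previous chunk:\n{previous_chunk}\n\nCurrent chunk:\n{chunk}"
    )
    return json.loads(chat(prompt))  # {"rewritten": str, "concepts": [str, ...]}

# The 'rewritten' text is what gets embedded; 'concepts' would feed the
# nodes/edges generation described in the OP.
```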
Thanks again!
I had many good discussions on this sub, and I really like this community... Anyway, I got your point lol.