tail-recursion


I want to use an open source LLM as a RAG agent that also has memory of the current conversation (and eventually I want to work up to memory of previous conversations). I was looking into conversational retrieval agents from Langchain (linked below), but it seems they only work with OpenAI models. Is it possible to get an open source LLM to work with RAG and conversational memory using Langchain?

https://python.langchain.com/docs/use_cases/question_answering/conversational_retrieval_agents
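For concreteness, the behaviour I'm after is roughly this, sketched in plain Python without Langchain (every name below is illustrative, not a real API):

```python
# Sketch of RAG plus conversational memory, independent of any framework:
# stitch retrieved documents and the running conversation into one prompt.
# The retriever and LLM are stand-ins; all names here are illustrative.

def build_prompt(history, retrieved_docs, question):
    """Combine retrieved context and prior turns into a single prompt string."""
    context = "\n".join(f"- {doc}" for doc in retrieved_docs)
    turns = "\n".join(f"{role}: {text}" for role, text in history)
    return (
        "Use the context below to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Conversation so far:\n{turns}\n\n"
        f"User: {question}\nAssistant:"
    )

history = [("User", "What embedding model are we using?"),
           ("Assistant", "all-MiniLM-L6-v2.")]
docs = ["The index is built with all-MiniLM-L6-v2 embeddings."]
prompt = build_prompt(history, docs, "Can we swap it out?")
```

The question is really whether an open source model can fill the LLM slot in this loop when Langchain's agent machinery is driving it.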

 

I have been trying to get open source models to work with Langchain tools. So far the only model that has worked is Llama 2 70b Q4, following James Briggs' tutorial. Both Llama 2 13b and Mistral 7b Instruct use the tool correctly and observe the answer, but then return an empty string as the final output, whereas Llama 2 70b returns "It looks like the answer is X".
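What seems to go wrong is the final parsing step. A rough illustration of the shape of it (this is my own sketch, not Langchain's actual parser):

```python
import re

# Rough sketch of the check a ReAct-style agent applies to each completion.
# A model that returns an empty string at the end produces output matching
# neither pattern, which is where things fall apart. The regexes are
# illustrative only.

def classify_step(llm_output):
    final = re.search(r"Final Answer:\s*(.*)", llm_output, re.DOTALL)
    if final:
        return ("final", final.group(1).strip())
    action = re.search(r"Action:\s*(.+?)\s*Action Input:\s*(.*)",
                       llm_output, re.DOTALL)
    if action:
        return ("action", action.group(1).strip(), action.group(2).strip())
    return ("invalid", llm_output)

classify_step("Final Answer: It looks like the answer is X")  # what 70b emits
classify_step("Action: Calculator\nAction Input: 2+2")        # a tool call
classify_step("")   # what the smaller models effectively hand back
```

So 70b succeeds because its last completion matches the "Final Answer" pattern, while the smaller models' empty completions match nothing.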

I want to experiment with Qwen 14b, as it is a relatively small model that may be more efficient to run than Llama 2 70b, to see whether it works with Langchain tools. The GitHub page for Qwen 14b says it was trained specifically for tool usage, so it seems like one of the most promising models. That, and there was quite a lot of positive sentiment about it on this sub.

When I try to load Qwen 14b on my Mac M1 I get an error related to auto-gptq, and when I try to install auto-gptq with pip it errors and mentions something about CUDA. Does auto-gptq work on macOS, or does it require CUDA? Is there any way to get some version of Qwen 14b to run on macOS?
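For reference, the fallback I'm considering is skipping GPTQ entirely and using a GGUF quantization through llama.cpp, which builds with Metal support on Apple Silicon. The commands below are illustrative only; the cmake flag is what I've seen suggested and the model filename is an assumption, not a real release name:

```shell
# Illustrative only: the CMAKE flag and the model filename are assumptions.
CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python

# then point it at a GGUF quant of the model
python -c "from llama_cpp import Llama; Llama(model_path='qwen-14b.Q4_K_M.gguf')"
```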

Has anyone experimented with Qwen 14b and Langchain tool usage?

Does anyone have any suggestions for models smaller than Llama 2 70b that might work for Langchain tool usage?

[–] tail-recursion@alien.top 1 points 10 months ago

Got it working with Llama 2 70b following the tutorial from James Briggs. Note it did not work with Llama 2 13b, which returned an empty output at the end. https://stackoverflow.com/questions/77491941/llama-2-with-langchain-tools

 

Defog's SQLCoder is a state-of-the-art LLM for converting natural language questions to SQL queries.

SQLCoder-34B is a 34B parameter model that outperforms gpt-4 and gpt-4-turbo for natural language to SQL generation tasks on our sql-eval framework, and significantly outperforms all popular open-source models.

https://huggingface.co/defog/sqlcoder-34b-alpha

SQLCoder-34B is fine-tuned on a base CodeLlama model.

Results on novel datasets not seen in training:

| model | perc_correct |
|---|---|
| defog-sqlcoder-34b | 84.0% |
| gpt4-turbo-2023-11-09 | 82.5% |
| gpt4-2023-11-09 | 82.5% |
| defog-sqlcoder2 | 77.5% |
| gpt4-2023-08-28 | 74.0% |
| defog-sqlcoder-7b | 71.0% |
| gpt-3.5-2023-10-04 | 66.0% |
| claude-2 | 64.5% |
| gpt-3.5-2023-08-28 | 61.0% |
| claude_instant_1 | 61.0% |
| text-davinci-003 | 52.5% |

SQLCoder was trained on more than 20,000 human-curated questions, based on 10 different schemas. None of the schemas in the training data were included in our evaluation framework.

You can read more about our training approach and evaluation framework.

SQLCoder-34B has been tested on a 4xA10 GPU with float16 weights. You can also load 8-bit and 4-bit quantized versions of the model on consumer GPUs with 20GB or more of memory, like the RTX 4090 and RTX 3090, or on Apple M2 Pro, M2 Max, and M2 Ultra chips.
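Usage is prompt-in, SQL-out. Here is a minimal illustration of building such a prompt; the section headers below are a guess at the shape of the template, not the model's official prompt format, so check the model card before relying on it:

```python
# Illustrative prompt builder for a text-to-SQL model such as SQLCoder.
# The "### ..." headers are an assumption, not the official template.

def make_sql_prompt(schema: str, question: str) -> str:
    return (
        "### Task\n"
        f"Generate a SQL query to answer the following question: {question}\n\n"
        "### Database Schema\n"
        f"{schema}\n\n"
        "### SQL\n"
    )

schema = (
    "CREATE TABLE orders (\n"
    "    id INTEGER PRIMARY KEY,\n"
    "    customer_id INTEGER,\n"
    "    total NUMERIC\n"
    ");"
)
prompt = make_sql_prompt(schema, "What is the total revenue across all orders?")
```

The model then completes the prompt with the SQL query after the final header.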

[–] tail-recursion@alien.top 1 points 10 months ago

Tried following this with Llama 2 13b

https://www.pinecone.io/learn/llama-2/

I get "ValueError: unknown format from LLM: "
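The error seems to come from the agent's output parser rejecting anything that doesn't match the expected action format. Here is a rough stand-in for what that parser does; the conversational agent in that tutorial expects a JSON blob in a code fence, but the regex and error message below are my own sketch, not Langchain's code:

```python
import json
import re

# Loose stand-in for a conversational agent's output parser: it expects a
# fenced JSON blob with "action" and "action_input", and raises on anything
# else. Illustrative only.

def parse_agent_output(text):
    match = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
    if match is None:
        raise ValueError(f"unknown format from LLM: {text}")
    return json.loads(match.group(1))

good = ('Thought: I should use the calculator.\n'
        '```json\n{"action": "Calculator", "action_input": "2+2"}\n```')
parse_agent_output(good)        # parses cleanly
# parse_agent_output("4")       # would raise: unknown format from LLM: 4
```

So the 13b model presumably answered with plain text (or nothing) instead of the fenced JSON the agent was prompted to produce.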

 

Has anyone been able to get ANY open source LLM to use Langchain tools? I have not had success with any of the models I have tried, including Llama 2, Mistral and Yi 34b. I usually get "Cannot parse LLM output" type errors. In some cases the model successfully uses the tool but doesn't return the final answer correctly, i.e. the model invokes the tool correctly and I can see the answer as an observation, but it doesn't return that answer in its final output.

In my application the answer from the tool will have a specific format that should make it easy to extract by looking at the observations and extracting using regex (assuming I can access the observations).
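Something like this is what I have in mind, assuming I can get at the observations as a list of strings, and assuming a hypothetical "ANSWER: <value>" convention from the tool (both are placeholders for my actual format):

```python
import re

# Hypothetical: the tool embeds its result as "ANSWER: <value>" and the
# agent's intermediate observations are available as a list of strings.

def extract_answer(observations):
    for obs in reversed(observations):   # newest observation first
        match = re.search(r"ANSWER:\s*(\S+)", obs)
        if match:
            return match.group(1)
    return None

observations = ["searching the index...", "tool output -> ANSWER: 42"]
extract_answer(observations)   # '42'
```

That would let me sidestep the model's broken final answer entirely, as long as the tool call itself succeeds.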

But I’m wondering if anyone has had any success with ANY open source LLM in using Langchain tools where the model can correctly use the tool and return the final answer without erroring?

[–] tail-recursion@alien.top 1 points 10 months ago

You could try an open source LLM like Llama 2. You could probably use Langchain tools to give it a tool to tag when a tweet has harmful content.
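A minimal sketch of what such a tool could look like, as a plain function the agent calls; the term list and labels are placeholders to show the input/output shape, not a real moderation approach:

```python
# Placeholder tool: flags a tweet by keyword match. A real system would use
# a proper classifier; this only shows the tool's input/output contract.
HARMFUL_TERMS = {"threat", "abuse"}   # hypothetical placeholder terms

def tag_tweet(tweet: str) -> str:
    hits = sorted(t for t in HARMFUL_TERMS if t in tweet.lower())
    return f"harmful ({', '.join(hits)})" if hits else "ok"

tag_tweet("have a nice day")    # 'ok'
tag_tweet("this is an Abuse case")
```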