davidmezzetti

joined 2 years ago
[–] davidmezzetti@alien.top 1 points 2 years ago

I haven't found one that is universally best regardless of the benchmarks. Same story as with vector embeddings: you'll need to test a few out for your own use case.

The best one I've found for my projects, though, is https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca and the AWQ implementation https://huggingface.co/TheBloke/Mistral-7B-OpenOrca-AWQ

[–] davidmezzetti@alien.top 1 points 2 years ago

Yes, if you build an embeddings database with your documents. There are a ton of examples available: https://github.com/neuml/txtai
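For a flavor of it, here's a minimal sketch assuming a recent txtai version (the embedding model and sample documents are placeholders):

```python
from txtai import Embeddings

# Build an embeddings database over your documents (sample text is a placeholder)
embeddings = Embeddings(path="sentence-transformers/all-MiniLM-L6-v2", content=True)
embeddings.index([(0, "txtai is an all-in-one embeddings database", None),
                  (1, "Embeddings databases power semantic search and RAG", None)])

# Run a semantic search against the indexed documents
print(embeddings.search("what is txtai?", 1))
```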

[–] davidmezzetti@alien.top 1 points 2 years ago

It works with GPTQ models as well; you just need to install AutoGPTQ.

You would need to replace the LLM pipeline with llama.cpp for it to work with GGUF models.

See this page for more: https://huggingface.co/docs/transformers/main_classes/quantization
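Roughly, the two paths look like this (model names and the GGUF file path are examples, assuming AutoGPTQ and llama-cpp-python are installed):

```python
from txtai.pipeline import LLM

# GPTQ models work through the standard LLM pipeline once AutoGPTQ is installed
llm = LLM("TheBloke/Mistral-7B-OpenOrca-GPTQ")
print(llm("Tell me about GPTQ"))

# For GGUF, swap the LLM pipeline out for llama.cpp (llama-cpp-python)
from llama_cpp import Llama

llm = Llama(model_path="mistral-7b-openorca.Q4_K_M.gguf")
print(llm("Tell me about GGUF")["choices"][0]["text"])
```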

[–] davidmezzetti@alien.top 1 points 2 years ago

Thank you, appreciate it.

I have a company (NeuML) through which I provide paid consulting services.

[–] davidmezzetti@alien.top 1 points 2 years ago

Well, for RAG, the GitHub repo and its documentation would need to be added to the Embeddings index. Then you'd probably want a code-focused Mistral fine-tune.

I've been meaning to write an example notebook that does this for the txtai GitHub repo and documentation. I'll share that back when it's available.
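Until then, an untested sketch of the indexing side (the local clone path is a placeholder):

```python
from pathlib import Path
from txtai import Embeddings

# Index the repo's markdown documentation; "txtai" is a placeholder clone path
embeddings = Embeddings(content=True)
embeddings.index((str(path), path.read_text(), None) for path in Path("txtai").rglob("*.md"))

print(embeddings.search("how do I create an embeddings index?", 3))
```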

[–] davidmezzetti@alien.top 1 points 2 years ago

This code uses txtai, the txtai-wikipedia embeddings database and Mistral-7B-OpenOrca-AWQ to build a RAG pipeline in a couple of lines of code.
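Something along these lines, assuming a recent txtai release (the question is a placeholder and the prompt template follows the model's ChatML format):

```python
from txtai import Embeddings
from txtai.pipeline import LLM

# Load the pre-built Wikipedia embeddings database from the Hugging Face Hub
embeddings = Embeddings()
embeddings.load(provider="huggingface-hub", container="neuml/txtai-wikipedia")

llm = LLM("TheBloke/Mistral-7B-OpenOrca-AWQ")

question = "How do bees communicate?"

# Retrieve the top matches as context, then prompt the model with them
context = "\n".join(x["text"] for x in embeddings.search(question, 3))
print(llm(f"""<|im_start|>user
Answer the following question using only the context below.
Question: {question}
Context: {context}<|im_end|>
<|im_start|>assistant
"""))
```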

[–] davidmezzetti@alien.top 1 points 2 years ago

Thank you, glad to hear it.

[–] davidmezzetti@alien.top 1 points 2 years ago

This is an application that connects a vector database and an LLM to perform RAG. The logic is written in Python and is available as a local API service.
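As a rough sketch of that shape, with FastAPI standing in for the service layer (the endpoint, index path and model are illustrative, not the app's actual code):

```python
from fastapi import FastAPI
from txtai import Embeddings
from txtai.pipeline import LLM

app = FastAPI()

# Load the vector database and LLM once at startup; paths and models are placeholders
embeddings = Embeddings()
embeddings.load("rag-index")
llm = LLM("TheBloke/Mistral-7B-OpenOrca-AWQ")

@app.get("/rag")
def rag(query: str):
    # Retrieve context with vector search, then generate an answer grounded in it
    context = "\n".join(x["text"] for x in embeddings.search(query, 3))
    answer = llm(f"Answer this question using only the context below.\nQuestion: {query}\nContext: {context}")
    return {"answer": answer}
```

Run it with something like `uvicorn app:app` and you have a local API service in front of the RAG pipeline.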