overview for Appropriate-Tax-9585

What kind of specs to run local llm and serve to say up to 20-50 users in c/localllama@poweruser.forum

Appropriate-Tax-9585@alien.top 1 point 11 months ago

Thank you, this is really good to hear!

What kind of specs to run local llm and serve to say up to 20-50 users in c/localllama@poweruser.forum

Appropriate-Tax-9585@alien.top 1 point 11 months ago

At the moment I’m just trying to grasp the basics, for example what kind of GPUs I will need and how many. This is mainly for comparison with SaaS options; in reality I only need to set up a test server for a few users. I’m going to research it myself, but I like this community and want to hear others’ views on the case, as I imagine many here have tried to manage their own servers :)


What kind of specs to run local llm and serve to say up to 20-50 users (alien.top)

submitted 11 months ago by Appropriate-Tax-9585@alien.top to c/localllama@poweruser.forum

10 comments

Hi all,

Just curious if anybody knows how much compute power is required to run a Llama server that can serve multiple users at once.
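
As a starting point for that kind of sizing, here is a rough back-of-the-envelope sketch. It only counts the model weights plus the KV cache for concurrent users, and it ignores activations, framework overhead, and any batching or quantization tricks. The function name `estimateMemoryGiB` and all the numbers passed to it are illustrative assumptions (roughly the shape of a 7B Llama-style model in fp16), not measurements.

```javascript
// Rough GPU memory estimate: model weights plus KV cache for concurrent users.
// Hypothetical helper for illustration; it ignores activations and runtime overhead.
function estimateMemoryGiB({
  params,        // number of model parameters
  bytesPerParam, // 2 for fp16, less for quantized weights
  layers,        // number of transformer layers
  kvHeads,       // number of key/value heads
  headDim,       // dimension of each head
  kvBytes,       // bytes per cached value (2 for fp16)
  contextTokens, // tokens held in context per user
  users,         // users served at the same time
}) {
  const GiB = 1024 ** 3;
  const weights = params * bytesPerParam;
  // Each token stores one key vector and one value vector per layer.
  const kvPerToken = 2 * layers * kvHeads * headDim * kvBytes;
  // Worst case: every user fills their whole context at the same time.
  const kvCache = kvPerToken * contextTokens * users;
  return {
    weightsGiB: weights / GiB,
    kvCacheGiB: kvCache / GiB,
    totalGiB: (weights + kvCache) / GiB,
  };
}

// Assumed example: 7B parameters, 32 layers, 32 KV heads of dimension 128,
// fp16 everywhere, 2048 tokens of context, 20 simultaneous users.
const est = estimateMemoryGiB({
  params: 7e9,
  bytesPerParam: 2,
  layers: 32,
  kvHeads: 32,
  headDim: 128,
  kvBytes: 2,
  contextTokens: 2048,
  users: 20,
});
console.log(est);
```

Under these assumptions the weights come to roughly 13 GiB and the worst-case KV cache to 20 GiB, so the cache for concurrent users can outweigh the model itself. Quantized weights, shorter contexts, or users who are not all active at once would lower the total.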

Any discussion is welcome :)


Local RAG/embedding clarifications (alien.top)

submitted 11 months ago by Appropriate-Tax-9585@alien.top to c/localllama@poweruser.forum

1 comment

Hi all, I originally posted this to the LangChain sub but haven’t had any response yet. Could anyone give me some pointers? Thanks.

Basic workflow for questioning data locally?

Hi all,

I’m using LangChain JS, and most examples I find use OpenAI, but I’m using Llama. I managed to get a simple text file embedded and can ask basic questions, but most of the time the model just spits the prompt back out.

I’m running on CPU only at the moment, so it’s very slow, but that’s OK. I’m experimenting with loading txt files, CSV files, etc., but clearly it’s not going well: I can ask some very simple questions, but most of the time it fails.

My understanding is:

1. Load the model.
2. Load the data and chunk it (a CSV file, for example; I usually chunk with a size of something like 200 and split on the \n separator).
3. Load the embedding model (I’m supposed to load a Llama GGUF model, right? The same one as in step 1, passed as a parameter to llamaCppEmbeddings?).
4. Create an in-memory vector store.
5. Create the chain and ask a question.
6. Console-log the answer.
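
To make the steps above concrete, here is a minimal library-free sketch of the chunk, embed, store, and retrieve part of the flow. It is not LangChain JS code: every function name here (`chunkText`, `embed`, `buildStore`, `retrieve`, `buildPrompt`) is hypothetical, and `embed` is a toy word-count vector standing in for a real embedding model. The last step, sending the prompt to the Llama model, is left out.

```javascript
// Split on newlines, then pack lines into chunks of at most maxChars characters.
// A single line longer than maxChars is kept whole as its own chunk.
function chunkText(text, maxChars = 200) {
  const chunks = [];
  let current = "";
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    if (current && current.length + 1 + line.length > maxChars) {
      chunks.push(current);
      current = line;
    } else {
      current = current ? current + "\n" + line : line;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Toy stand-in for an embedding model: a sparse word-count vector.
function embed(text) {
  const counts = new Map();
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) || []) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

// Cosine similarity between two sparse word-count vectors.
function cosine(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (const [word, count] of a) {
    dot += count * (b.get(word) || 0);
    normA += count * count;
  }
  for (const count of b.values()) normB += count * count;
  return normA && normB ? dot / Math.sqrt(normA * normB) : 0;
}

// In-memory vector store: each chunk is kept next to its vector.
function buildStore(chunks) {
  return chunks.map((text) => ({ text, vector: embed(text) }));
}

// Return the k chunks most similar to the question.
function retrieve(store, question, k = 2) {
  const q = embed(question);
  return store
    .map((entry) => ({ text: entry.text, score: cosine(q, entry.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Put the retrieved chunks and the question into one prompt for the model.
function buildPrompt(question, hits) {
  const context = hits.map((hit) => hit.text).join("\n---\n");
  return `Answer using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}\nAnswer:`;
}

const data = "name,city\nAlice,Paris\nBob,Berlin\nCarol,Madrid";
const store = buildStore(chunkText(data, 20));
const question = "Where does Bob live?";
const hits = retrieve(store, question, 1);
const prompt = buildPrompt(question, hits);
console.log(prompt);
```

The point of the sketch is that the model only ever sees the few chunks that score highest against the question, so chunk size and embedding quality decide whether the right rows reach the prompt at all.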

Is this concept correct, and do you have any tips to help me get better results?

Thank you