Appropriate-Tax-9585

joined 11 months ago
Appropriate-Tax-9585@alien.top 1 points 11 months ago

Thank you, this is really good to hear!

Appropriate-Tax-9585@alien.top 1 points 11 months ago

At the moment I’m just trying to grasp the basics, like what kind of GPUs I’ll need and how many. This is mostly for comparison against SaaS options, though in practice I need to set up a server for testing with just a few users. I’m going to research it myself, but I like this community and would like to hear others’ views on the question, since I imagine many here have tried to manage their own servers :)

 

Hi all,

Just curious if anybody knows what kind of hardware power is required to build a llama server that can serve multiple users at once.

Any discussion is welcome :)
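For context, here’s the kind of back-of-envelope math I’ve been doing. The numbers are my own assumptions for a 7B llama-style model (Q4 weights, fp16 KV cache), not measurements, so please correct me if they’re off:

```js
// Rough VRAM estimate for serving a llama model to N concurrent users.
// All numbers are assumptions for a 7B llama-style model, not measurements.
const params = 7e9;        // 7B parameters
const bytesPerParam = 0.5; // ~4-bit (Q4) quantization ≈ 0.5 bytes per param
const weightsGB = (params * bytesPerParam) / 1e9; // ≈ 3.5 GB for weights

// KV cache per token at fp16: 2 (K and V) * layers * hidden_dim * 2 bytes.
// Assuming llama-7B shapes (32 layers, 4096 hidden dim) ≈ 0.5 MB per token.
const kvBytesPerToken = 2 * 32 * 4096 * 2;
const contextLen = 2048;
const users = 8;
const kvCacheGB = (kvBytesPerToken * contextLen * users) / 1e9; // ≈ 8.6 GB

console.log(`weights ≈ ${weightsGB.toFixed(1)} GB, KV cache ≈ ${kvCacheGB.toFixed(1)} GB`);
// Total ≈ 12 GB, so a single 16–24 GB GPU might fit ~8 users at 2k context,
// ignoring activation memory and framework overhead.
```

Fitting in memory is one thing; whether throughput (tokens/sec per user) stays acceptable is a separate question, which is part of what I’m asking about.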

 

Hi all, I originally posted this to the LangChain sub but haven’t gotten any response yet; could anyone give me some pointers? Thanks.

Basic workflow for asking questions about local data?

Hi all,

I’m using LangChain JS, and most examples I find use OpenAI, but I’m using llama. I managed to embed a simple text file and can ask basic questions, but most of the time the model just spits the prompt back out.

I’m running on CPU only at the moment, so it’s very slow, but that’s OK. I’m experimenting with loading txt files, CSV files, etc., but clearly it’s not going well: I can ask some very simple questions, but most of the time it fails.
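In case the prompt format is the problem, here’s roughly what I’m feeding the model. The template text is just something I wrote, not an official llama format, and the import path is from the older LangChain JS layout (it may differ in newer versions):

```js
// With a base (non-instruct) llama model, a bare prompt often just gets
// continued or echoed back, so I'm trying an instruction-style template.
// Import path matches older LangChain JS releases.
import { PromptTemplate } from "langchain/prompts";

// {context} and {question} are the variables the QA "stuff" chain fills in.
const qaPrompt = PromptTemplate.fromTemplate(
  `Use the following context to answer the question.
If the answer is not in the context, say "I don't know."

Context:
{context}

Question: {question}
Answer:`
);
// If I read the docs right, this can be passed as the prompt option when
// building the retrieval QA chain.
```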

My understanding of the workflow is:

  1. Load the model
  2. Load the data and chunk it (a CSV file, for example; I usually use a chunk size of around 200 and split on \n separators; see the sketch below)
  3. Load the embeddings (I’m supposed to load a llama GGUF model, right? The same one as in step 1, passed as a parameter to LlamaCppEmbeddings?)
  4. Store the chunks in an in-memory vector store
  5. Create a chain and ask a question
  6. Console log the answer

Is this concept correct, and do you have any tips to help me get better results?
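For reference, here’s a minimal sketch of the whole pipeline as I understand it. The import paths match older LangChain JS releases (pre-@langchain/community) and the model/data paths are placeholders:

```js
// Minimal sketch of steps 1–6 above, run as an ES module (top-level await).
import { LlamaCpp } from "langchain/llms/llama_cpp";
import { LlamaCppEmbeddings } from "langchain/embeddings/llama_cpp";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { RetrievalQAChain } from "langchain/chains";
import * as fs from "node:fs";

const modelPath = "./models/llama-2-7b.Q4_K_M.gguf"; // placeholder path

// 1. Load the model.
const model = new LlamaCpp({ modelPath });

// 2. Load the data and chunk it (~200 chars, split on newlines).
const text = fs.readFileSync("./data.txt", "utf8");
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 200,
  chunkOverlap: 20,
  separators: ["\n"],
});
const docs = await splitter.createDocuments([text]);

// 3. Load the embeddings, reusing the same GGUF file as in step 1.
const embeddings = new LlamaCppEmbeddings({ modelPath });

// 4. Build an in-memory vector store from the chunks.
const store = await MemoryVectorStore.fromDocuments(docs, embeddings);

// 5. Create a retrieval QA chain and ask a question.
const chain = RetrievalQAChain.fromLLM(model, store.asRetriever());
const res = await chain.call({ query: "What does the file say about X?" });

// 6. Console log the answer.
console.log(res.text);
```

Does this match how the pieces are supposed to fit together?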

Thank you