My two cents. Unless you have jargon in your field you just need RAG. Fine-tuning is for adjusting the token associations to predict the most likely next token.
knownboyofno
joined 1 year ago
What do you define as "decent" tokens per second? Do you have a budget yet? Do you want to run the 13B at full precision or a quantized precision?
Have you tried superbooga? That link talks about it but you have to enable it in your text webui extensions.
White is the normal generation while the blue is the look ahead.
Have you set up the system message with those exact requirements? The system message is sent along with each prompt if I am not mistaken.
Are you on the same OS when you run it?
You have to do some type of search for the correct information for the question. It doesn't matter how you get the information.