phree_radical
Yep, basically like taking a few samples from a dataset and turning them into a short text "document" with an obvious pattern so the LLM will complete it
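Roughly like this, in code (the field names and samples here are just made up for illustration):

```
# Take a few labeled samples and lay them out as a plain-text "document"
# with an obvious pattern, then ask a base model to complete the next line.
samples = [
    ("The battery died after two days.", "negative"),
    ("Setup took thirty seconds, love it.", "positive"),
    ("Arrived on time, nothing special.", "neutral"),
]

query = "The screen cracked the first week."

prompt = ""
for text, label in samples:
    prompt += f"Review: {text}\nSentiment: {label}\n\n"
prompt += f"Review: {query}\nSentiment:"

print(prompt)
# A base model completing this text will almost always emit a single label
# word next, because that's the obvious way to continue the pattern.
```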
Few-shot vs fine-tuning comparison:
Pros:
- converges on the desired behavior with far fewer examples
- dynamic: changes to the "dataset" take effect without modifying model weights
- no worrying about whether important information is lost
- can do things like averaging the logits of a single-token classification problem across multiple inferences to work around context length limitations (see the sketch after this list)
Cons:
- eats context length, so you can't provide too many examples, or examples that are too large
- sometimes needs "adversarial" examples to discourage the model from repeating text from the other examples
- models that are too small have worse in-context learning (ICL)
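To make the logit-averaging item above concrete, here's a rough sketch. The `top_logprobs` helper is a stand-in for whatever your backend exposes for next-token logprobs (llama.cpp's server and OpenAI-style completion APIs can both return them), so wire it up to whatever you actually run:

```
# Too many few-shot examples to fit in one context? Split them into chunks,
# run one inference per chunk, and average the scores the model assigns to
# each candidate label token (logprobs as a stand-in for logits).
LABELS = ["yes", "no"]

def top_logprobs(prompt: str) -> dict[str, float]:
    # Stand-in: return {token: logprob} for the next token from your backend
    raise NotImplementedError("wire this up to your completion backend")

def classify(example_chunks: list[str], query: str) -> str:
    totals = {label: 0.0 for label in LABELS}
    for chunk in example_chunks:
        lp = top_logprobs(chunk + query)
        for label in LABELS:
            totals[label] += lp.get(label, -100.0)  # floor for missing tokens
    # Dividing by the chunk count gives the average; argmax is unchanged
    return max(totals, key=lambda label: totals[label] / len(example_chunks))
```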
/r/localllama
Looks like you've now made some changes. Columns now read "Llama2-7b-chat" instead of "llama2." There are also chat responses below the completions, chastising the inappropriate messages. However, a completion was generated first, and the item is still marked as "fail." Very poor show
I'm the local "examples/completion is better than chat/instruction" nut
I advise developers to learn how to use few-shot examples and completion instead of writing programs that beg chatbots to do a task. Chat/instruction imposes severe limitations, while examples/completion can perform virtually any task you can think of without the need for fine-tuning
Here are some examples: classification, rewrite sentence copying style, classify, basic Q&A example, fact check yes/no, rewrite copying style and sentiment, extract list of musicians, classify user intent, tool choice, rewrite copying style again, flag/filter objectionable content, detect subject changes, classify profession, extract customer feedback into json, write using specified words, few-shot cheese information, answer questions from context, classify sentiment w/ probabilities, summarize, replace X in conversation
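For instance, the flag/filter one boils down to something like this -- assuming a local llama.cpp server on its default port; swap in whatever completion backend you actually use:

```
import requests

# Flag/filter objectionable content as a few-shot completion against a base
# model: the pattern ends with "Objectionable:" and we only read one token.
FEW_SHOT = """\
Message: You're all idiots and I hope you lose your jobs
Objectionable: yes

Message: Does anyone know when the next release is planned?
Objectionable: no

Message: Thanks, that fixed it for me!
Objectionable: no

Message: {message}
Objectionable:"""

def flag_message(message: str) -> bool:
    prompt = FEW_SHOT.format(message=message)
    resp = requests.post(
        "http://127.0.0.1:8080/completion",
        json={"prompt": prompt, "n_predict": 1, "temperature": 0},
    )
    answer = resp.json()["content"].strip().lower()
    return answer.startswith("yes")
```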
??
It's comparing base models (which are not trained to follow or refuse instructions) against instruction-tuned ones (OpenAI)
Most of the benchmarks seem to measure regurgitation of factual knowledge from in-weights learning, which IMO everyone should accept is a misguided task, instead of testing in-context learning, which I would argue was the real goal of LLM training. I'd say they are probably harmful to the cause of improving future LLMs
I just wrap it in tqdm
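i.e. something like this, with a sleep standing in for the actual per-item inference:

```
from tqdm import tqdm
import time

# Wrap whatever iterable the batch loop runs over; tqdm prints a live
# progress bar with throughput and ETA.
items = range(100)
for item in tqdm(items):
    time.sleep(0.01)  # one inference per item in the real loop
```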
What you're referring to as "prompt engineering" is more accurately described as how to get good interpolations between ChatGPT behaviors. Those are specific instructions and behaviors that OpenAI trains their models on, in careful proportions designed to reach good generalization on them
And it's not that the models are too small -- Mistral 13b will be better than gpt-3.5-turbo. It's all about the training
Anyways, that's why I try to loudly proclaim the benefits of few-shot examples and completion instead of instruction, until we have models trained the way OpenAI's are. If you're willing to write examples and dodge the chatbot-trained behaviors, you can pretty much perform any task without the need for training
I just use plain old Web Speech on PC and TextToSpeech on Android. I wasn't gonna say anything because they don't sound as good as the compute-heavy ones, but, they're... way better than whatever that is!
RemindMe! 10 months