overview for nuvalab

Why is a single a100 so slow? in c/localllama@poweruser.forum

[–] nuvalab@alien.top 1 points 11 months ago

That sounds like CPU speed. What you see from `watch nvidia-smi -d -n 0.1` while you're running inference ?

Proposed Alternative to Repetition Penalty - Noisy Sampling in c/localllama@poweruser.forum

[–] nuvalab@alien.top 1 points 11 months ago

Thanks for writing this, it's an interesting idea and very relevant to the issue that I am trying to solve too - creative writing, which definitely hates repetition, and very interested to try out what you proposed once it's available :)

One technical question for this approach: Wouldn't it change the original distribution of training data / output, specially in case where there is one and obviously good one next token to choose from? I can see the value when multiple next tokens are all considered great with close probability, but curious how would it behave otherwise in terms of consistency in correctness.

Is LLaMA-1-65B or LLaMA-2-70B more creative at storytelling ? in c/localllama@poweruser.forum

[–] nuvalab@alien.top 1 points 11 months ago (1 children)

That's an interesting idea .. in my experience anything <1 works, >1.2 goes wild and for things we expect to be a bit more deterministic, setting it to 0 is preferred.

What's your best setup and temperature for creative writing ?

1

Is LLaMA-1-65B or LLaMA-2-70B more creative at storytelling ? (alien.top)

submitted 11 months ago by nuvalab@alien.top to c/localllama@poweruser.forum

5 comments fedilink

I recently started using the base model of LLaMA-2-70B for creative writing and surprisingly found most of my prompts from ChatGPT actually works for the "base model" too, suggesting it might also be fine tuned a bit on ChatGPT-like instructions.

Curious anyone tried both llama 1 & 2 base model and can share their experiences on creativity ? My hunch is llama 1 might be slightly better at it, assuming it hasn't go through as much alignment.