nuvalab

joined 11 months ago
[–] nuvalab@alien.top 1 points 11 months ago

That sounds like it's bound by CPU speed. What do you see from `watch -d -n 0.1 nvidia-smi` while you're running inference?
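
(If you want to script that check instead of eyeballing the terminal, here's a rough sketch using `pynvml` - my own illustration, not something from this thread. GPU utilization that stays low while tokens are still coming out usually points at a CPU or data-feeding bottleneck.)

```python
import time
import pynvml

# Poll the first GPU's utilization roughly every 100 ms while inference runs.
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

for _ in range(50):  # ~5 seconds of samples
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"gpu {util.gpu:3d}%  vram {mem.used / 2**30:5.1f} GiB")
    time.sleep(0.1)

pynvml.nvmlShutdown()
```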

[–] nuvalab@alien.top 1 points 11 months ago

Thanks for writing this, it's an interesting idea and very relevant to the problem I'm trying to solve too - creative writing, which definitely hates repetition. I'm very interested in trying out what you proposed once it's available :)

One technical question about this approach: wouldn't it change the original distribution of the training data / output, especially in cases where there is one obviously good next token to choose from? I can see the value when multiple next tokens are all considered great with close probabilities, but I'm curious how it would behave otherwise in terms of consistency and correctness.
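
(To make the two regimes concrete - this is just a toy illustration on my side, not anything about the specifics of your proposal:)

```python
import numpy as np

def softmax(logits):
    z = np.asarray(logits, dtype=np.float64)
    z -= z.max()              # numerical stability
    p = np.exp(z)
    return p / p.sum()

# One obviously correct next token (e.g. closing a quote or bracket)
peaked = softmax([9.0, 2.0, 1.0, 0.5])
# Several continuations that are almost tied (common in creative prose)
flat = softmax([2.1, 2.0, 1.9, 1.8])

for name, p in (("peaked", peaked), ("flat", flat)):
    print(f"{name:6s} top-token prob {p.max():.3f}, "
          f"chance of sampling something else {1 - p.max():.3f}")
```

In the peaked case nearly all the mass already sits on one token, so anything that moves mass away from it is a real change to the output; in the flat case there's room to reshuffle without hurting correctness.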

[–] nuvalab@alien.top 1 points 11 months ago (1 children)

That's an interesting idea... in my experience anything <1 works, >1.2 goes wild, and for things we expect to be a bit more deterministic, setting it to 0 is preferred.

What's your best setup and temperature for creative writing?
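
(For reference, a minimal sketch of the knob I mean - plain temperature scaling of the logits before softmax. The cutoffs above are just my experience, and the numbers here are made up for illustration.)

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_with_temperature(logits, temperature):
    """Toy next-token sampler: temperature rescales logits before softmax."""
    logits = np.asarray(logits, dtype=np.float64)
    if temperature == 0:
        return int(np.argmax(logits))   # greedy / deterministic
    z = logits / temperature
    z -= z.max()                        # numerical stability
    p = np.exp(z)
    p /= p.sum()
    return int(rng.choice(len(p), p=p))

logits = [3.0, 2.5, 1.0, 0.2]
for t in (0, 0.7, 1.0, 1.3):
    picks = [sample_with_temperature(logits, t) for _ in range(10)]
    print(f"T={t}: {picks}")
```

Below 1 the distribution sharpens toward the top tokens; past ~1.2 the tail tokens get picked often enough that the text starts going off the rails.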

 

I recently started using the base model of LLaMA-2-70B for creative writing and was surprised to find that most of my prompts from ChatGPT actually work on the "base model" too, suggesting it might also have been fine-tuned a bit on ChatGPT-like instructions.
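
(In case it helps, a minimal sketch of what I mean - feeding an instruction-style prompt straight to the base model via Hugging Face `transformers`. The model id and generation settings are just my assumptions; the 70B weights are gated and need a lot of GPU memory.)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-70b-hf"   # base model, no chat fine-tune
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# A ChatGPT-style instruction, given to the base model as-is
prompt = "Write a short, vivid scene where a lighthouse keeper finds a message in a bottle."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=300, do_sample=True, temperature=0.9)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```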

Curious if anyone has tried both the LLaMA 1 & 2 base models and can share their experience on creativity? My hunch is LLaMA 1 might be slightly better at it, assuming it hasn't gone through as much alignment.