this post was submitted on 29 Nov 2023

I have been playing with LLMs for novel writing. Thus far all I have been able to use them for is brainstorming. No matter which model I use, the prose feels wooden, dull, and obviously AI.

Is anyone else doing this? Are there particular models that work really well, or any prompts you recommend? Any workflow advice on how to better leverage LLMs would be much appreciated!

[–] kindacognizant@alien.top 1 points 9 months ago (1 children)

Play with your sampler settings. They have a pretty significant impact on creativity.

See this, for example:

https://preview.redd.it/yg9jg6r4f93c1.png?width=595&format=png&auto=webp&s=f5f38dd788a60439bf83693dd67cbdef25bbe7d2

The important elements are:

- Min P, which sets a minimum probability cutoff as a fraction of the top token's probability; tokens below that cutoff are discarded. Go no lower than 0.03 for coherence at higher temps.

- Temperature, which controls how much the lower-probability options are considered; higher values make them more probable.
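
For anyone who wants to see how the two settings above interact, here is a minimal sketch in plain Python. It is not taken from any particular backend: the function names and the example logits are made up for illustration, and the order used here (Min P filter first, temperature second) is an assumption, since implementations can order their samplers differently.

```python
import math
import random


def softmax(logits):
    """Turn raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_candidates(logits, min_p):
    """Indices of tokens whose probability is at least min_p * top probability."""
    probs = softmax(logits)
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]


def sample_token(logits, temperature=1.0, min_p=0.05, rng=random):
    """Filter with Min P, then apply temperature to the survivors and sample one."""
    kept = min_p_candidates(logits, min_p)
    # Assumed order: filter first, temperature second (implementations may differ).
    probs = softmax([logits[i] / temperature for i in kept])
    return rng.choices(kept, weights=probs, k=1)[0]


logits = [5.0, 4.0, 1.0, -2.0]  # hypothetical logits for four tokens
print(min_p_candidates(logits, 0.05))  # tokens that survive the cutoff
print(sample_token(logits, temperature=3.0, min_p=0.05))
```

With this ordering, a high temperature only flattens the distribution over tokens that already passed the Min P cutoff, which is one way to read why high temps can stay coherent when Min P is set.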

[–] ambient_temp_xeno@alien.top 1 points 9 months ago (1 children)

I agree. I have these for yichat34: --top-k 0 --min-p 0.05 --top-p 1.0 --color -t 5 --temp 3 --repeat_penalty 1 -c 4096 -i -n -1

I think the --min-p I have is a bit low, so maybe you have the min-p back to front? Lower is more precise I think.

[–] pseudonerv@alien.top 1 points 9 months ago

--top-k 0 is the same as --top-k 1, so fully deterministic, no?

[–] Dry-Judgment4242@alien.top 1 points 9 months ago

Goliath 120b is the only model I've tried so far that doesn't go ChatGPT on me. The 3.0 quant fits on 2x RTX 3090.

[–] Dry-Judgment4242@alien.top 1 points 9 months ago

Goliath 120b is the only model I've tried so far that is not infested with GPT prose.

[–] A0sanitycomp@alien.top 1 points 9 months ago

What models are you using? I’ve had no luck with anything. Actually that orca-mini 3b is good at writing things matter-of-factly but it doesn’t go into great detail about anything.

[–] thereisonlythedance@alien.top 1 points 9 months ago (1 children)

Out of the box, I actually find the vanilla Llama-2 70b chat model produces the most natural prose, if prompted correctly. Long Alpaca 70b is also good at following style if you feed it a chunk of writing.

But the best results I’ve had have come from fine-tuning Mistral 7B myself. Mistral writes crazy good if trained right, though it can get muddled at longer contexts.

[–] AstronomerChance5093@alien.top 1 points 9 months ago (1 children)

Would you mind going into more detail on your fine-tuning methods? Your dataset, how it's structured, etc. I'm trying to get something similar going with Mistral atm, but not having much luck getting anything good out of it.

[–] thereisonlythedance@alien.top 1 points 9 months ago (1 children)

Sure.

I'm using an instruct style dataset with a system field (in Axolotl I use either the orcamini dataset type or chatml). I've then collated a bunch of writing that I like (up to 4096 tokens in length) and then reverse prompted it in an LLM to create instructions. So, for example, one sample might have a system field that is "You are a professional author with a raw, visceral writing style" or "You are an AI built for storytelling." Then the instruction might be "write a short story about X that touches on themes of Y and Z, write in the style of W." Or the instruction might be a more detailed template, setting out genre, plot, characters, scene description, POV, etc. Then the response is the actual piece. My dataset also includes some contemporary non-rhyming poetry, some editing/rephrasing samples, and some literary analysis.
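
To make the structure described above concrete, here is a rough sketch of how such samples could be written out as JSONL. The key names ("system", "instruction", "response") simply mirror the wording above and are an assumption; the exact keys a given Axolotl dataset type expects should be checked against its documentation. The response text is a placeholder, not real data.

```python
import json


def make_sample(system, instruction, response):
    """Build one training sample with a system field, an instruction, and the piece itself."""
    return {"system": system, "instruction": instruction, "response": response}


def to_jsonl(samples):
    """Serialise samples as one JSON object per line."""
    return "\n".join(json.dumps(s, ensure_ascii=False) for s in samples)


samples = [
    make_sample(
        "You are a professional author with a raw, visceral writing style",
        "write a short story about X that touches on themes of Y and Z, "
        "write in the style of W.",
        "<the actual piece of writing goes here>",  # placeholder
    ),
    make_sample(
        "You are an AI built for storytelling.",
        # Hypothetical detailed template; the field labels are illustrative only.
        "Genre: ...\nPlot: ...\nCharacters: ...\nScene: ...\nPOV: ...",
        "<the actual piece of writing goes here>",  # placeholder
    ),
]

print(to_jsonl(samples))
```

The instruction in each sample is the part that gets reverse prompted from the finished piece, so in practice the response comes first and the system and instruction fields are written to match it.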

I have three datasets. A small one that is purely top quality writing in a dataset structured as above, a middle sized one that also works in some fiction-focused synthetic GPT-4 data I've generated myself and curated from other datasets, and a larger one that also incorporates conversational responses derived from a dataset that is entirely Claude generated.

I've then run a full fine-tune on Mistral with those datasets using Axolotl on RunPod, using either 2 or 3 A100s.

I find utilising a system prompt very beneficial -- it seems to help build associative connections.

Overall results have been pretty good. The larger dataset model is a great all round writer and still generalises well. The smaller dataset model produces writing that is literary, verbose, and pretty.

I've also had some success training on Zephyr as a base model. It helps to give underlying structure and coherence. Finding the right balance of writing pretty and long, with enough underlying reasoning to sustain coherence has been the key challenge for me.

[–] AstronomerChance5093@alien.top 1 points 9 months ago

Thank you for such a detailed response - really helpful!