overview for dnsod

What All Dropped Recently: in c/localllama@poweruser.forum

dnsod_si666@alien.top 1 point 2 years ago (1 child)

RWKV looks awesome.

How much does Quantization actually impact models? - KL Divergence Tests in c/localllama@poweruser.forum

dnsod_si666@alien.top 1 point 2 years ago

You could also use this to compare different models against each other, right? More generally, you could use it as a model benchmark:

1. Get a dataset of text.
2. Tokenize the dataset.
3. Measure the true probabilities directly from the dataset.
4. Train model 1 on the tokenized dataset.
5. Measure the KL divergence of model 1 from the true probabilities.
6. Repeat steps 4 and 5 for model 2.
7. Compare the KL divergence of model 1 with that of model 2.
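Here is a toy sketch of those steps, assuming the "true probabilities" are approximated by bigram counts (context = the previous token only). With real long contexts, most exact contexts occur only once, so counts like these get very sparse. Training (step 4) is replaced by two hypothetical stand-in models; `empirical_next_token_probs`, `kl_divergence`, and `mean_kl` are illustrative names, not an existing library API.

```python
import math
from collections import Counter, defaultdict

def empirical_next_token_probs(tokens):
    """Step 3 (toy version): estimate P(next | previous token) by counting bigrams."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    probs = {}
    for prev, ctr in counts.items():
        total = sum(ctr.values())
        probs[prev] = {tok: c / total for tok, c in ctr.items()}
    return probs

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; p and q map token -> probability.
    eps avoids log(0) when q gives zero mass to a token that p has seen."""
    return sum(pt * math.log(pt / max(q.get(tok, 0.0), eps))
               for tok, pt in p.items() if pt > 0)

def mean_kl(true_probs, model):
    """Step 5: average KL(true || model) over contexts, each context weighted equally."""
    kls = [kl_divergence(p, model(ctx)) for ctx, p in true_probs.items()]
    return sum(kls) / len(kls)

tokens = ["a", "b", "a", "c", "a", "b"]          # steps 1-2: toy "tokenized dataset"
true_probs = empirical_next_token_probs(tokens)

def model_1(ctx):
    """Hypothetical model that matches the data exactly."""
    return true_probs[ctx]

def model_2(ctx):
    """Hypothetical model that always guesses uniformly."""
    return {t: 1 / 3 for t in ("a", "b", "c")}

# Steps 6-7: lower mean KL means closer to the true distribution.
kl_1 = mean_kl(true_probs, model_1)
kl_2 = mean_kl(true_probs, model_2)
```

Weighting every context equally is a simplification; weighting each context by how often it occurs would match the dataset more closely.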

Separate idea: isn't getting the true probabilities useful anyway? Then the training process could be:

1. Get a dataset.
2. Tokenize it.
3. Get the true probabilities.
4. Train on the probabilities instead of directly on the tokens.
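A minimal sketch of step 3 for this variant, assuming each training example is a (context, next-token id) pair and the context is hashable. `soft_targets` is a hypothetical helper. It only reduces the example count when the exact same context occurs more than once.

```python
from collections import defaultdict

def soft_targets(examples, vocab_size):
    """Collapse (context, next_token_id) pairs into one probability vector per unique context.

    A context seen with different next tokens gets a mixed target such as [0.5, 0.5];
    a context seen once keeps a one-hot target.
    """
    counts = defaultdict(lambda: [0] * vocab_size)
    for context, next_id in examples:
        counts[context][next_id] += 1
    targets = {}
    for context, row in counts.items():
        total = sum(row)
        targets[context] = [c / total for c in row]
    return targets

examples = [("sequence1", 0), ("sequence1", 1), ("sequence2", 1)]
targets = soft_targets(examples, vocab_size=2)
# targets == {"sequence1": [0.5, 0.5], "sequence2": [0.0, 1.0]}
```

Three examples become two training targets here.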

For example, instead of training on two separate examples (sequence -> probabilities):

sequence1 -> [1, 0]
sequence1 -> [0, 1]

you train once on:

sequence1 -> [0.5, 0.5]

So you are training on less data, which would reduce training costs, among other things.
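A quick check of why the single soft-label example can replace the two hard-label ones. Cross-entropy is linear in the target, so the mean loss over the two one-hot copies equals the loss on the averaged target. The gradients therefore match too, while the forward and backward pass for `sequence1` runs once instead of twice. The prediction `[0.7, 0.3]` is an arbitrary hypothetical model output.

```python
import math

def cross_entropy(target, predicted):
    """H(target, predicted) in nats; both are probability vectors over the vocabulary."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)

predicted = [0.7, 0.3]  # hypothetical model output for sequence1

# Two hard-label examples, averaged as a mini-batch would average them.
hard_loss = (cross_entropy([1, 0], predicted) + cross_entropy([0, 1], predicted)) / 2

# One soft-label example; equal to hard_loss up to floating-point rounding.
soft_loss = cross_entropy([0.5, 0.5], predicted)
```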

Your settings are (probably) hurting your model - Why sampler settings matter in c/localllama@poweruser.forum

dnsod_si666@alien.top 1 point 2 years ago

This may be a dumb question, but why do we use any sampling modifications at all? Isn't that defeating the purpose of training the model to learn those probabilities?
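To make the question concrete, here is a sketch of what two common modifications do to the learned distribution. With temperature 1 and no truncation, sampling from `softmax(logits)` draws from exactly the distribution the model learned. A temperature below 1 sharpens it, and top-k zeroes out everything but the k likeliest tokens and renormalises. The function names are hypothetical, not any particular inference library's API, and the sketch only shows the effect of these settings, not whether they are justified.

```python
import math

def softmax(logits):
    """Convert logits to probabilities (max subtracted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def apply_temperature(logits, temperature):
    """Divide logits by the temperature before softmax; 1.0 leaves the distribution unchanged."""
    return softmax([x / temperature for x in logits])

def apply_top_k(probs, k):
    """Keep only the k most likely tokens and renormalise so they sum to 1."""
    keep = set(sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k])
    kept_total = sum(probs[i] for i in keep)
    return [probs[i] / kept_total if i in keep else 0.0 for i in range(len(probs))]

logits = [2.0, 1.0, 0.0]                   # hypothetical model output for 3 tokens
raw = softmax(logits)                      # the distribution the model learned
sharper = apply_temperature(logits, 0.5)   # more mass on the top token
top_1 = apply_top_k(raw, 1)                # equivalent to greedy decoding
```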