This post was submitted on 18 Nov 2023

LocalLLaMA


Community to discuss Llama, the family of large language models created by Meta AI.

1 comment
Thistleknot@alien.top · 10 months ago

I was going to try knowledge distillation, but they modified their tokenizer.
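
For anyone curious why the tokenizer matters: here's a minimal sketch of what logit-level distillation usually looks like in PyTorch (my own illustration, not anything from the repo). It assumes teacher and student share a tokenizer/vocab so the logits line up position for position, which is exactly the assumption their tokenizer change breaks:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions; a higher temperature exposes the teacher's
    # relative preferences among non-top tokens.
    t = temperature
    log_student = F.log_softmax(student_logits / t, dim=-1)
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    # KL(teacher || student), scaled by t^2 so gradient magnitudes stay
    # comparable across temperatures.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * t**2

# Both models must score the SAME token ids for this to make sense:
# logits = model(input_ids).logits  ->  shape (batch, seq, vocab)
```

Once the vocabularies diverge, the vocab dimensions no longer match and this loss can't be computed directly.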

Either way, GPT-Neo has a 125M model, so a 248M model is roughly 2x that. I imagine this could be useful for shorter-context tasks, or for continued training on very narrow use cases.

I came across it while looking for tiny Mistral config JSONs to replicate.
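
In case it helps anyone else hunting for tiny configs: you can build one from scratch with Hugging Face's `MistralConfig`. The shapes below are my guesses at something that lands near 248M parameters, not the actual model's config.json:

```python
from transformers import MistralConfig, MistralForCausalLM

# Hypothetical tiny-Mistral shapes; ~248M params by rough count.
config = MistralConfig(
    vocab_size=32000,
    hidden_size=1024,
    intermediate_size=4096,
    num_hidden_layers=12,
    num_attention_heads=16,
    num_key_value_heads=4,         # grouped-query attention, Mistral-style
    max_position_embeddings=2048,  # short context to match the use case
    sliding_window=1024,
)
model = MistralForCausalLM(config)
print(f"{model.num_parameters() / 1e6:.0f}M parameters")
```

`config.save_pretrained(".")` then writes out a config.json you can diff against whatever the real model ships with.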

https://preview.redd.it/l9l7a39u3a1c1.jpeg?width=720&format=pjpg&auto=webp&s=80589cb6fbb2268b0d8af65b4ec27647185b4780