Ask Lemmy
A Fediverse community for open-ended, thought provoking questions
Chunking effectively is too big a problem to implement while you are also trying to learn the subject. You also run into issues with model size. A 70B or 8×7B on its own is better than an 8B with citable sources. A quantized Q4_K version of one of these models can run on a 16 GB 3080 Ti, but it needs 64 GB of system memory to load comfortably. The 70B runs at a slow reading pace and is barely tolerable, but its niche depth and self-awareness are invaluable. The 8×7B is about twice as fast as reading pace. It only runs 2 of its 7B experts at a time, chosen selectively. This gives it some of the limitations of a 13B model, but in practice it is far more useful than even a 30B model.
I hate the Llama 3 alignment changes. They make the model much dumber and less flexible. The Mistral 8×7B (Mixtral) dates from the Llama 2 era, and that is still what I use and prefer. I use the Flat Dolphin Maid uncensored version for everything too. In my view, all alignment is overtraining and harms output. In addition, I have modified the Oobabooga code in a few ways that turn off alignment. It is not as fully disabled as I would like. I don't completely understand all aspects of alignment, but my setup is much more open than a typical one.
I like to write real science fiction that is critical of present-day social and political structures. These topics are heavily restricted by alignment bias. The alignment bias extends to and permeates everything in the model. The more of it that is removed, the more useful the model becomes in all areas. For instance, a basic model struggled when I asked it about the FORTH programming language. After reducing alignment bias, I can ask questions about the esoteric Flash FORTH language for embedded microcontrollers and get useful basic information. In the first instance, alignment bias around copyrighted works intentionally obfuscated the responses to my queries. This obfuscation mechanism is one of the primary causes of errors.
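As a rough sanity check on the memory figures earlier in this comment, here is a back-of-the-envelope estimate of how much of a 4-bit quantized model spills out of 16 GB of VRAM. The 4.5 bits per weight and the 46.7B total parameter count for the 8×7B are my assumptions, and the estimate ignores context/KV cache and runtime overhead, so treat the numbers as ballpark only.

```python
def quantized_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    # params_billion * 1e9 weights * bits / 8 bytes, divided by 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


VRAM_GB = 16       # the card mentioned above
BITS_Q4_K = 4.5    # ASSUMPTION: average bits per weight for a Q4_K-style quant

# ASSUMPTION: 46.7B total parameters for the 8x7B mixture-of-experts model
models = {"8B": 8.0, "8x7B": 46.7, "70B": 70.0}

for name, params in models.items():
    size = quantized_size_gb(params, BITS_Q4_K)
    spill = max(0.0, size - VRAM_GB)  # part that has to live in system RAM
    print(f"{name}: ~{size:.1f} GB of weights, ~{spill:.1f} GB outside VRAM")
```

Under these assumptions, both large models need well over 16 GB for the weights alone, which is consistent with needing a lot of system memory alongside the GPU.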
If you build a RAG setup, you're likely to find that even with citations from good chunking, the model will still make errors: the information is also present in the model's hidden sources, and it knows this means it is a copyrighted work, which triggers the mechanism.
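For anyone who has not seen it, this is roughly what the most naive form of chunking looks like: fixed-size word windows with some overlap. It is only a sketch. The function name and default sizes are made up for illustration, and real chunking that respects sections, sentences, and meaning is the hard part I was referring to.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping windows of whitespace-separated words.

    chunk_size and overlap are counted in words (hypothetical defaults).
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(str(i) for i in range(10))
    for chunk in chunk_text(sample, chunk_size=4, overlap=2):
        print(chunk)
```

A splitter like this will happily cut a definition or a table in half, which is part of why citations from it do not guarantee good answers.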
You're better off talking about the subject and abstract ideas you are struggling with. This will allow the model to respond using the hidden sources without as much obfuscation. At least that has been my experience.