AnomalyNexus

joined 1 year ago
[–] AnomalyNexus@alien.top 1 points 11 months ago (1 children)

3G/s and an SSD might be able to saturate it if you're lucky.

The one is bits, the other is bytes ;)

Network...3 gigabits, while a decent NVMe Gen 4 can do 4-5 gigabytes.

Even old SATA-connected SSDs should be able to keep up if you don't buy trash.
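
Rough arithmetic, for anyone who wants the numbers (the SATA figure is just a typical ballpark I'm assuming):

```python
# Back-of-the-envelope: divide bits by 8 to get bytes.
link_gbps = 3                         # 3 gigabit/s network link
link_mb_per_s = link_gbps * 1000 / 8  # ~375 MB/s

sata_mb_per_s = 550        # rough sequential read of a decent SATA SSD (assumed)
nvme_gen4_mb_per_s = 5000  # ~4-5 GB/s for a decent Gen 4 NVMe

print(f"network ~{link_mb_per_s:.0f} MB/s vs SATA ~{sata_mb_per_s} MB/s "
      f"vs NVMe Gen 4 ~{nvme_gen4_mb_per_s} MB/s")
```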

[–] AnomalyNexus@alien.top 1 points 11 months ago (2 children)

It’s vaguely how the Macs work.

The current APUs are still quite slow, but maybe that'll change. Also, in most cases you need to designate memory as GPU-specific, so it's not quite shared.

[–] AnomalyNexus@alien.top 1 points 11 months ago

That's a sharp comment.

Potentially beyond my technical ability but I can vaguely see where you're going with it.

Next step was embeddings anyway (hence the attempt to clean the data - get it ready for that).

I've not heard of PageRank applied to this before though. Thanks!
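
Something like this is how I'd picture it - a rough sketch only (not necessarily what you meant), assuming sentence-transformers and networkx, with made-up example sentences: build a cosine-similarity graph over sentence embeddings, run PageRank, and treat the low scorers as likely scraping junk.

```python
import itertools
import networkx as nx
from sentence_transformers import SentenceTransformer, util

# Hypothetical example text: two on-topic sentences plus one scraping artifact.
sentences = [
    "The new GPU doubles inference throughput over the previous generation.",
    "Benchmarks show the gains are largest at higher batch sizes.",
    "Subscribe to our newsletter for more articles like this!",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
emb = model.encode(sentences, convert_to_tensor=True)

# Similarity graph: nodes are sentences, edges weighted by cosine similarity.
graph = nx.Graph()
graph.add_nodes_from(range(len(sentences)))
for i, j in itertools.combinations(range(len(sentences)), 2):
    sim = max(float(util.cos_sim(emb[i], emb[j])), 0.0)  # clamp to keep weights non-negative
    graph.add_edge(i, j, weight=sim)

# PageRank: sentences similar to many others score high; off-topic ones score low.
scores = nx.pagerank(graph, weight="weight")
for idx, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{score:.3f}  {sentences[idx]}")
```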

 

Some of the bigger/better models make me think local is doing pretty well - and it is, at chat - but exploring data cleaning has taken a bit of wind out of my sails.

Not having much luck with the ones I've tried (think 34B Q5 of various flavours - all the usual suspects).

Say I've got a paragraph about something, and the text block contains some unrelated content - say "subscribe to our newsletter" or some other web-scraping artifact. I'd like to give the LLM an instruction to filter out content not related to the paragraph topic.

Local LLMs...mostly failing. GPT-3.5...failing I'd say 40% of the time. GPT-4...usually works...call it 90%.

That's not entirely surprising, but the degree to which local models are failing at this task relative to the closed ones is frustrating me a bit.

Hell, for some 34Bs I can't even get the local ones to suppress the opening:

"Here's the cleaned article:"

...when the prompt literally says word for word don't include that. Are there specific LLMs for this? Or is my prompting just bad?

You are an expert at data cleaning. Given a piece of text you clean it up by removing artifacts left over from webscraping. Remove anything that doesn't seem related to the topic of the article. For example you must remove links to external sites, image descriptions, suggestions to read other articles etc. Clean it up. Remove sentences that are not primarily in English. Keep the majority of the article. The article is between the [START] and [END] marker. Don't include [START] or [END] in your response. It is important that there is no additional explanation or narrative added - just respond with the cleaned article. Do not start your response with "Here's the cleaned article:"

Unrelated - OpenAI guidance says to use """ as markers rather than the [START]/[END] I've got. Anybody know if that holds for local models?
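
For context, here's roughly how I'd wire it up with the triple-quote markers - a sketch only, pointed at whatever OpenAI-compatible endpoint your local server exposes (the base_url, model name and sample text are placeholders, not anything official):

```python
from openai import OpenAI

# Point the client at a local OpenAI-compatible server (text-generation-webui,
# llama.cpp server, etc.) instead of api.openai.com. Values are placeholders.
client = OpenAI(base_url="http://localhost:5000/v1", api_key="not-needed-locally")

SYSTEM = (
    "You are an expert at data cleaning. Given a piece of text, remove artifacts "
    "left over from web scraping: links to external sites, image descriptions, "
    "suggestions to read other articles, and sentences not primarily in English. "
    "Keep the majority of the article. The article is delimited by triple quotes "
    '("""). Respond with only the cleaned article and no preamble.'
)

def clean(article: str) -> str:
    response = client.chat.completions.create(
        model="local-model",  # placeholder model name
        temperature=0,
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": f'"""\n{article}\n"""'},
        ],
    )
    return response.choices[0].message.content.strip()

print(clean("Intel's new APU ships next quarter. Subscribe to our newsletter!"))
```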

[–] AnomalyNexus@alien.top 1 points 11 months ago

Proxmox Backup Server on Hyper-V.

Saves me an extra device basically.

Occasionally WSL for AI stuff but it’s annoyingly fragile frankly.

[–] AnomalyNexus@alien.top 1 points 11 months ago (4 children)

Expecting a minor revolution at the intersection of /r/selfhosted, /r/LocalLLaMA and /r/homeassistant.

The self-hosted AI tech is slowly but surely getting to a stage where it could pull all of this together.

What required Siri/Alexa last year will soon be on /r/selfhosted turf.

[–] AnomalyNexus@alien.top 1 points 11 months ago

Worth noting that MXroute renewals honor Black Friday (BF) pricing.

I'm still cruising on my 2019 era BF deal lol...

[–] AnomalyNexus@alien.top 1 points 11 months ago (1 children)

I was looking at their policies, and I am worried about the "Forbidden Services"

Bulk email providers need pretty tight and aggressively worded ToS by necessity, because they're a target for spammers & abuse. The owner of MXroute has been around on various forums for years & consistently strikes me as a very reasonable bloke who won't cause you problems if you don't cause him problems.

I have 5 domains... Any idea what "massive numbers" means?

Maybe I'm imagining this, so please don't quote me, but I vaguely recall them answering a question about what counts as reasonable in the context of unlimited domains with "if you needed a script/automation to create them then you're probably over the line".

[Note that this is purely my impression as a long-term customer & I have no special insight/connection to them. Legally they can enforce the ToS.]

[–] AnomalyNexus@alien.top 1 points 11 months ago (1 children)

What is the intended use case? At 10 s/token I’d imagine not chat.

Swapping out layers on the fly is an interesting approach though

[–] AnomalyNexus@alien.top 1 points 11 months ago

There is also the issue of PCIe slots. Currently running a second card in an x4 slot and it’s noticeably slower. Getting four full-speed x16 slots is going to require some pretty specialised equipment. All the crypto rigs use slow slots to my knowledge, since it doesn’t matter there.

It is good to see more competitive cards in this space though. Dual 770s could be very accessible.

[–] AnomalyNexus@alien.top 1 points 11 months ago

Liking this one - seems particularly good at long-form storytelling.

NB you may need to update your software...it seems to rely on something pretty recent, at least for text-generation-webui / llama.cpp. Crashed till I updated (and my existing copy was at most 48hr old).

Also, something odd with the template. The suggested template from the GGUF seems to be Alpaca, while TheBloke's model card says ChatML. Under both it occasionally spits out <|im_end|>, but ChatML seems better overall.
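
For anyone comparing, the two formats look roughly like this (hand-rolled sketch from memory, not pulled from the GGUF metadata):

```python
# Alpaca-style prompt
alpaca = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
"""

# ChatML-style prompt - the <|im_start|>/<|im_end|> tokens are what leak into the
# output when the template or stop strings don't match what the model was trained on.
chatml = """<|im_start|>system
{system}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
"""

print(alpaca.format(prompt="Tell me a long story."))
print(chatml.format(system="You are a storyteller.", prompt="Tell me a long story."))
```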

[–] AnomalyNexus@alien.top 1 points 11 months ago (3 children)

multi-GPU

That's the question I guess. If you can get, say, 5x of these for the price of a 4090, then that may look interesting. Though that's a hell of a lot of overhead & hassle on power and PCIe slots etc.

[–] AnomalyNexus@alien.top 1 points 11 months ago (1 children)

Athena V4

Think it's aimed at ERP, but it's remarkably pleasant as a general upbeat female AI persona.

dolphin.2.1 with possibly a more serious tone.

Tried the Yi Dolphin one a bit...it seems to give much shorter & more curt responses. Definitely doesn't feel storytelling-oriented to me. Maybe the Mistral version is better.

 

tl;dr: AutoAWQ seems to completely ignore the multi-GPU VRAM allocation sliders in text-generation-webui?!?


I've got a 3090 and added in the old 2070S for some temporary experimentation.

Not particularly stable, and a lot slower than just the 3090, but 32GB opens up some higher-quant 34Bs.

llama.cpp mostly seems to run fine split across them.

Puzzled though by text-generation-webui's AutoAWQ loader. Regardless of what I do with the sliders, it always runs out of memory on the 8GB card. Even if I tell it to use only 1GB on the 2070S, it still fills it until OOM. The sliders max out at the expected amounts (24 & 8), so I'm pretty sure I've got them the right way round...

Anybody know what's wrong?
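
If anyone else hits this, the workaround I'm considering is skipping the sliders entirely and loading the AWQ model straight through transformers with an explicit per-GPU cap - a rough sketch only, with a placeholder model path and untuned memory caps:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "some-34B-AWQ"  # placeholder path to an AWQ-quantised model

# device_map="auto" lets accelerate shard layers across both cards, while
# max_memory caps each GPU explicitly (keys are CUDA device indices).
# 22GiB for the 3090 and 6GiB for the 2070S are illustrative, not tuned values.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    max_memory={0: "22GiB", 1: "6GiB", "cpu": "32GiB"},
)
tokenizer = AutoTokenizer.from_pretrained(model_path)
```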
