MINIMAN10001

joined 10 months ago
[–] MINIMAN10001@alien.top 1 points 9 months ago (1 children)

It's not that CPUs are slow; it's that the RAM the CPU is typically connected to is slow.

That's why unified memory is fast: it's just faster RAM connected to the CPU.
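As a back-of-the-envelope sketch (the bandwidth figures below are illustrative assumptions, not measurements): token generation is usually memory-bandwidth bound, so tokens/sec is roughly bandwidth divided by how many bytes of weights get streamed per token.

```python
# Rough upper bound: each generated token requires streaming all model
# weights through the memory bus once. Bandwidth numbers are illustrative.

def rough_tokens_per_sec(model_gb, bandwidth_gbs):
    """Estimate tokens/sec as bandwidth / bytes read per token."""
    return bandwidth_gbs / model_gb

model_gb = 4.0  # e.g. a 7B model at roughly 4-bit quantization
for name, bw in [("dual-channel DDR4", 50),
                 ("unified memory", 400),
                 ("high-end GPU VRAM", 1000)]:
    print(f"{name}: ~{rough_tokens_per_sec(model_gb, bw):.0f} tok/s")
```

The exact numbers don't matter; the point is that the same CPU gets dramatically faster generation just by sitting on a faster memory bus.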

[–] MINIMAN10001@alien.top 1 points 10 months ago

Generally, if what you want is to impart new knowledge, what you want is an embedding.

Assuming it is a large amount of data, you will want a vector DB.

That is, using retrieval-augmented generation (RAG).

This is better explained by this guy 16 days ago:

https://www.reddit.com/r/LocalLLaMA/comments/17qse19/comment/k8e7fvx/
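A toy sketch of the RAG flow (the bag-of-words "embedding" and the documents here are made-up stand-ins for a real embedding model and vector DB): embed the documents, retrieve the closest one to the query, and prepend it to the prompt.

```python
# Minimal RAG sketch: retrieve the most similar document and stuff it
# into the prompt as context. A real system would use a learned embedding
# model and a vector DB instead of word counts and a linear scan.
from collections import Counter
import math

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "Mistral 7B uses sliding window attention.",
    "RAG retrieves documents and adds them to the prompt.",
]
query = "how does RAG add documents to the prompt"
best = max(docs, key=lambda d: cosine(embed(query), embed(d)))
prompt = f"Context: {best}\n\nQuestion: {query}"
print(prompt)
```

The LLM itself never gets fine-tuned; the new knowledge arrives purely through the retrieved context.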

[–] MINIMAN10001@alien.top 1 points 10 months ago

My understanding is that tokens per second typically splits into two categories: the prompt processing time and the actual token generation time.

At least that's what I remember from oobabooga.
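A rough way to picture the split (the speeds below are made-up examples, not benchmarks): total latency is prompt tokens at the prefill speed plus generated tokens at the generation speed.

```python
# Two-phase timing model: prefill (prompt processing) runs much faster
# per token than generation, but a long prompt can still dominate.

def total_seconds(prompt_tokens, gen_tokens, prefill_tps=500.0, gen_tps=20.0):
    return prompt_tokens / prefill_tps + gen_tokens / gen_tps

# 4000-token prompt, 100-token reply: 8.0s prefill + 5.0s generation
print(total_seconds(4000, 100))  # 13.0
```

That's why a single "tokens per second" number can be misleading: the two phases scale with different parts of your workload.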

[–] MINIMAN10001@alien.top 1 points 10 months ago

It's a Mistral base model, which only exists in the 7B variety.

Mistral is beloved for having 13B quality in the size and speed of a 7B model.

[–] MINIMAN10001@alien.top 1 points 10 months ago

It was definitely around the point when the employees said they were ready to bail and Microsoft said it had spots open for them that doubling back became necessary.

Being board members who held some opinion so strongly that they fired the CEO is definitely not a position you want to be in.

It's definitely intriguing to think about a board firing a CEO. Like, what could possibly be considered too far when companies like Blizzard actively defend sexual harassment?

Basically it means the people have spoken, and whatever he got fired for is now out of the board's hands and in the CEO's.

[–] MINIMAN10001@alien.top 1 points 10 months ago

That's the thing: there is no "best." It depends on your metric, and even the leaderboards only give you specific metrics that are minimally useful for a lot of people. A lot of it just comes down to personal preference and hearsay about what other people have experienced, then trying them all yourself.

[–] MINIMAN10001@alien.top 1 points 10 months ago

I mean, it makes sense. The values were simply chosen for being a reasonable window at the time.

There was nothing hard-coded about them; they were simply a range of values that had been set for the UI.

It certainly is interesting though.

[–] MINIMAN10001@alien.top 0 points 10 months ago (1 children)

So, assuming this release does anything at all, the only thing I can think of is that instead of the "hidden size" being 4k, giving a 4k sliding window into the 32k context, it would be a hidden size of 16k, giving a 16k window into the 32k context.

However, that's just speculation on my part, because... otherwise the release means nothing... which would be weird.
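To make the speculation concrete, here's a toy sketch of what a sliding-window attention mask does (the window and sequence sizes are illustrative, not taken from the release):

```python
# Sliding-window attention: each token can only attend to the previous
# `window` positions, even when the total context is much longer.

def sliding_window_mask(seq_len, window):
    """Return (lo, hi) per token: token i may attend to positions lo..hi-1."""
    mask = []
    for i in range(seq_len):
        lo = max(0, i - window + 1)  # oldest position still visible
        mask.append((lo, i + 1))     # causal: up to and including i
    return mask

# With window=4, token 7 only sees positions 4..7; a wider window would
# let late tokens directly see further back into the context.
print(sliding_window_mask(8, 4))
```

Widening the window (4k to 16k in the speculation above) changes how much of the full context each token can attend to directly, rather than only through information relayed across layers.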

[–] MINIMAN10001@alien.top 1 points 10 months ago

Spoiler alert: MRAM is just RAM, but more expensive. You would be better off just buying more RAM.

RAM is just faster storage.

None of that is relevant to LLMs.

[–] MINIMAN10001@alien.top 1 points 10 months ago (1 children)

I believe this is the project I'm remembering as well.

I'm thinking it would be best as a free community project where each player runs their own LLM in that sort of housing setting, with needs to fill like food and hunger, and places they can go to fill them.

The AIs could interact with each other.

Then you basically spectate this AI house and see if other players can get your AI to do interesting things.

Be wary: overwhelmingly bad behavior would likely exist, lol.

The server would basically just handle the actual game-world state and route LLM messages between players.

A sort of free dynamic player controlled LLM household.
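The routing idea could be sketched roughly like this (the class, names, and callback interface are all hypothetical; each player's callback would really call their locally running LLM):

```python
# Hypothetical server sketch: it owns authoritative world state and just
# forwards utterances between player-run LLM agents, never running a model
# itself.

class GameServer:
    def __init__(self):
        self.world = {"time": 0}  # authoritative game-world state
        self.agents = {}          # player_id -> callback into that player's local LLM

    def register(self, player_id, llm_callback):
        self.agents[player_id] = llm_callback

    def route(self, sender, recipient, message):
        """Forward one agent's message to another agent's local LLM."""
        return self.agents[recipient](sender, message)

# Stubs stand in for the players' real local models.
server = GameServer()
server.register("alice", lambda who, msg: f"alice heard {who} say: {msg}")
server.register("bob", lambda who, msg: f"bob heard {who} say: {msg}")
print(server.route("alice", "bob", "let's cook dinner"))
```

Keeping inference on each player's machine means the community server stays cheap: it's only shuffling text and world state, not running models.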