Maykey

joined 1 year ago
[–] Maykey@alien.top 1 points 11 months ago

This "only memory saved" amounts to throwing away 2 copies of the entire model. Pretty sweet deal.
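For scale, a back-of-the-envelope sketch (my own illustration; the 7B size and fp16 precision are assumptions, not from the post being replied to):

```python
# Back-of-the-envelope: memory footprint of N full copies of a model's weights.
# Assumed for illustration: a 7B-parameter model stored in fp16/bf16.
def copy_size_gb(n_params, bytes_per_param=2):  # 2 bytes per fp16 weight
    return n_params * bytes_per_param / 1e9

one_copy = copy_size_gb(7e9)   # 14.0 GB per copy
two_copies = 2 * one_copy      # 28.0 GB freed by dropping two copies
print(one_copy, two_copies)
```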

[–] Maykey@alien.top 1 points 11 months ago

I haven't watched the talk, but I think the reading list should show some love for SSMs (S4, S5, H3): on one hand their variants are very prominent on the Long Range Arena; on the other, they are relatively "unknown".

They are not unknown to researchers, judging by how many variants there are, but there are hundreds more videos and blogs explaining transformers. If you find a course about LLMs, it will likely cover transformers but not SSMs, so I think their success on LRA and their absence from learning materials qualify them for the "dive in deeper" list.
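For anyone new to SSMs, here is the core idea in its simplest form (my simplification, not the actual S4 parameterization): a linear state recurrence scanned over the input. S4 additionally structures the state matrix so this scan can be computed as a long convolution during training.

```python
import numpy as np

# Minimal SSM sketch:  h_t = A h_{t-1} + B u_t ;  y_t = C h_t
# Sequential form: the state h is fixed-size no matter how long the input is.
def ssm_scan(A, B, C, u):
    h = np.zeros(A.shape[0])
    ys = []
    for u_t in u:
        h = A @ h + B * u_t
        ys.append(C @ h)
    return np.array(ys)

A = np.eye(2) * 0.9            # toy stable state matrix (assumed values)
B = np.array([1.0, 0.5])
C = np.array([1.0, -1.0])
y = ssm_scan(A, B, C, [1.0, 0.0, 0.0])  # impulse response decays by 0.9/step
```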

[–] Maykey@alien.top 1 points 11 months ago

I don't think a linear transformer has a serious chance to beat a standard transformer with the same number of parameters.

I do. Transformers are not good on the Long Range Arena. They perform well there only when backed by better architectures, as in the case of MEGA.

[–] Maykey@alien.top 1 points 11 months ago

I like to frame one piece of research in terms of another. For example, I see Luna as a cousin of RMT (the core idea of both is to get a smaller sequence from a bigger one, though the methods and goals are very different); if you squint, you'll see the similarities. It helps with breaking a whole paper down into smaller parts and seeing how one piece of research differs from another and where they overlap. And I reward myself with a cookie if I find similarities between papers that don't cite each other. I also keep a (paper) notebook where I write down notes.

Disclaimer: I'm not a student/researcher, just a dirty hobbyist.

[–] Maykey@alien.top 1 points 11 months ago

Yeah, it just needs more integration of commands with the LLM (/go east vs. east vs. mapping actual exits, or /take vs. take), because right now it's confusing what can actually be done in the game and what is hallucination that doesn't really change the game state.

[–] Maykey@alien.top 1 points 11 months ago (2 children)

https://chasm.run/worlds

Well, there's only one port there.

Also it seems the LLM is not good at pairing /map and look:

> look
In addition to the main path leading deeper into the Vastarium, there are exits to the south and west...

> /map
Vastarium Entrance Hall, Dining Area, Bar, Restrooms
Uncanny Valley, to the northwest
Bizarre Botanical Garden, to the southeast
Labyrinthine Library, to the northeast

Also the bottom part is confusing:

• • Hhgg | Vastarium | 7 -2 | 47 turns | 10

  • Nyaran | Uncover the truth behind the ancient artifacts
    

Not sure who Hhgg is (maybe me; there was character generation, but I've forgotten it already). 7 -2 is map coordinates, 47 turns is history length, 10 is ??? "Uncover the truth..." constantly disappears, so I'm not sure what it is: a local area quest or a global one (there is no /quests command AFAICT).

[–] Maykey@alien.top 1 points 11 months ago (5 children)

The documentation is wrong. It says chasm_server = "chasm.run:1234", but the program wants server = "tcp://chasm.run:25566".

Then the client crashed when I changed only the key; I had to add "tcp://" to the server. Then I got to the banner, which said "type /help". I typed it and nothing happened. I don't know if this is another instance of "wouldn't happen in telnet" or if the server is overloaded.
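For anyone else hitting this, the fragment that worked for me (both the key name and the tcp:// scheme differ from the docs):

```toml
# What the documentation shows (does not work):
# chasm_server = "chasm.run:1234"

# What the client actually accepted:
server = "tcp://chasm.run:25566"
```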

[–] Maykey@alien.top 1 points 11 months ago (7 children)

As an avid MUD player in the past: why do I need a client? What's wrong with telnet? Installing a client (into a venv, on top of that) is extremely wasteful for sending text messages to a remote server and getting them back.

[–] Maykey@alien.top 1 points 11 months ago

Yes. ExLlama2 is much faster IME. It also supports an 8-bit cache to save even more VRAM (I don't know if llama.cpp has that).
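To illustrate the memory math (a generic absmax int8 sketch of my own, NOT ExLlama2's actual cache code): storing a cache tensor as int8 plus a scale uses half the memory of fp16, at the cost of a small rounding error.

```python
import numpy as np

# Generic per-tensor absmax quantization, for illustration only.
def quantize_int8(x):
    scale = max(float(np.abs(x).max()), 1e-8) / 127.0
    return np.round(x / scale).astype(np.int8), scale

def dequantize_int8(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
kv = rng.standard_normal((4, 128)).astype(np.float32)  # stand-in for a KV tensor
q, s = quantize_int8(kv)
err = float(np.abs(dequantize_int8(q, s) - kv).max())
# int8 is 1 byte/element vs 2 for fp16: half the cache memory;
# rounding error is bounded by scale/2 per element.
```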

[–] Maykey@alien.top 1 points 11 months ago

My hot take is that local models will become truly feasible on phones (and in general) only once we move past transformers toward something more FLOP- and memory-efficient (RetNet, S5).
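A rough sketch of why recurrent designs like RetNet are memory-friendly at inference (my simplification of a single retention head, without RetNet's normalization or multi-scale decay; the gamma value is an assumption): the per-token state is a fixed-size matrix, instead of a KV cache that grows with context length.

```python
import numpy as np

# Simplified retention recurrence (single head, no normalization):
#   S_t = gamma * S_{t-1} + k_t^T v_t ;   o_t = q_t S_t
# The state S stays d x d no matter how many tokens were seen.
def retention_step(S, q, k, v, gamma=0.9):  # gamma chosen for illustration
    S = gamma * S + np.outer(k, v)
    return q @ S, S

d = 8
S = np.zeros((d, d))
rng = np.random.default_rng(0)
for _ in range(16):               # 16 tokens; memory use never grows
    q, k, v = rng.standard_normal((3, d))
    o, S = retention_step(S, q, k, v)
```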

[–] Maykey@alien.top 1 points 11 months ago

At current capabilities it's faster to query a server on the opposite hemisphere than to generate locally.

[–] Maykey@alien.top 1 points 11 months ago

I hope OpenAI becomes more open under new leadership, but I'm not holding my breath.
