Severin_Suveren


Today we run a query and the LLM answers. The first evolution of this was going multi-modal: being able to input or output other forms of data, like video and sound

But still, we query a model and the LLM answers. Same with agents, only then we have multiple queries working together in a system, often assisted by function calling and automation in order to put thought into action
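To make that concrete, here's a minimal sketch of the agent pattern I mean: several queries chained together, with function calling to turn the model's plan into an action. `call_llm` and the tool registry are hypothetical stand-ins for a real model API, not any particular library:

```python
import json

def call_llm(prompt: str) -> str:
    """Placeholder for a real model call; here it just fakes a tool request."""
    return json.dumps({"tool": "search", "arguments": {"query": prompt}})

# Hypothetical tool registry the agent can call into.
TOOLS = {
    "search": lambda query: f"(pretend search results for '{query}')",
}

def run_agent(user_query: str) -> str:
    # First query: ask the model which tool to call and with what arguments.
    plan = json.loads(call_llm(user_query))
    tool_output = TOOLS[plan["tool"]](**plan["arguments"])
    # Second query: feed the tool result back so the model can answer the user.
    return call_llm(f"Question: {user_query}\nTool result: {tool_output}\nAnswer:")

print(run_agent("What's the weather in Oslo?"))
```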

We do it this way because it's the most obvious solution. The easiest one to implement, or rather the only one we can implement at this time

At this time

In the future we will get more efficient models. Models that can output on the order of 100+ t/s. Models that can run so fast that they're essentially able to simulate continuous runtime

And that's not all. With a model this fast you could also implement multi-query systems running in the backend to handle any agent-related tasks, or even to assist reasoning by having lightning-fast discussions with the frontend model before giving any output
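A rough sketch of that backend/frontend idea, assuming both model calls are stubbed out (they're placeholders, not a real API): a fast backend model critiques the frontend model's draft a few times before anything is shown to the user. At 100+ t/s a handful of these rounds would add almost no visible latency:

```python
def frontend_draft(query: str, feedback: str = "") -> str:
    # Stand-in for the user-facing model producing a draft answer.
    return f"draft answer to '{query}'" + (f" (revised after: {feedback})" if feedback else "")

def backend_review(draft: str) -> str:
    # Stand-in for the fast backend model reviewing the draft.
    return "looks fine" if "revised" in draft else "add more detail"

def answer(query: str, rounds: int = 3) -> str:
    draft = frontend_draft(query)
    for _ in range(rounds):
        feedback = backend_review(draft)
        if feedback == "looks fine":
            break
        draft = frontend_draft(query, feedback)
    return draft

print(answer("Explain continuous runtime"))
```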

tl;dr LLMs are just waiting for us to give them more juice so that they're fast enough to simulate continuous runtime

[–] Severin_Suveren@alien.top 1 points 1 year ago (1 children)

Would love it if instead of proving LLMs are conscious, we prove that none of us are. Or, I guess, I wouldn't be the one proving it, since I wouldn't be conscious

[–] Severin_Suveren@alien.top 1 points 1 year ago

You will need to feed the model the conversation log every time you query it, and as such you'd be limited by the context length of the model.

With a 100k-token context model you'd be able to keep a chat log of roughly 70,000-100,000 words, which is about the length of a normal book.
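A minimal sketch of what that looks like in practice, assuming a 100k-token limit and a crude 4-characters-per-token estimate (not a real tokenizer): the whole chat log gets resent on every query, and the oldest turns get dropped once the window is full.

```python
CONTEXT_LIMIT_TOKENS = 100_000  # assumed limit from the example above

def estimate_tokens(text: str) -> int:
    # Rough rule of thumb: ~4 characters per token.
    return max(1, len(text) // 4)

def build_prompt(history: list[str], new_message: str) -> str:
    history.append(f"User: {new_message}")
    # Drop the oldest turns until the whole log fits in the context window.
    while sum(estimate_tokens(turn) for turn in history) > CONTEXT_LIMIT_TOKENS:
        history.pop(0)
    return "\n".join(history)

chat_log: list[str] = []
prompt = build_prompt(chat_log, "Hello, what did we talk about yesterday?")
# `prompt` is what actually gets sent to the model on every single query.
```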