this post was submitted on 14 Nov 2023

LocalLLaMA


Community to discuss about Llama, the family of large language models created by Meta AI.

founded 1 year ago
top 5 comments
[–] dqUu3QlS@alien.top 1 points 1 year ago (1 children)
[–] esotericloop@alien.top 1 points 1 year ago

See, you're attending to the initial token across all layers and heads. :P

[–] Tiny_Nobody6@alien.top 1 points 1 year ago

IYH kindly post the paper link

[–] Knopty@alien.top 1 points 1 year ago (1 children)

If you're wondering whether it could be implemented: there was a modified version of the transformers library. The author forked it, applied his changes, renamed it attention_sinks, and presented it as a drop-in replacement:

https://github.com/tomaarsen/attention_sinks/
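For intuition, the core idea behind that library (from the StreamingLLM paper) is a KV-cache eviction policy: always keep the first few "sink" tokens plus a sliding window of the most recent tokens, and drop everything in between. A minimal toy sketch of just that policy, with illustrative names that are not the library's actual API:

```python
# Toy sketch of the attention-sink cache policy: when the cache exceeds
# sink_size + window_size entries, keep the first sink_size positions
# (the "attention sinks") plus the window_size most recent positions.
def evict(cache, sink_size, window_size):
    """Return the token positions kept after applying the sink policy."""
    if len(cache) <= sink_size + window_size:
        return list(cache)
    return list(cache[:sink_size]) + list(cache[-window_size:])

# Simulate streaming 10 tokens through a cache with 4 sink slots
# and a 4-token recency window.
cache = []
for pos in range(10):
    cache.append(pos)
    cache = evict(cache, sink_size=4, window_size=4)

print(cache)  # the first 4 positions survive; the rest is a sliding window
```

The real implementation does this per layer on the key/value tensors (and handles positional encodings), but the retention rule is the same.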

But that fork was impractical to maintain, so the transformers developers suggested he submit a patch to transformers itself and maintain it there, so the feature could be properly incorporated into the library and stay future-proof.

The author has been working on this patch since the beginning of October:

https://github.com/huggingface/transformers/pull/26681

[–] WAHNFRIEDEN@alien.top 1 points 1 year ago

it's already implemented in llama.cpp