this post was submitted on 27 Nov 2023

Machine Learning

Paper: https://arxiv.org/abs/2311.11829

Abstract:

Soft attention in Transformer-based Large Language Models (LLMs) is susceptible to incorporating irrelevant information from the context into its latent representations, which adversely affects next token generations. To help rectify these issues, we introduce System 2 Attention (S2A), which leverages the ability of LLMs to reason in natural language and follow instructions in order to decide what to attend to. S2A regenerates the input context to only include the relevant portions, before attending to the regenerated context to elicit the final response. In experiments, S2A outperforms standard attention-based LLMs on three tasks containing opinion or irrelevant information, QA, math word problems and longform generation, where S2A increases factuality and objectivity, and decreases sycophancy.
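The abstract describes S2A as a two-step prompting pipeline: first ask the model to regenerate the input with only the relevant portions kept, then answer from that regenerated context. A minimal sketch of that flow, assuming a generic `llm` completion callable and an illustrative rewrite prompt (not the paper's exact wording):

```python
# Minimal sketch of the System 2 Attention (S2A) two-step pipeline.
# `llm` is a stand-in for any instruction-following model's completion call;
# REWRITE_PROMPT is an assumed illustrative prompt, not the paper's exact one.

REWRITE_PROMPT = (
    "Given the following text, extract only the part that contains the actual "
    "question or task, removing opinions and irrelevant context:\n\n{context}"
)

def s2a_respond(llm, context: str) -> str:
    # Step 1: regenerate the context so only relevant portions remain.
    cleaned = llm(REWRITE_PROMPT.format(context=context))
    # Step 2: attend only to the regenerated context to produce the answer.
    return llm(cleaned)
```

The point of the indirection is that step 2 never sees the distracting or opinionated parts of the original input, so the soft attention in that forward pass cannot latch onto them.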

[–] reverendCappuccino@alien.top 1 points 11 months ago

Well, it's more of a psychological term, and "attention" is already there to illustrate the intended meaning of a dot product. The analogy holds up, so why doubt the validity of "System 2 attention" rather than questioning the use of "attention" at all?