1azytux

joined 1 year ago
 

Hi, Speculative Decoding runs a small model and a large model at the same time with a sampler in between, but in that setup the sampler's job is to NOT skew the probability distribution while doing so. There's a fairly simple Python implementation of this idea here. Is there a way we can adjust the probability distributions of either the small model or the large model for the task of generation?
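
To make the question concrete, here is a minimal sketch of the standard speculative-sampling acceptance step (NumPy only). The `adjust` helper and its temperature parameter are hypothetical, just to show the kind of hook I have in mind; applying it would of course mean the outputs no longer exactly match the unmodified large model.

```python
import numpy as np

def adjust(p, temperature=1.0):
    """Hypothetical hook: re-temper the target distribution p."""
    logits = np.log(np.clip(p, 1e-12, None)) / temperature
    e = np.exp(logits - logits.max())
    return e / e.sum()

def speculative_accept(p, q, proposed_token, rng=None):
    """One acceptance step: p is the large model's next-token distribution,
    q is the draft model's, proposed_token is a token index sampled from q."""
    rng = rng or np.random.default_rng()
    # p = adjust(p, temperature=0.7)  # <- where an adjustment could be hooked in
    accept_prob = min(1.0, p[proposed_token] / q[proposed_token])
    if rng.random() < accept_prob:
        return proposed_token
    # On rejection, resample from the normalized residual max(0, p - q).
    residual = np.clip(p - q, 0.0, None)
    residual /= residual.sum()
    return int(rng.choice(len(p), p=residual))
```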

 

Is there any way we can involve another model (let's call it Model B) to manipulate the logits of Model A? This way, we could incorporate information from Model B when calculating the final outputs of Model A. One way of doing this is the DExperts paper, but has anyone done it in a more straightforward/easier way for LLaMA-based models?
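
To illustrate what I mean, here is a rough sketch using Hugging Face's LogitsProcessor hook: Model B's next-token logits get mixed into Model A's at every generation step. The checkpoint names, the mixing rule (a simple weighted offset, not the full expert/anti-expert scheme from DExperts), and the alpha value are just placeholder assumptions; it also assumes both models share the same tokenizer/vocabulary.

```python
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          LogitsProcessor, LogitsProcessorList)

class SteeringProcessor(LogitsProcessor):
    """Mix Model B's next-token logits into Model A's during generation."""
    def __init__(self, model_b, alpha=0.5):
        self.model_b = model_b
        self.alpha = alpha

    def __call__(self, input_ids, scores):
        with torch.no_grad():
            logits_b = self.model_b(input_ids).logits[:, -1, :]
        return scores + self.alpha * (logits_b - scores)

# Placeholder checkpoints; any two LLaMA-based models with a shared vocab work.
name_a = "meta-llama/Llama-2-7b-hf"
name_b = "meta-llama/Llama-2-7b-chat-hf"
tok = AutoTokenizer.from_pretrained(name_a)
model_a = AutoModelForCausalLM.from_pretrained(name_a)
model_b = AutoModelForCausalLM.from_pretrained(name_b)

inputs = tok("The capital of France is", return_tensors="pt")
out = model_a.generate(
    **inputs,
    max_new_tokens=20,
    logits_processor=LogitsProcessorList([SteeringProcessor(model_b, alpha=0.5)]),
)
print(tok.decode(out[0], skip_special_tokens=True))
```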

 

Why do models behave this way when they are instruction fine-tuned, i.e., why do they start performing better? Is there any study on this already?

 

Is anyone aware of how to obtain the attention values of a LLaMA model? For example, if I want to obtain attention values (of size 4096) from layer 24, how do I get them?
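
In case it helps frame the question: with Hugging Face transformers the per-layer attention weights and hidden states can be requested at forward time, roughly as sketched below. The checkpoint name is just an example; note that the 4096-dimensional vectors would be the per-token hidden states (LLaMA-7B's hidden size), while the attention weights themselves are per-head [seq_len, seq_len] matrices.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Llama-2-7b-hf"   # assumed checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

inputs = tok("Hello, world", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_attentions=True, output_hidden_states=True)

# Attention weights of layer 24 (0-indexed): [batch, num_heads, seq_len, seq_len]
attn_layer_24 = out.attentions[24]
# Hidden states after layer 24: [batch, seq_len, 4096] for LLaMA-7B
# (index 0 of hidden_states is the embedding output, hence the +1)
hidden_layer_24 = out.hidden_states[24 + 1]
print(attn_layer_24.shape, hidden_layer_24.shape)
```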

 

[–] 1azytux@alien.top 1 points 1 year ago

You can look at TheBloke's Hugging Face page, where he explains the differences too.

[–] 1azytux@alien.top 1 points 1 year ago (1 children)

This sub is for discussion of important stuff happening in ML, not for how candidates can apply for jobs, smh.