LocalLLaMA

A community for discussing Llama, the family of large language models created by Meta AI.

need advice for reducing inference time (alien.top)

submitted 2 years ago by Mundane_Definition_8@alien.top to c/localllama@poweruser.forum

Code

I'm using mistral-7b to understand how LLM inference works.

Does anyone have ideas for reducing the inference time?

Please don't recommend reducing the number of generated tokens to 1. :)

Ok_Post_149@alien.top 1 point 2 years ago

I just wrote a tutorial on how you can scale Mistral-7b to many GPUs in the cloud. I hope this can give you some value. Not sure if you're looking to do on-demand inference or inference on a bunch of inputs.

https://www.reddit.com/r/LocalLLaMA/comments/17k2x62/i_scaled_mistral_7b_to_200_gpus_in_less_than_5/
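
For the "inference on a bunch of inputs" case, the general idea is to split the prompts so each GPU worker handles its own share. Here is a minimal sketch of that splitting step only. It is not taken from the linked tutorial, and `shard_prompts` and `num_workers` are hypothetical names chosen for illustration; loading the model and running generation on each shard is left out.

```python
from typing import List


def shard_prompts(prompts: List[str], num_workers: int) -> List[List[str]]:
    """Split prompts round-robin into num_workers shards.

    Hypothetical helper: each shard would be sent to one GPU worker,
    which runs the model on its own prompts independently.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    shards: List[List[str]] = [[] for _ in range(num_workers)]
    for i, prompt in enumerate(prompts):
        # Round-robin keeps shard sizes within one prompt of each other.
        shards[i % num_workers].append(prompt)
    return shards
```

This only helps total throughput across many inputs; the latency of a single on-demand request is unchanged by sharding.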
