this post was submitted on 24 Nov 2023
1 points (100.0% liked)

LocalLLaMA

1 readers
1 users here now

Community to discuss about Llama, the family of large language models created by Meta AI.

founded 10 months ago
MODERATORS
 

Hey guys,

I'm running the quantized version of mistral-7B-instruct and its pretty fast and accurate for my use case. On my PC I'm generating approximately 4 tokens per second with the idea of generating one-sentence responses for my NPC characters, which is good enough for what I need.

After fiddling around with oobabooga a bit I found out that you can perform API calls on localhost and print out the text, which is exactly what I need for this to work.

The issue I'm running into here is that if I were to make a game with AI-generated content, how can I make it easy for players to run their own localhost and perform api calls in the game this way? I feel like for the unexperienced, setting all this up would be a nightmare for them and I don't want to alienate non-tech savvy players.

you are viewing a single comment's thread
view the rest of the comments
[–] DarthNebo@alien.top 1 points 10 months ago

The fastest way would be to ingest the ggerganov server.cpp module & make HTTP calls to it. Way easier to package into other apps & supports parallel decoding with 30tok/s on Apple Silicon(M1 Pro)