this post was submitted on 12 Nov 2023

LocalLLaMA


Community to discuss Llama, the family of large language models created by Meta AI.


Use case is that I want to create a service based on Mistral 7b that will serve an internal office of 8-10 users.

I’ve been looking at modal.com, and runpod. Are there any other recommendations?

[–] sshh12@alien.top 1 points 2 years ago

Huge fan of Modal; I've been using them for a couple of serverless LLM and diffusion models. It can definitely be on the costly side, but I like that the cost scales directly with requests and setup is trivial.

recent project with modal: https://github.com/sshh12/llm-chat-web-ui/tree/main/modal

[–] Ok-Goal@alien.top 1 points 2 years ago

In our internal lab office, we're using https://ollama.ai/ with https://github.com/ollama-webui/ollama-webui to locally host LLMs. The Docker Compose setup provided by the ollama-webui team worked like a charm for us.
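For anyone wiring an internal tool against a setup like this: a minimal sketch of calling Ollama's local REST API (`/api/generate` on its default port 11434) with only the standard library. The endpoint and JSON fields are from Ollama's documented API; the model name `mistral` assumes you've already run `ollama pull mistral`.

```python
import json
import urllib.request

# Ollama's default local endpoint.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_payload(model: str, prompt: str) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint.

    stream=False asks for a single JSON response instead of a token stream.
    """
    return {"model": model, "prompt": prompt, "stream": False}


def generate(model: str, prompt: str) -> str:
    """Send a generation request to a locally running Ollama daemon."""
    body = json.dumps(build_generate_payload(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The non-streaming response carries the full text in "response".
        return json.loads(resp.read())["response"]


# Example (with the daemon running and the model pulled):
#   reply = generate("mistral", "Summarize our onboarding doc in one sentence.")
```

Since Ollama binds to localhost by default, exposing it to an office LAN means putting it behind a reverse proxy or changing its bind address.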

[–] clxyder@alien.top 1 points 2 years ago (1 children)

Do you have hardware to serve the API or do you want to run this from the cloud?

[–] decruz007@alien.top 1 points 2 years ago

Looking at cloud as an option. Don’t really have hardware now.

[–] dazld@alien.top 1 points 2 years ago

Did you think about running it on a local M1 Mac mini? Ollama uses the Mac GPU out of the box.

[–] m0dE@alien.top 1 points 2 years ago

fullmetal.ai

[–] apepkuss@alien.top 1 points 2 years ago (1 children)

WebAssembly-based open source LLM inference (API service and local hosting): https://github.com/second-state/llama-utils

[–] RustyLanguage@alien.top 1 points 2 years ago

Hmm, cool. Seems the inference app itself is only a few MBs.

[–] ImNewHereBoys@alien.top 1 points 2 years ago (1 children)

Just curious. What are you using it for?

[–] decruz007@alien.top 1 points 2 years ago

Knowledge base, general GPT use, interaction with our CMS to add or update data.

[–] carlosglz11@alien.top 1 points 2 years ago

Let us know what you end up going with, OP! I'm interested in something like this as well…

[–] DreamGenX@alien.top 1 points 2 years ago

I can recommend vLLM. It also offers an OpenAI-compatible API server, if you want that.
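Because vLLM's server speaks the OpenAI chat-completions wire format, office clients only need to build standard request bodies and point them at the internal host. A stdlib-only sketch, assuming a vLLM server on its default port 8000; the model name shown is an example and should match whatever model the server was launched with.

```python
import json
import urllib.request

# Assumes a vLLM OpenAI-compatible server is listening here.
BASE_URL = "http://localhost:8000/v1"


def build_chat_request(model: str, user_message: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat completion body, which vLLM accepts as-is."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
    }


def chat(model: str, user_message: str) -> str:
    """POST a chat completion to the vLLM server and return the reply text."""
    body = json.dumps(build_chat_request(model, user_message)).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]


# Example (with the server running):
#   reply = chat("mistralai/Mistral-7B-Instruct-v0.2", "Draft a CMS update note.")
```

The upside of the OpenAI-compatible format is that existing clients (including the official `openai` library, via its base-URL setting) work unchanged against the internal server.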

[–] openLLM4All@alien.top 1 points 2 years ago

I noticed TheBloke was using Massed Compute to quantize models. I've been poking around and using their hardware a bit more.