this post was submitted on 21 Nov 2023

LocalLLaMA


Community to discuss Llama, the family of large language models created by Meta AI.


Looking for speed and accuracy. Any suggestions on cloud hosts?

top 18 comments
[–] yahma@alien.top 1 points 10 months ago (1 children)

None of the open models perform function calling as well as OpenAI...

[–] WAHNFRIEDEN@alien.top 1 points 10 months ago (2 children)

Must combine with grammars

[–] randull@alien.top 1 points 10 months ago
[–] GreatBritishHedgehog@alien.top 1 points 10 months ago

Is there a service like Openrouter that allows you to use grammars?

[–] CircumventThisReddit@alien.top 1 points 10 months ago (1 children)

Write your own parser and integrate function calling with any LLM your heart desires.

[–] _nembery@alien.top 1 points 10 months ago

It’s not even that hard. Just run a regex on the returned text for simple classification tasks. Any Llama 2 model can do this reasonably well. The hard part is when you want complex JSON data structures.

[–] giesse@alien.top 1 points 10 months ago (2 children)

I'm confused by all the people worrying about OpenAI's API... can't they just use the Azure endpoints? If anything, MS would be very happy to capture all of OpenAI's previous customers...

[–] fvpv@alien.top 1 points 10 months ago

I've now signed up for an Azure endpoint - let's see if it gets approved. It looks like the process to get a key is going to be a bit of a PITA.

[–] jfranzen8705@alien.top 1 points 10 months ago (1 children)

Yeah, they're pretty heavily restricting access to it and prioritizing large-ish enterprise customers.

[–] giesse@alien.top 1 points 10 months ago

I see, OTOH, if OpenAI really went belly up, I imagine they'd rush to increase their own capacity? If anyone wins in all this drama it's Microsoft...

[–] Fast-Satisfaction482@alien.top 1 points 10 months ago (1 children)

From an idealistic point of view, you can implement function calling easily within your team. Use the context-free grammar plugins that are now available to ensure that the LLM's outputs match your function-calling format. Then build your own dataset from your typical workloads and prepare a pipeline to fine-tune new models on it.
As open-source models continually improve, you can use that pipeline to fine-tune for your task for a few bucks on a few cloud GPUs. You should be prepared to switch from model to model and handle your fine-tuning within your team. That way you will be able to keep up with the cutting edge (of open source) and still have full control. You can always decide that a model is good enough and keep using it forever.

From a serious business point of view: you are in serious trouble, because you relied on a single, very hard to replace core service for your whole startup. Don't make that mistake again. First and foremost, make sure that your backend becomes flexible enough to switch LLM service providers on short notice. Then you will probably want to integrate support for Microsoft Azure's version of GPT-3.5. MS appears to have access to all models up to at least GPT-4, and moreover appears to have a commercial license for them. So basically MS provides you with a perfect drop-in solution.
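The "switch providers on short notice" advice boils down to routing every completion through one interface. A minimal sketch (the provider names and return strings are placeholders, not real API calls):

```python
from typing import Callable

# A provider is just "prompt in, completion out"; each function body would
# wrap the real client library for that service.
Provider = Callable[[str], str]

def openai_provider(prompt: str) -> str:
    # would call the OpenAI API here
    return f"[openai] {prompt}"

def azure_provider(prompt: str) -> str:
    # would call the Azure OpenAI endpoint here
    return f"[azure] {prompt}"

PROVIDERS: dict[str, Provider] = {
    "openai": openai_provider,
    "azure": azure_provider,
}

def complete(prompt: str, provider: str = "azure") -> str:
    """Single entry point for the whole backend; swapping providers is a
    one-line config change rather than a rewrite."""
    return PROVIDERS[provider](prompt)

print(complete("Hello", provider="openai"))
```

A self-hosted open-source model then becomes just one more entry in the table.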

You might still want to pursue the open-source route, because it gives you full control over your core service. Depending on the size of your startup, you probably should implement at least two separate fallbacks against the threat of OpenAI shutting down.

Then again, it's entirely possible that OpenAI services will keep operating. The situation is still completely fluid. But I guess MS is your best bet, particularly if the whole team actually migrates to MS.

[–] fvpv@alien.top 1 points 10 months ago

Thank you for this - yes you're right, this is a hard lesson and luckily the stakes are fairly low for me. Had my startup been bigger though there would be pain and panic.

Thank you for pointing me toward the azure 3.5 - I will definitely check that out and that is the kind of solution I am looking for.

[–] ZestyData@alien.top 1 points 10 months ago (2 children)

I don't understand: how do you run a company that doesn't provide any value itself, just surfaces OpenAI's existing products, which they'll inevitably sell direct to consumers anyway?

Particularly if you have to even ask about the one fundamental thing you're supposedly building a company around - using LLMs.

[–] fvpv@alien.top 1 points 10 months ago

I just typed a super long reply and then my browser ate it... damn. I'll summarize what I said:

  1. Provide value by building products that solve customer problems.

  2. The majority of people aren't prompt engineers or coders, and many can't even simply visualize things or know where to start on complex projects.

  3. Use your knowledge to create subject-specific products that cater to workflows and formats that need to be precise, and include insider knowledge that would take many, many prompts to come close to achieving a good outcome.

[–] Slimxshadyx@alien.top 1 points 10 months ago

How do you know the startup isn’t providing value? Isn’t the whole point of building AI to integrate it with other software?

AI can be much more powerful than a chatbot.

[–] kpodkanowicz@alien.top 1 points 10 months ago

Guiding output was already mentioned, but maybe I will describe how this can be done even with a very weak model.

You use a text-completion endpoint where you construct the prompts yourself:

You specify the context and make it stand out as a separate block.
Then, in the prompt, you ask the model to fill in one specific detail (just one value of the JSON).
In the completion part (i.e. after "assistant") you pre-write the output in JSON format up to the first value.
You stop streaming after the " sign.
Change the prompt to ask for the next value, add it as the next attribute of the JSON you are generating, then start generation again and stop at ".

Very, very fast: you barely generate any tokens, it's mostly prompt evaluation.

Test manually; once you have good results, ask GPT-4 to write you a Python wrapper to do it.
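The attribute-at-a-time trick above can be sketched roughly like this (generate() is a stub standing in for a real text-completion endpoint called with a stop sequence; the canned values and prompt wording are made up for illustration):

```python
def generate(prompt: str, stop: str) -> str:
    """Stub for a text-completion endpoint. A real implementation would POST
    the prompt with stop='\"' and return the new tokens before the stop."""
    canned = {"name": "Ada", "city": "London"}
    for key, value in canned.items():
        if prompt.endswith(f'"{key}": "'):
            return value
    return "unknown"

def extract_json(context: str, keys: list[str]) -> dict:
    """Build the JSON one attribute at a time: pre-write the structure
    ourselves, let the model emit only the value, stop at the closing quote."""
    result = {}
    partial = "{"
    for key in keys:
        partial += f'"{key}": "'  # pre-write up to the next value
        prompt = (
            f"Context:\n{context}\n\n"
            f"Fill in the next value of this JSON:\n{partial}"
        )
        value = generate(prompt, stop='"')
        result[key] = value
        partial += value + '", '  # append the value and move on
    return result

print(extract_json("Ada lives in London.", ["name", "city"]))
```

Since the model only ever generates a handful of value tokens per call, almost all the time goes into prompt evaluation, which is why this is fast even on weak models.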

[–] Crafty-Run-6559@alien.top 1 points 10 months ago

How many users do you have? If you've been keeping your inputs/outputs to GPT-4, then you can probably use them to tune your own model that will perform similarly.

The biggest issue you're going to have is probably hardware.

LLMs are not cheap to run, and if you start needing multiple of them to replace OpenAI, your bill is going to be pretty significant just to keep the models online.

It's also going to be tough to maintain all the infra you'll need without a full time devops/mlops person.

[–] FreezeproofViola@alien.top 1 points 10 months ago

You're not going to get a lower price than the turbo API anywhere, sadly.

(Unless you're dealing with really sensitive data, just use OAI; their machine costs are marked down like crazy by sheer scale.)