this post was submitted on 21 Nov 2023

LocalLLaMA


Community to discuss Llama, the family of large language models created by Meta AI.


Looking for speed and accuracy. Any suggestions on cloud hosts?

top 18 comments
[–] yahma@alien.top 1 points 10 months ago (1 children)

None of the open models perform function calling as well as OpenAI...

[–] WAHNFRIEDEN@alien.top 1 points 10 months ago (2 children)

Must combine with grammars

[–] randull@alien.top 1 points 10 months ago
[–] GreatBritishHedgehog@alien.top 1 points 10 months ago

Is there a service like Openrouter that allows you to use grammars?

[–] CircumventThisReddit@alien.top 1 points 10 months ago (1 children)

Write your own parser and integrate function calling with any LLM your heart desires.

[–] _nembery@alien.top 1 points 10 months ago

It’s not even that hard. Just run a regex on the returned text for simple classification tasks. Any Llama 2 model can do this reasonably well. The hard part is when you want complex JSON data structures.

[–] giesse@alien.top 1 points 10 months ago (2 children)

I'm confused by all the people worrying about OpenAI's API... can't they just use the Azure endpoints? If anything, MS would be very happy to capture all of OpenAI's previous customers...

[–] fvpv@alien.top 1 points 10 months ago

I've now signed up for an Azure endpoint - let's see if it gets approved. It looks like the process to get a key is going to be a bit of a PITA.

[–] jfranzen8705@alien.top 1 points 10 months ago (1 children)

Yeah, they're pretty heavily restricting access to it and prioritizing large-ish enterprise customers.

[–] giesse@alien.top 1 points 10 months ago

I see, OTOH, if OpenAI really went belly up, I imagine they'd rush to increase their own capacity? If anyone wins in all this drama it's Microsoft...

[–] Fast-Satisfaction482@alien.top 1 points 10 months ago (1 children)

From an idealistic point of view, you can implement function calling easily within your team. Use the context-free grammar plugins that are now available to ensure that the LLM's outputs match your function-calling format. Then build your own dataset from your typical workloads and prepare a pipeline to fine-tune new models on it.
As open-source models continually improve, you can use that pipeline to fine-tune for your task for a few bucks on a few cloud GPUs. You should be prepared to switch from model to model and handle your fine-tuning within your team. That way you will be able to keep up with the cutting edge (of open source) and still have full control. You can always decide that a model is good enough and keep using it forever.

From a serious business point of view: you are in serious trouble, because you relied on a single, very hard to replace core service for your whole startup. Don't make that mistake again. First and foremost, make sure that your backend becomes flexible enough to switch LLM service providers on short notice. Then you will probably want to integrate support for Microsoft Azure's version of GPT-3.5. MS appears to have access to all models up to at least GPT-4, and moreover appears to have a commercial license for them. So basically MS provides you with a perfect drop-in solution.
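The "switch providers on short notice" advice boils down to routing every completion through one interface. A minimal sketch (the provider names and return strings are placeholders, not real API calls):

```python
from typing import Callable

# A provider is just "prompt in, completion out"; each function body would
# wrap the real client library for that service.
Provider = Callable[[str], str]

def openai_provider(prompt: str) -> str:
    # would call the OpenAI API here
    return f"[openai] {prompt}"

def azure_provider(prompt: str) -> str:
    # would call the Azure OpenAI endpoint here
    return f"[azure] {prompt}"

PROVIDERS: dict[str, Provider] = {
    "openai": openai_provider,
    "azure": azure_provider,
}

def complete(prompt: str, provider: str = "azure") -> str:
    """Single entry point for the whole backend; swapping providers is a
    one-line config change rather than a rewrite."""
    return PROVIDERS[provider](prompt)

print(complete("Hello", provider="openai"))
```

A self-hosted open-source model then becomes just one more entry in the table.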

You might still want to pursue the open-source route, because it gives you full control over your core service. Depending on the size of your startup, you probably should implement at least two separate fallbacks against the threat of OpenAI shutting down.

Then again, it's entirely possible that OpenAI services will keep operating. The situation is still completely fluid. But I guess MS is your best bet, particularly if the whole team actually migrates to MS.

[–] fvpv@alien.top 1 points 10 months ago

Thank you for this - yes you're right, this is a hard lesson and luckily the stakes are fairly low for me. Had my startup been bigger though there would be pain and panic.

Thank you for pointing me toward the azure 3.5 - I will definitely check that out and that is the kind of solution I am looking for.

[–] ZestyData@alien.top 1 points 10 months ago (2 children)

I don't understand: how do you run a company that doesn't provide any value itself, just surfaces OpenAI's existing products, which they'll inevitably sell direct to consumers anyway?

Particularly if you have to even ask about the one fundamental thing you're supposedly building a company around - using LLMs.

[–] fvpv@alien.top 1 points 10 months ago

I just typed a super long reply and then my browser ate it... damn. I'll summarize what I said:

  1. Provide value by building products that solve customer problems.

  2. The majority of people aren't prompt engineers or coders, and many can't even simply visualize things or know where to start on complex projects.

  3. Use your knowledge to create subject-specific products that cater to workflows and formats that need to be precise, and include insider knowledge that would take many, many prompts to come close to achieving a good outcome.

[–] Slimxshadyx@alien.top 1 points 10 months ago

How do you know the startup isn’t providing value? Isn’t the whole point of building AI to integrate it with other software?

AI can be much more powerful than a chatbot.

[–] kpodkanowicz@alien.top 1 points 10 months ago

Guiding output was already mentioned, but maybe I will describe how this can be done even with a very weak model.

You use a text-completion endpoint where you construct the prompts yourself:

You specify the context and make it stand out as a separate block.
Then, in the prompt, you ask the model to fill in one specific detail (just one value of the JSON).
In the completion part (i.e. after "assistant") you pre-write the output in JSON format up to the first value.
You stop streaming after the " sign.
Change the prompt to ask for the next value, add it as the next attribute of the JSON you are generating, then start generation again and stop at ".

Very, very fast: you barely generate any tokens, it's mostly prompt evaluation.

Test manually; once you have good results, ask GPT-4 to write you a Python wrapper to do it.
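The attribute-at-a-time trick above can be sketched roughly like this (generate() is a stub standing in for a real text-completion endpoint called with a stop sequence; the canned values and prompt wording are made up for illustration):

```python
def generate(prompt: str, stop: str) -> str:
    """Stub for a text-completion endpoint. A real implementation would POST
    the prompt with stop='\"' and return the new tokens before the stop."""
    canned = {"name": "Ada", "city": "London"}
    for key, value in canned.items():
        if prompt.endswith(f'"{key}": "'):
            return value
    return "unknown"

def extract_json(context: str, keys: list[str]) -> dict:
    """Build the JSON one attribute at a time: pre-write the structure
    ourselves, let the model emit only the value, stop at the closing quote."""
    result = {}
    partial = "{"
    for key in keys:
        partial += f'"{key}": "'  # pre-write up to the next value
        prompt = (
            f"Context:\n{context}\n\n"
            f"Fill in the next value of this JSON:\n{partial}"
        )
        value = generate(prompt, stop='"')
        result[key] = value
        partial += value + '", '  # append the value and move on
    return result

print(extract_json("Ada lives in London.", ["name", "city"]))
```

Since the model only ever generates a handful of value tokens per call, almost all the time goes into prompt evaluation, which is why this is fast even on weak models.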

[–] Crafty-Run-6559@alien.top 1 points 10 months ago

How many users do you have? If you've been keeping your inputs/outputs to GPT-4, then you can probably use them to tune your own model that will perform similarly.

The biggest issue you're going to have is probably hardware.

LLMs are not cheap to run, and if you start needing multiple of them to replace OpenAI, your bill is going to be pretty significant just to keep the models online.

It's also going to be tough to maintain all the infra you'll need without a full time devops/mlops person.

[–] FreezeproofViola@alien.top 1 points 10 months ago

You're not going to get a lower price than the turbo API anywhere, sadly.

(Unless you're dealing with really sensitive data, just use OAI; their machine costs are marked down like crazy by sheer scale.)