herozorro

joined 1 year ago
 

Using GPT4All, I only get 13 tokens per second. Any way to speed this up? Perhaps a custom config of llama.cpp, or some other LLM backend.

The model is Mistral-Orca.

Does the type of model affect tokens per second?

What is your setup for quants and model type?

How do I get the fastest tokens per second on an M1 with 16 GB?
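Something like this is what I mean by a custom llama.cpp config: a minimal sketch, assuming the llama-cpp-python bindings built with Metal support and a quantized GGUF of the model (the filename is just a placeholder).

```python
# Minimal sketch: llama-cpp-python with Metal offload on an M1.
# The GGUF filename below is a placeholder, not a specific release.
from llama_cpp import Llama

llm = Llama(
    model_path="mistral-7b-openorca.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,   # offload all layers to the Metal GPU
    n_ctx=4096,        # context window
    n_threads=8,       # roughly the M1 core count
)

out = llm("Explain quantization in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```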

[–] herozorro@alien.top 1 points 11 months ago

Will this speed up the Ollama project?

 

I know with GPT you can get an API key and buy tokens. I would like to create a SaaS for an AI product/service. The end user would use my UI, which would run a workflow that hits the AI backend and returns a result, which is then presented to the user.

Great. I can go ahead and code it locally using the GPT-4 API, or I can code it against a local model.

Now how would I go about hosting that so I can sell it as a SaaS to others?

Specifically, I am interested in the economics. How would I calculate how much a user should pay so that I cover my costs plus some profit? I am looking for the formula but unclear on its variables. Is it GPU time used at RunPod, for example?

If someone has done something like this, please explain your thinking so I can do the 'back of napkin' calculations.
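Here is my rough guess at the formula, as a minimal sketch. Every number is a made-up assumption, not a real RunPod price.

```python
# Back-of-napkin SaaS pricing sketch. All figures are assumptions.
gpu_hourly_rate = 0.79        # $/hour for the rented GPU pod (assumed)
tokens_per_second = 30        # throughput of the served model (assumed)
tokens_per_request = 800      # prompt + completion per user request (assumed)
overhead_factor = 1.3         # idle time, storage, retries, etc. (assumed)
profit_margin = 0.5           # 50% markup

seconds_per_request = tokens_per_request / tokens_per_second
gpu_cost_per_request = (seconds_per_request / 3600) * gpu_hourly_rate * overhead_factor
price_per_request = gpu_cost_per_request * (1 + profit_margin)

print(f"cost/request  ~= ${gpu_cost_per_request:.4f}")
print(f"price/request ~= ${price_per_request:.4f}")
```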

[–] herozorro@alien.top 1 points 11 months ago

Could you provide some directions on how to fine-tune the model for coding? I have a UI framework in Python, and I would like to feed it the docs and some code from GitHub repos.

What would the dataset look like for that? Should I be formulating different use cases for the framework as if a user were asking?

In addition, do I need to provide standard Python code, or do those base models already have code in them?
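For what it's worth, here is my guess at what one record of such a dataset might look like, written as instruction-style JSON lines. The framework name "myui" and its API are hypothetical.

```python
# Hypothetical example of one instruction-tuning record for a framework-specific
# coding dataset, written out as JSON lines. "myui" and its API are made up.
import json

record = {
    "instruction": "Using the myui framework, create a label and set its text size to 30.",
    "input": "",
    "output": "label = myui.Label(text='hello')\nlabel.size = '30'",
}

with open("framework_dataset.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")
```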

[–] herozorro@alien.top 1 points 11 months ago

Most of what you wrote can be done with Python out of the box.

[–] herozorro@alien.top 1 points 11 months ago (1 children)

> Remember when you finish for the day that if you don't delete the pod (and any storage you created), your credit balance will reduce while you are sleeping. But at least it can't go negative and send you a big bill like evil AWS.

Do they charge per hour like a parking meter, or only when the pod is used?

[–] herozorro@alien.top 1 points 11 months ago

What you are looking for is OCR. Then feed the markdown to the LLM.
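A minimal sketch of that pipeline, assuming pytesseract for the OCR step; the prompt wording is just an example and the LLM call is left abstract.

```python
# Sketch of the OCR -> LLM pipeline: extract text from an image, then hand it
# to the model as context. Assumes pytesseract and Pillow are installed.
from PIL import Image
import pytesseract

text = pytesseract.image_to_string(Image.open("scanned_page.png"))

prompt = (
    "The following text was extracted from a scanned document:\n\n"
    f"{text}\n\n"
    "Convert it to clean markdown and summarize it."
)
# send `prompt` to whatever LLM backend you are using
```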

[–] herozorro@alien.top 1 points 11 months ago

I think it's a great accomplishment and should be commended. Congrats, OP, if it's your project.

 

This prompt usually has a GPT reveal its initial prompt.

[–] herozorro@alien.top 1 points 11 months ago

Because the majority suck very badly compared to ChatGPT.

[–] herozorro@alien.top 1 points 11 months ago (2 children)

How much does it cost to do these fine-tunes on RunPod? How much compute time is used?

Like $1000+?

[–] herozorro@alien.top 1 points 11 months ago

> Given their recently published paper, they probably figured out a way to get GPT to learn its own reward function somehow.

You just need two GPTs talking with each other. The second acts as a critic and guides the first.
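A minimal sketch of that actor/critic loop, assuming the OpenAI Python client; the model name and prompts are placeholders, not anything specific.

```python
# Two models in a loop: the first drafts an answer, the second critiques it,
# and the first revises. Model name and prompts are placeholders.
from openai import OpenAI

client = OpenAI()

def chat(system: str, user: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    )
    return resp.choices[0].message.content

task = "Write a function that reverses a linked list."
draft = chat("You are a helpful coder.", task)
critique = chat("You are a strict code reviewer. Point out flaws.", draft)
revised = chat(
    "You are a helpful coder.",
    f"{task}\n\nDraft:\n{draft}\n\nReviewer feedback:\n{critique}\n\nRevise the draft.",
)
print(revised)
```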

[–] herozorro@alien.top 1 points 11 months ago

I don't understand: where is this supposed to run? At a cloud provider? So this script is installed there, and it handles the distribution?

I read the docs on the site, and I must say these questions were not answered. Perhaps add a 'What is Burla?' section.

[–] herozorro@alien.top 1 points 11 months ago

Why? Are you a mega fanboy?

[–] herozorro@alien.top 1 points 11 months ago

The irony now is that Grok will have the latest info on this, as people are tweeting about it.

 

Is there a way to get a zero-knowledge model that only knows how to chat, and from there fine-tune it with specialized knowledge? And do this on consumer hardware (Mac M1, 16 GB) or free Colab hardware?

I want to do this to prevent the model from hallucinating outside of the domain knowledge it is fed... like passing in a textbook so it only knows how to answer questions from it.
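What I have in mind is roughly this kind of restrictive prompting around whatever text gets fed in; a minimal sketch, where the passage, question, and refusal wording are all placeholders and `ask_llm` stands in for any backend.

```python
# Sketch of constraining answers to supplied text only. The passage, question,
# and refusal wording are placeholders; ask_llm() is a hypothetical helper.
passage = "...excerpt from the textbook chapter..."
question = "What does the chapter say about photosynthesis?"

prompt = (
    "Answer ONLY using the passage below. If the answer is not in the passage, "
    "reply exactly: 'Not covered in the provided text.'\n\n"
    f"Passage:\n{passage}\n\nQuestion: {question}"
)
# answer = ask_llm(prompt)   # call into whatever model is running locally
```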

 

I'm running an M1 with 16 GB. I'd like to get the speed and understanding that Claude provides. I can throw it some code and documentation, and it writes back very good advice.

What kind of models and extra hardware do I need to replicate the experience locally? I am using Mistral 7B right now.

 

I'd like to take a Python framework project and have a specialized coder. I'd like to feed it the documentation and the GitHub code where examples are shown. Then I'd like to have the chat LLM ingest it and only code in that framework's API.

My approach so far has been to shove some of its documentation into the prompt and tell it 'this is the documentation for xyz framework; only answer questions based on information and code found here'.

While this works somewhat, it starts to hallucinate, adding code from other frameworks and even other languages. For example, the UI framework may specify changing the text size of a label with label.size = '30', and the LLM will respond with label.font_size = '30'.

How would I go about correcting this? Perhaps with a kind of framework schema that the LLM checks its answers against? So the schema would say you can only use the property size with a label, and the LLM would correct its code on a second pass. If so, how would I format that schema?

I am open to completely rewriting the documentation so it's in a format that the LLM can properly ingest and understand.

Lastly, I obviously run out of context size, so I have tried this with a vector DB, but it runs into the same problems. So I think I want to know how to feed it the right information and prompt it better so it stays 100% within the framework's API.
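For the schema idea, here is a minimal sketch of what I am imagining: a tiny table of allowed attributes per widget, plus a second pass that flags anything outside it. The schema contents and the generated snippet are made-up examples, and the regex matching is a simplification of what a real parser would do.

```python
# Sketch: a tiny "framework schema" of allowed attributes per widget, plus a
# second pass that flags anything outside it. The schema and the generated
# snippet are made-up examples; real code would need a proper parser.
import re

SCHEMA = {
    "Label": {"size", "text", "color"},
    "Button": {"text", "on_press"},
}

def find_violations(code: str) -> list[str]:
    violations = []
    # naive patterns: "<var> = Label(...)" then "<var>.<attr> = ..."
    var_types = dict(re.findall(r"(\w+)\s*=\s*(\w+)\(", code))
    for var, attr in re.findall(r"(\w+)\.(\w+)\s*=", code):
        widget = var_types.get(var)
        if widget in SCHEMA and attr not in SCHEMA[widget]:
            violations.append(f"{widget} has no attribute '{attr}'")
    return violations

generated = "label = Label(text='hi')\nlabel.font_size = '30'"
print(find_violations(generated))   # -> ["Label has no attribute 'font_size'"]
```

The list of violations could then be fed back into the prompt for the second pass, asking the model to rewrite the offending lines using only attributes from the schema.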
