LyPreto

joined 1 year ago
[–] LyPreto@alien.top 1 points 11 months ago (4 children)

I saw their 7B model closing in on GPT-4 scores in some benchmarks, which is absolutely wild but also sus

[–] LyPreto@alien.top 1 points 11 months ago

I ended up just scrutinizing the server code to understand it better and found that the prompt needs to follow a very specific format or else it won't work well:

prompt: `A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.\nUSER:[img-12]${message}\nASSISTANT:`
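
So the request body ends up looking roughly like this (a quick sketch; buildLlavaPrompt is just a helper name I'm making up here, and the sampling params are whatever I happened to have in my handler):

const buildLlavaPrompt = (userMessage) =>
  `A chat between a curious human and an artificial intelligence assistant. ` +
  `The assistant gives helpful, detailed, and polite answers to the human's questions.` +
  `\nUSER:[img-12]${userMessage}\nASSISTANT:`;

// the [img-12] tag in the prompt has to match the id in image_data,
// otherwise the server has nothing to splice the image embedding into
const payload = {
  prompt: buildLlavaPrompt(message),
  image_data: [{ data: base64data, id: 12 }],
  n_predict: 256,
  top_p: 0.5,
  temp: 0.2
};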

[–] LyPreto@alien.top 1 points 11 months ago

you have all the APIs, what's stopping you from putting something like this together? Personally, the only challenge for me is finding projects compatible with M1 that offer Metal offloading, but on Linux it should be relatively straightforward to implement

 

I spun up a simple project (home surveillance system) to play around with ShareGPT4V-7B and made quite a bit of progress over the last few days. However, I'm having a really hard time figuring out how to send a simple prompt along with the image-to-text request. Here is the relevant code:

document.getElementById('send-chat').addEventListener('click', async () => {
  const message = document.getElementById('chat-input').value;
  appendUserMessage(message);
  document.getElementById('chat-input').value = '';

  // the current frame is set as a CSS background-image: url("...")
  const imageElement = document.getElementById('frame-display');
  const imageUrl = imageElement.style.backgroundImage.slice(5, -2);

  try {
    const imageBlob = await fetch(imageUrl).then(res => res.blob());
    const reader = new FileReader();

    reader.onloadend = async () => {
      // strip the "data:image/...;base64," prefix
      const base64data = reader.result.split(',')[1];

      const imageData = {
        data: base64data,
        id: 1
      };

      const payload = {
        prompt: message,
        image_data: [imageData],
        n_predict: 256,
        top_p: 0.5,
        temp: 0.2
      };

      // llama.cpp server's completion endpoint
      const response = await fetch("http://localhost:8080/completion", {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(payload)
      });

      const data = await response.json();
      console.log(data);
      appendAiResponse(data.content);
    };

    reader.readAsDataURL(imageBlob);
  } catch (error) {
    console.error('Error encoding image or sending request:', error);
  }
});

The only thing that works is sending an empty space or sometimes a question mark, and I'll get a general interpretation of the image, but what I really want is to be able to instruct the model so it knows what to look for. Is that something that's currently possible? Basically system prompting the vision model.

[–] LyPreto@alien.top 1 points 11 months ago (1 children)

not prefer it, but recognize its user base. Metal plus the unified memory have a lot to offer and the compute is there; there's just really no adoption other than a few select projects like llama.cpp and some of the other text-inference engines.

[–] LyPreto@alien.top 1 points 11 months ago (5 children)

I really wish MPS were more widely adopted by now… I hate seeing just CUDA or CPU in all these new libraries

[–] LyPreto@alien.top 1 points 11 months ago (1 children)

Tried Coqui and had issues with performance; from what I read online it doesn't seem to fully support MPS.

For now I'm using edge-tts, which is doing the trick and is pretty decent/free (rough sketch of how I call it below).

Is XTTS supported on Macs?
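
This is roughly how I'm calling it, just shelling out to the edge-tts CLI from Node; the voice name, output path, and player are placeholders from my setup, so adjust as needed:

// rough sketch: generate speech with the edge-tts CLI, then play it
const { execFile } = require('child_process');

function speak(text, outFile = '/tmp/tts.mp3') {
  execFile('edge-tts', [
    '--voice', 'en-US-GuyNeural',
    '--text', text,
    '--write-media', outFile
  ], (err) => {
    if (err) return console.error('edge-tts failed:', err);
    execFile('afplay', [outFile]); // macOS audio player
  });
}

speak('All set, watching the camera feed now.');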

[–] LyPreto@alien.top 1 points 11 months ago (1 children)

Update: a quick Reddit search (which I should've done prior to posting, tbh) led me to this post: ai_voicechat_script

 
  • So, I've been doing all my LLM tinkering on an M1, using llama.cpp/whisper.cpp to run a basic voice-powered assistant, nothing new at this point (rough sketch of that loop after this list).
  • Currently adding a visual component to it: ShareGPT4V-7B, assuming I manage to convert it to GGUF. Once that's done I should be able to integrate it with llama.cpp and wire it to a live camera feed, giving it eyes.
  • Might even get crazy and throw in a low-level component to handle basic object detection, letting the model know when something is being "shown" to it; other than that it will activate when prompted to do so (text or voice).
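
The core loop mentioned above looks roughly like this; a sketch only, assuming a recorded WAV on disk, a whisper.cpp build at ./whisper.cpp/main, and the llama.cpp server on port 8080 (all paths, model files, and flags are just from my setup):

// transcribe a recorded clip with whisper.cpp, then send the text to llama.cpp
const { execFileSync } = require('child_process');

async function handleVoiceTurn(wavPath) {
  // whisper.cpp prints the transcription to stdout; -nt drops timestamps
  const transcript = execFileSync('./whisper.cpp/main', [
    '-m', './models/ggml-base.en.bin',
    '-f', wavPath,
    '-nt'
  ]).toString().trim();

  const res = await fetch('http://localhost:8080/completion', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt: transcript, n_predict: 256 })
  });
  const data = await res.json();
  return data.content; // hand this off to TTS
}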

The one thing I'm not sure about is how to run a TTS engine like StyleTTS2-LJSpeech locally. Are there libraries that support TTS models?

[–] LyPreto@alien.top 1 points 11 months ago

The licensing on this blows, but they have a very unique model IMO: StyleTTS

It picks up the appropriate voice/intonation according to the text, which I personally haven't seen being done yet!

[–] LyPreto@alien.top 1 points 11 months ago

claude is dogshit for code generation in my experience

[–] LyPreto@alien.top 1 points 11 months ago (1 children)

damn llama.cpp has a monopoly indirectly 😂

 

Have been thinking about this for a while; does anyone know how feasible this is? Basically just applying some sort of "LoRA" on top of models to give them vision capabilities, making them multimodal.
