AsliReddington

joined 1 year ago
[–] AsliReddington@alien.top 1 points 11 months ago

By that logic, every LLM out there will engage in talk about Xi

[–] AsliReddington@alien.top 1 points 1 year ago

Yeah man, just use LangChain + a Pydantic class, or the Guidance lib by MS, with Mistral Instruct or Zephyr & you're golden
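A minimal sketch of the schema-constrained-output idea: in practice the schema would be a Pydantic `BaseModel` fed to LangChain's output parser (or a Guidance template), but here a hand-rolled stdlib validator stands in for it, and the schema/field names are purely illustrative.

```python
import json

# Illustrative schema: field name -> expected Python type.
# In the real setup this role is played by a pydantic BaseModel.
PERSON_SCHEMA = {"name": str, "age": int}

def parse_llm_json(reply: str, schema: dict) -> dict:
    """Validate that an LLM's reply is JSON matching the schema."""
    data = json.loads(reply)
    for field, typ in schema.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], typ):
            raise TypeError(f"{field} should be {typ.__name__}")
    return data

# Pretend this string came back from Mistral Instruct / Zephyr:
reply = '{"name": "Ada", "age": 36}'
print(parse_llm_json(reply, PERSON_SCHEMA))
```

If validation fails, the orchestrator can re-prompt the model with the error message until the reply conforms.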

[–] AsliReddington@alien.top 1 points 1 year ago

All you need is a 32K-context LLM. Everything beyond that needs a tool invocation that pulls in the archived text. You'll have to make your orchestrator smart enough to know there's content beyond the window that needs to be pulled in
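A sketch of that orchestrator decision, assuming a crude word-based token count and a hypothetical `fetch_archive` tool that retrieves older text on demand:

```python
def rough_token_count(text: str) -> int:
    # crude heuristic: ~1 token per whitespace-separated word
    return len(text.split())

def build_context(query: str, history: str, fetch_archive,
                  window: int = 32_000) -> str:
    """Keep recent history inside the 32K window; anything older is
    retrieved via a tool call instead of being stuffed into the prompt."""
    if rough_token_count(history) <= window:
        return history + "\n" + query
    # orchestrator notices content beyond the window and invokes the tool
    snippets = fetch_archive(query)
    recent = " ".join(history.split()[-window:])
    return recent + "\n" + "\n".join(snippets) + "\n" + query
```

`fetch_archive` would typically be a vector-store or keyword search over the archived conversation; the point is that the LLM never sees more than 32K tokens at once.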

[–] AsliReddington@alien.top 1 points 1 year ago

Just run it on TGI or vLLM to get FlashAttention & continuous batching for parallel requests
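The continuous-batching idea those servers implement can be shown with a toy scheduler (this is a conceptual sketch only, not vLLM/TGI code): each step decodes one token for every active request, and finished requests free their slot immediately so waiting requests join mid-flight instead of waiting for the whole batch to drain.

```python
from collections import deque

def continuous_batching(requests, max_batch: int = 2):
    """Toy continuous-batching scheduler.

    requests: list of (request_id, tokens_to_generate) pairs.
    Returns a per-step timeline of which requests were decoding.
    """
    waiting = deque(requests)
    active = {}                        # request_id -> tokens remaining
    timeline = []
    while waiting or active:
        # admit waiting requests into any free batch slots
        while waiting and len(active) < max_batch:
            rid, need = waiting.popleft()
            active[rid] = need
        timeline.append(sorted(active))
        for rid in list(active):       # decode one token per request
            active[rid] -= 1
            if active[rid] == 0:
                del active[rid]        # slot freed this very step
    return timeline

# "c" joins as soon as "a" finishes, while "b" is still generating:
print(continuous_batching([("a", 1), ("b", 3), ("c", 1)]))
```

With static batching, "c" would have had to wait for both "a" and "b" to finish; that gap is where the throughput win comes from.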

[–] AsliReddington@alien.top 1 points 1 year ago (1 children)

It's extremely overpriced. With INT4 quantization, llama.cpp does even crazier numbers. A system with 4090s can be built for $2500 in India & cheaper elsewhere for sure.
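To show why INT4 helps, here is a toy symmetric 4-bit quantizer: each weight is mapped to an integer in [-8, 7] with one per-tensor scale, cutting storage to roughly a quarter of FP16. This is a simplified stand-in for llama.cpp's block-wise Q4 formats, which use a scale per small block of weights rather than per tensor.

```python
def quantize_int4(weights):
    """Symmetric INT4 quantization: floats -> integers in [-8, 7]
    plus a single scale (toy version; llama.cpp scales per block)."""
    scale = max(abs(w) for w in weights) / 7 or 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int4(q, scale):
    return [v * scale for v in q]

weights = [0.1, -0.7, 0.35, 0.0]
q, s = quantize_int4(weights)
print(q, dequantize_int4(q, s))
```

The reconstruction error is bounded by about half the scale per weight, which is why 4-bit models stay usable while fitting in a fraction of the VRAM.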

[–] AsliReddington@alien.top 1 points 1 year ago (1 children)

I feel like verifiable math & physics simulation should be something every LLM just invokes as a tool instead of trying to do it internally, slowly
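A sketch of what such a math tool could look like: a whitelisted arithmetic evaluator built on the stdlib `ast` module, which the orchestrator calls whenever the model emits an expression, instead of letting the model grind out digits token by token. (The tool interface here is an assumption, not any particular framework's API.)

```python
import ast
import operator

# whitelisted operations for the safe arithmetic evaluator
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv,
       ast.Pow: operator.pow, ast.USub: operator.neg}

def safe_eval(expr: str) -> float:
    """Evaluate arithmetic exactly, the way an LLM tool call would."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))

print(safe_eval("3*(7+5)/2"))
```

Unlike `eval()`, this rejects anything that isn't plain arithmetic, so the model can't smuggle in arbitrary code; the same pattern extends to calling a CAS or physics engine for harder problems.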

[–] AsliReddington@alien.top 1 points 1 year ago (1 children)

vLLM, TGI, TensorRT-LLM

Fuyu-8b