AsliReddington

joined 1 year ago
[–] AsliReddington@alien.top 1 points 11 months ago

By that logic, every LLM out there will engage in talk about Xi

[–] AsliReddington@alien.top 1 points 1 year ago

Yeah man, just use LangChain + a Pydantic class, or the Guidance lib by MS, with Mistral Instruct or Zephyr & you're golden
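A minimal sketch of the schema-constrained-output idea: in practice the schema would be a Pydantic `BaseModel` fed to LangChain's output parser (or a Guidance template), but here a hand-rolled stdlib validator stands in for it, and the schema/field names are purely illustrative.

```python
import json

# Illustrative schema: field name -> expected Python type.
# In the real setup this role is played by a pydantic BaseModel.
PERSON_SCHEMA = {"name": str, "age": int}

def parse_llm_json(reply: str, schema: dict) -> dict:
    """Validate that an LLM's reply is JSON matching the schema."""
    data = json.loads(reply)
    for field, typ in schema.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], typ):
            raise TypeError(f"{field} should be {typ.__name__}")
    return data

# Pretend this string came back from Mistral Instruct / Zephyr:
reply = '{"name": "Ada", "age": 36}'
print(parse_llm_json(reply, PERSON_SCHEMA))
```

If validation fails, the orchestrator can re-prompt the model with the error message until the reply conforms.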

[–] AsliReddington@alien.top 1 points 1 year ago

All you need is a 32K-context LLM. Everything beyond that needs a tool invocation that pulls in the archived text. You'll have to make your orchestrator smart enough to know there's content beyond the window that needs to be pulled in
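A sketch of that orchestrator decision, assuming a crude word-based token count and a hypothetical `fetch_archive` tool that retrieves older text on demand:

```python
def rough_token_count(text: str) -> int:
    # crude heuristic: ~1 token per whitespace-separated word
    return len(text.split())

def build_context(query: str, history: str, fetch_archive,
                  window: int = 32_000) -> str:
    """Keep recent history inside the 32K window; anything older is
    retrieved via a tool call instead of being stuffed into the prompt."""
    if rough_token_count(history) <= window:
        return history + "\n" + query
    # orchestrator notices content beyond the window and invokes the tool
    snippets = fetch_archive(query)
    recent = " ".join(history.split()[-window:])
    return recent + "\n" + "\n".join(snippets) + "\n" + query
```

`fetch_archive` would typically be a vector-store or keyword search over the archived conversation; the point is that the LLM never sees more than 32K tokens at once.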

[–] AsliReddington@alien.top 1 points 1 year ago

Just run it on TGI or vLLM to get FlashAttention & continuous batching for parallel requests
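The continuous-batching idea those servers implement can be shown with a toy scheduler (this is a conceptual sketch only, not vLLM/TGI code): each step decodes one token for every active request, and finished requests free their slot immediately so waiting requests join mid-flight instead of waiting for the whole batch to drain.

```python
from collections import deque

def continuous_batching(requests, max_batch: int = 2):
    """Toy continuous-batching scheduler.

    requests: list of (request_id, tokens_to_generate) pairs.
    Returns a per-step timeline of which requests were decoding.
    """
    waiting = deque(requests)
    active = {}                        # request_id -> tokens remaining
    timeline = []
    while waiting or active:
        # admit waiting requests into any free batch slots
        while waiting and len(active) < max_batch:
            rid, need = waiting.popleft()
            active[rid] = need
        timeline.append(sorted(active))
        for rid in list(active):       # decode one token per request
            active[rid] -= 1
            if active[rid] == 0:
                del active[rid]        # slot freed this very step
    return timeline

# "c" joins as soon as "a" finishes, while "b" is still generating:
print(continuous_batching([("a", 1), ("b", 3), ("c", 1)]))
```

With static batching, "c" would have had to wait for both "a" and "b" to finish; that gap is where the throughput win comes from.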

[–] AsliReddington@alien.top 1 points 1 year ago (1 children)

It's extremely overpriced. With INT4 quantization, llama.cpp does even crazier numbers. A system with 4090s can be built for $2500 in India & cheaper elsewhere for sure.
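To show why INT4 helps, here is a toy symmetric 4-bit quantizer: each weight is mapped to an integer in [-8, 7] with one per-tensor scale, cutting storage to roughly a quarter of FP16. This is a simplified stand-in for llama.cpp's block-wise Q4 formats, which use a scale per small block of weights rather than per tensor.

```python
def quantize_int4(weights):
    """Symmetric INT4 quantization: floats -> integers in [-8, 7]
    plus a single scale (toy version; llama.cpp scales per block)."""
    scale = max(abs(w) for w in weights) / 7 or 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int4(q, scale):
    return [v * scale for v in q]

weights = [0.1, -0.7, 0.35, 0.0]
q, s = quantize_int4(weights)
print(q, dequantize_int4(q, s))
```

The reconstruction error is bounded by about half the scale per weight, which is why 4-bit models stay usable while fitting in a fraction of the VRAM.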

[–] AsliReddington@alien.top 1 points 1 year ago (1 children)

I feel like verifiable math & physics simulation should be something every LLM just invokes as a tool instead of trying to do it internally, slowly
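A sketch of what such a math tool could look like: a whitelisted arithmetic evaluator built on the stdlib `ast` module, which the orchestrator calls whenever the model emits an expression, instead of letting the model grind out digits token by token. (The tool interface here is an assumption, not any particular framework's API.)

```python
import ast
import operator

# whitelisted operations for the safe arithmetic evaluator
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv,
       ast.Pow: operator.pow, ast.USub: operator.neg}

def safe_eval(expr: str) -> float:
    """Evaluate arithmetic exactly, the way an LLM tool call would."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))

print(safe_eval("3*(7+5)/2"))
```

Unlike `eval()`, this rejects anything that isn't plain arithmetic, so the model can't smuggle in arbitrary code; the same pattern extends to calling a CAS or physics engine for harder problems.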

[–] AsliReddington@alien.top 1 points 1 year ago (1 children)

vLLM, TGI, TensorRT-LLM

Fuyu-8b