First, I'd like to describe my two possible hardware setups for this problem.
Hardware Setup 1: 14900K + RTX 3060 (8GB VRAM) + 192GB RAM
Hardware Setup 2: 12600K + RTX 4090 (24GB VRAM) + 64GB RAM
The performance requirement for this task is fairly modest: it's a batch process, so it doesn't have to be real time.
The problem at hand is using LLMs to fact-check or "categorize" snippets of text. What the customer says they want is "summarize this snippet of text and tell me what it is about." If anyone knows which kind of model does that well on the setups I described, I'll happily take that as an answer.
However, my technical judgement tells me they really want a hot dog / not hot dog machine (Silicon Valley reference).
90% of the questions they want to ask of a snippet of text are along the following lines:
"Tell me 'truth' if this text that I pasted above is talking about a middle aged woman with arthritis? If it's talking about a man or an older woman with arthritis then tell me false. If it is not talking about a human being with arthritis, tell me n/a"
The ideal classifier returns true for a middle-aged human female (and we're happy to define "middle-aged" in the context), while a human male or any other mammal returns either false or n/a, so it's a little more like hot dog / not hot dog / not food.
What would be a good model for this? The context is typically two A4 pages of small-font text.
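To make the ask concrete, here's roughly the kind of three-way classifier I have in mind. This is just a sketch, not a working pipeline: it assumes a local OpenAI-compatible endpoint (llama.cpp's server, vLLM, or Ollama all expose one) at a placeholder URL, and "local-model" is a placeholder name rather than a recommendation.

```python
# Minimal sketch of the three-way classifier ("hot dog / not hot dog / not food").
# Assumes a local OpenAI-compatible server (llama.cpp server, vLLM, Ollama, ...) at
# http://localhost:8000/v1 and a placeholder model name -- adjust both for your setup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

SYSTEM_PROMPT = (
    "You are a strict classifier. Read the snippet and reply with exactly one label:\n"
    "true  - the text is about a middle-aged human woman with arthritis\n"
    "false - the text is about a man, or a woman outside that age range, with arthritis\n"
    "n/a   - the text is not about a human being with arthritis\n"
    "Reply with only the label, nothing else."
)

def classify(snippet: str) -> str:
    resp = client.chat.completions.create(
        model="local-model",   # placeholder; whatever the server has loaded
        temperature=0,         # deterministic output for a classifier
        max_tokens=5,          # we only ever want one short label back
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": snippet},
        ],
    )
    answer = resp.choices[0].message.content.strip().lower()
    # Anything outside the expected labels is treated as n/a rather than guessed at.
    return answer if answer in {"true", "false", "n/a"} else "n/a"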
Today we're using Azure OpenAI and it works very well, but there is a desire to first do a "hot dog or not" pass so that we don't send arbitrary snippets of text to Azure OpenAI.
Think of this as a first line of defense. If it works well, the local LLM setup will also be used for psychiatry and sexual topics, which are prohibited in Azure OpenAI.
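For what it's worth, the routing I'm picturing is something like the sketch below: the local model acts as the gate, and only snippets it flags as relevant ever go to Azure. The Azure endpoint, key, and deployment name are placeholders pulled from environment variables, and classify() is the helper from the earlier sketch.

```python
# Sketch of the "first line of defense" routing: the local classifier gates what
# actually gets sent to Azure OpenAI. Endpoint, key, and deployment name are
# placeholders; classify() is the helper from the sketch above.
import os
from openai import AzureOpenAI

azure = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

def process(snippet: str) -> dict:
    verdict = classify(snippet)  # local model decides first; nothing leaves the box yet
    if verdict != "true":
        # Out-of-scope or irrelevant snippets are never forwarded to Azure.
        return {"verdict": verdict, "summary": None}
    resp = azure.chat.completions.create(
        model="my-gpt4o-deployment",  # placeholder Azure deployment name
        messages=[
            {"role": "system", "content": "Summarize this snippet and say what it is about."},
            {"role": "user", "content": snippet},
        ],
    )
    return {"verdict": verdict, "summary": resp.choices[0].message.content}
```

Whether "false" results also get forwarded is a policy call on our side; the gate above only lets "true" through.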
Money will be an object, but I built something similar recently on consumer hardware: Dark Zero motherboard, 14900K, 192GB RAM, 4090, Crucial T700 SSD.