I would be very wary of such an application. There is no current model that does not hallucinate at times, and certainly not when you are asking it for factual analysis.
I am using an LLM to extract data from some texts, but for every answer it gives I do a simple search to see whether the answer actually appears in the input text. If it does not appear, it cannot be true. If it does appear, that still doesn't mean it is correct. Even that simple check fails on a fine-tuned model about 1 in 100 answers.
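That grounding check is trivial to implement. A minimal sketch (the function name and normalisation are my own choices, not from any library):

```python
def grounded(answer: str, source: str) -> bool:
    """Necessary-but-not-sufficient check: if the extracted answer is
    absent from the source text, it cannot be a faithful extraction.
    If it is present, it may still be wrong. Whitespace and case are
    normalised so trivial formatting differences don't cause false
    rejections."""
    norm = lambda s: " ".join(s.split()).lower()
    return norm(answer) in norm(source)

source = "Revenue grew to 22 million in 2020, while headcount reached 23."
print(grounded("22 million", source))  # True  -> might be correct
print(grounded("24 million", source))  # False -> definitely not in the text
```

The point is that this only filters one direction: it rejects answers that cannot possibly come from the text, and that filter alone still fires on roughly 1% of a fine-tuned model's answers.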
Or look at the HF leaderboards: if a model scores 98% on a benchmark, that means even after special training on known data it still gets 2% wrong, and now you want to throw unknown data with an unknown question at it.
Sometimes it will return rubbish which you can filter out, but sometimes it will just output 23 instead of the 22 that was in your input text (or 23 appeared in the input attached to a different fact). Those errors are very hard to filter out, and for most applications they don't matter. But if you want to produce analyses or facts, then they are simply wrong.
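To make the second failure mode concrete: a plain presence check can't catch a hallucinated number when that same number legitimately occurs elsewhere in the text for a different fact. A toy example:

```python
source = "The company hired 22 engineers and opened 23 offices in 2021."

# Suppose the model answers "23" to "How many engineers were hired?".
# The answer is wrong (it should be 22), but the raw string "23" does
# occur in the source, just attached to a different fact, so a simple
# substring check happily passes it through.
print("23" in source)  # True -- the wrong answer survives the filter
```

You'd need to check the number against its surrounding context (or re-ask and compare) to catch this class of error, which is exactly why it is so hard to filter.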