I agree with finetuning + RAG. OP already seems to have Q&A pairs, which should be a great starting point as a dataset.
The language gap (Dutch <-> English) could be a barrier to reasonable performance with Llama or any other 7B model, but as OP stated, they might be able to use translation for that. I'm not sure whether DeepL could be used here, i.e., wrapping the DeepL API around the chatbot code so that user input and chatbot output are both translated. It should have pretty good performance for Dutch. I like the idea and would like to test this or see the results once it's properly implemented, so please keep us updated on your approach, u/Flo501.
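To make the wrapper idea a bit more concrete, here is a rough sketch of what I have in mind, not a tested setup. The names `make_dutch_chatbot` and `make_deepl_translate` are made up for illustration, and `chat_en` stands in for whatever English-only model call OP ends up with. The DeepL part assumes the official `deepl` Python package and a valid API key.

```python
from typing import Callable

# (text, source_lang, target_lang) -> translated text
TranslateFn = Callable[[str, str, str], str]
# English question -> English answer (placeholder for the finetuned/RAG model)
ChatFn = Callable[[str], str]


def make_dutch_chatbot(chat_en: ChatFn, translate: TranslateFn) -> ChatFn:
    """Wrap an English-only chatbot so it accepts and returns Dutch."""

    def chat_nl(user_input_nl: str) -> str:
        # Dutch user input -> English, so the model sees English only.
        question_en = translate(user_input_nl, "NL", "EN-US")
        answer_en = chat_en(question_en)
        # English model output -> Dutch for the user.
        return translate(answer_en, "EN", "NL")

    return chat_nl


def make_deepl_translate(auth_key: str) -> TranslateFn:
    """Build a translate function on top of the `deepl` package (assumed installed)."""
    import deepl  # imported lazily so the wrapper itself has no hard dependency

    translator = deepl.Translator(auth_key)

    def translate(text: str, source_lang: str, target_lang: str) -> str:
        result = translator.translate_text(
            text, source_lang=source_lang, target_lang=target_lang
        )
        return result.text

    return translate
```

Usage would be something like `bot = make_dutch_chatbot(my_llama_chat, make_deepl_translate("YOUR_KEY"))`, where `my_llama_chat` and the key are placeholders. Because the translator is passed in as a plain function, it is easy to swap DeepL for another service or a stub while testing.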
According to their news page: "early next year"