Per the tokenizer config (tokenizer_config.json) for the Mistral instruct model, the EOS token is `</s>`. You can use the same. If you check the tokenizer file for the instruct base model, `</s>` is defined as a special token, so it will work fine as EOS. Regarding padding, the reason you define a padding token is so that all sequences in a batch have the same fixed length during tuning. Define your dataset with `<s>` at the start and `</s>` as EOS, and pad on the right.
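To make the "start token, EOS, pad to the right" layout concrete, here is a rough sketch in plain Python. The ids are assumptions for illustration only (`BOS_ID`, `EOS_ID` and especially `PAD_ID` are placeholders, and `wrap_example` / `pad_right` are hypothetical helper names), so read the real values from your own tokenizer rather than trusting these.

```python
# Illustrative ids only: look up the real values in your tokenizer.
BOS_ID = 1  # assumed id of <s>
EOS_ID = 2  # assumed id of </s>
PAD_ID = 0  # hypothetical padding id, not taken from the model config


def wrap_example(token_ids):
    """Put <s> at the start and </s> at the end of one tokenized example."""
    return [BOS_ID] + list(token_ids) + [EOS_ID]


def pad_right(batch, pad_id=PAD_ID):
    """Right-pad every sequence to the longest one and build attention masks."""
    max_len = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        # Real tokens first, padding appended on the right.
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 marks real tokens, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


examples = [wrap_example([10, 11, 12]), wrap_example([20])]
input_ids, attention_mask = pad_right(examples)
```

In practice the tokenizer and data collator usually do this for you; the point of the sketch is just that every row ends up the same length, with the padding after the EOS and masked out.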
Btw, why are you fine-tuning the base model for text-to-SQL? Wouldn't it be better to fine-tune the instruct model for this? You can use the same prompt template that the instruct model uses. Good luck, and let me know how it goes.