If you're asking that question here, you may not be qualified for the job.
pm_me_your_pay_slips
joined 1 year ago
Care to write a clear explanation of the method here?
Join a startup to work on these things. You'll very quickly realize why people are still pursuing PhDs in the field.
This is gibberish. Was this a paper written by ChatGPT?
Use a transformer layer for aggregation if you want a learnable way of pooling them. Positional encoding and masking should help you with ensuring that order influences the prediction.
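To make that concrete, here's a rough PyTorch sketch of what I mean, not a tuned implementation. The class name `TransformerPool`, the CLS-style pooling token, the sinusoidal encoding, and the hyperparameters are all my own assumptions for illustration; it also assumes an even `dim` and inputs padded to a common length.

```python
import math
import torch
import torch.nn as nn


class TransformerPool(nn.Module):
    """Pools a padded sequence of embeddings into one vector (illustrative sketch)."""

    def __init__(self, dim: int, num_heads: int = 4, max_len: int = 512):
        super().__init__()
        # Learnable CLS-style token; its output is used as the pooled vector.
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        # Fixed sinusoidal positional encoding so item order can affect the result.
        pos = torch.arange(max_len + 1).unsqueeze(1)
        div = torch.exp(torch.arange(0, dim, 2) * (-math.log(10000.0) / dim))
        pe = torch.zeros(max_len + 1, dim)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)
        self.layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, dim_feedforward=2 * dim,
            dropout=0.0, batch_first=True,
        )

    def forward(self, x: torch.Tensor, lengths: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim); lengths: (batch,) count of valid items per row.
        batch, seq_len, _ = x.shape
        x = torch.cat([self.cls.expand(batch, -1, -1), x], dim=1)
        x = x + self.pe[: seq_len + 1]
        # True marks padded positions to ignore; slot 0 (CLS) is never masked.
        idx = torch.arange(seq_len + 1, device=x.device).unsqueeze(0)
        pad_mask = idx > lengths.unsqueeze(1)
        out = self.layer(x, src_key_padding_mask=pad_mask)
        return out[:, 0]  # pooled vector, shape (batch, dim)
```

Usage would be something like `TransformerPool(dim=16)(x, lengths)` with `x` of shape `(batch, seq_len, 16)`. The padding mask keeps the padded slots from leaking into the pooled vector, and the positional encoding is what lets the order of the items matter.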
A100s and H100s are great for training, but a bit of a waste for inference.
It's very likely something like this: https://arxiv.org/pdf/2305.18290.pdf
Or fine-tuning on high-quality datasets.