As the title suggests, I have a few LLM models and wanted to see how they perform on different hardware (CPU-only instances, and GPUs: T4, V100, A100).
Ideally it's to get an idea of performance vs. overall price (VM hourly rate / efficiency).
Currently I've written a script to calculate ms per token, RAM usage (via memory profiler), and total time taken.
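For context, the measurement part looks roughly along these lines (a minimal sketch, not my exact script; it assumes a Hugging Face transformers causal LM, and the model name, prompt, and psutil-based memory read are placeholders):

```python
import time
import psutil  # process RSS; memory_profiler works too
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder: swap in the model under test
PROMPT = "Benchmarking prompt goes here."
MAX_NEW_TOKENS = 128

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).to(device)
model.eval()

inputs = tokenizer(PROMPT, return_tensors="pt").to(device)

# Warm-up run so CUDA init / lazy allocations don't skew the timing
with torch.no_grad():
    model.generate(**inputs, max_new_tokens=8)

start = time.perf_counter()
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=MAX_NEW_TOKENS)
if device == "cuda":
    torch.cuda.synchronize()  # make sure GPU work is actually finished
elapsed = time.perf_counter() - start

new_tokens = output.shape[1] - inputs["input_ids"].shape[1]
rss_mb = psutil.Process().memory_info().rss / 1e6

print(f"total time:   {elapsed:.2f} s")
print(f"ms per token: {1000 * elapsed / new_tokens:.1f}")
print(f"RSS memory:   {rss_mb:.0f} MB")
if device == "cuda":
    print(f"GPU memory:   {torch.cuda.max_memory_allocated() / 1e6:.0f} MB")
```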
Wanted to check if there are better methods or tools. Thanks!