overview for fakezeta

Optimum Intel OpenVino Performance in c/localllama@poweruser.forum

fakezeta@alien.top 1 point 9 months ago (1 child)

I hope something similar emerges on Linux.

SYCL could be a candidate, much like Vulkan is for 3D acceleration: it's a PITA to deal with CUDA, ROCm, etc.

Optimum Intel OpenVino Performance (alien.top)

submitted 9 months ago by fakezeta@alien.top to c/localllama@poweruser.forum

Optimum Intel int4 on iGPU UHD 770

I'd like to share the results of running inference with the Optimum Intel library on the Starling-LM-7B Chat model, quantized to int4 with NNCF, on an Intel UHD Graphics 770 iGPU (i5 12600) through OpenVINO.

I think 16 tk/s with a CPU load of 25-30% is quite good. int8 (NNCF) quantization gives the same performance.

This runs inside a Proxmox VM with an SR-IOV virtualized GPU, 16 GB of RAM, and 6 cores. I also found that the ballooning device might crash the VM, so I disabled it; swap is on a zram device.

free -h output during inference:

               total        used        free      shared  buff/cache   available
Mem:            15Gi       6.2Gi       573Mi       4.7Gi        13Gi       9.3Gi
Swap:           31Gi       256Ki        31Gi

Code adapted from https://github.com/OpenVINO-dev-contest/llama2.openvino
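
For anyone who wants to reproduce a tk/s figure like the one above, here is a minimal sketch of how it could be measured. This is not the code from the linked repo: `tokens_per_second` and `make_openvino_generate` are hypothetical helper names, and the second one assumes the model has already been exported and quantized to an OpenVINO IR directory that `OVModelForCausalLM` from Optimum Intel can load.

```python
import time
from typing import Callable


def tokens_per_second(generate: Callable[[], int], runs: int = 3) -> float:
    """Average throughput of `generate`, which must return how many new tokens it produced."""
    total_tokens = 0
    total_seconds = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        total_tokens += generate()
        total_seconds += time.perf_counter() - start
    return total_tokens / total_seconds


def make_openvino_generate(model_dir: str, prompt: str, max_new_tokens: int = 128):
    """Build a zero-argument generate function for an already exported OpenVINO model.

    Assumption: `model_dir` holds an OpenVINO IR export plus its tokenizer files.
    """
    # Imported lazily so the timing helper above works without these packages.
    from optimum.intel import OVModelForCausalLM
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    # device="GPU" asks OpenVINO to run on the Intel GPU instead of the CPU.
    model = OVModelForCausalLM.from_pretrained(model_dir, device="GPU")
    inputs = tokenizer(prompt, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[1]

    def generate() -> int:
        output = model.generate(**inputs, max_new_tokens=max_new_tokens)
        # Count only the newly generated tokens, not the prompt.
        return output.shape[1] - prompt_len

    return generate
```

Calling `tokens_per_second(make_openvino_generate(model_dir, "Hello"))` with your own export directory would give a rough average. The first run includes model compilation and warm-up, so it may be worth discarding it.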

What are your thoughts on this?

Neural-chat-7b-v3-1 GGUF. New Mistral finetune in c/localllama@poweruser.forum