this post was submitted on 25 Nov 2023

LocalLLaMA


Community to discuss Llama, the family of large language models created by Meta AI.


Title essentially. I'm currently running an RTX 3060 with 12GB of VRAM, 32GB of RAM, and an i5-9600K. I've been running 7B and 13B models effortlessly via KoboldCPP (I tend to offload all 35 layers to the GPU for 7Bs, and 40 for 13Bs) + SillyTavern for role-playing purposes, but slowdown becomes noticeable at higher context with 13Bs (not too bad, so I deal with it). Is this setup capable of running bigger models like 20B or potentially even 34B?
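
As a point of reference, a minimal launch sketch for the offloading setup described above, assuming a KoboldCpp build with cuBLAS support; the model filename, layer count, and context size are placeholders rather than values from the post, and should be adjusted to your own files and hardware:

# hypothetical example: full GPU offload of a 13B GGUF model (adjust path and values to your system)
python koboldcpp.py --model ./models/example-13b.Q4_K_M.gguf --usecublas --gpulayers 40 --contextsize 4096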

[–] FullOf_Bad_Ideas@alien.top 1 points 10 months ago (1 children)

Isn't cuBLAS specific to Nvidia cards, while CLBlast is compatible with both Nvidia and AMD? I'm not sure how cuBLAS could work with AMD cards; through ROCm, maybe?

[–] flurbz@alien.top 1 points 10 months ago

You're right, this shouldn't work. But for some strange reason, using --usecublas loads the hipBLAS library:

Welcome to KoboldCpp - Version 1.49.yr1-ROCm
Attempting to use hipBLAS library for faster prompt ingestion. A compatible AMD GPU will be required.
Initializing dynamic library: koboldcpp_hipblas.so

I have no idea why this works, but it does, and since the 6700 XT took quite a bit of effort to get going, I'm keeping it this way.
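
For comparison, a minimal sketch of the CLBlast route mentioned in the parent comment, which targets both Nvidia and AMD GPUs; the platform/device indices and layer count here are assumptions to be checked against your own system (koboldcpp.py --help lists the exact options available in your build):

# hypothetical example: CLBlast backend on OpenCL platform 0, device 0
python koboldcpp.py --model ./models/example-13b.Q4_K_M.gguf --useclblast 0 0 --gpulayers 40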