[–] ShitGobbler69@alien.top 1 points 11 months ago (1 children)

FYI if all you're using it for is benchmarking (not interactive chat), you can probably do it in way less VRAM: load one layer into VRAM, run the entire set of input tokens through it, keep that output, load the next layer into VRAM, and repeat.
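
A toy sketch of the idea, using numpy matmuls in place of real transformer layers and a plain copy standing in for a CPU→GPU transfer (names like `load_to_vram` are made up for illustration; a real version would move tensors with something like `tensor.to("cuda")` and would also need to handle attention/KV state, which this skips):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_layers, n_tokens = 8, 4, 16

# All layer weights live on the "host" (a list standing in for CPU RAM / disk).
host_weights = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(n_layers)]

def load_to_vram(w):
    # Hypothetical stand-in for copying one layer's weights into VRAM.
    return w.copy()

x = rng.standard_normal((n_tokens, d))  # activations for the whole input batch

# Stream the model: one layer resident at a time, all tokens pushed through it,
# only the inter-layer activations kept in memory.
acts = x
for w in host_weights:
    w_gpu = load_to_vram(w)             # load just this layer
    acts = np.maximum(acts @ w_gpu, 0)  # run every token through it
    del w_gpu                           # free the "VRAM" before the next layer

# Reference: identical computation with all weights resident at once.
ref = x
for w in host_weights:
    ref = np.maximum(ref @ w, 0)

print(np.allclose(acts, ref))  # → True: same result, peak weight memory = 1 layer
```

Peak weight memory drops from all layers to a single layer; the cost is the repeated host-to-VRAM transfer per layer, which is fine for a one-shot benchmark run but would make interactive token-by-token generation painfully slow.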