I wonder if you could run a large dataset of prompts for one relatively narrow task through the model and see which neurons get activated. And then use statistical measures to keep a few surrounding neurons as well, just in case.
Bet you can get away with near zero reduction in quality on that task and massive parameter compression.
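Rough sketch of what I mean, on toy numbers only. Everything here is made up for illustration: `select_neurons`, the `threshold` and `margin_k` parameters, and the activation table are all hypothetical, and in a real model you would have to record activations per layer yourself. The "statistical measure" is just mean plus a multiple of the standard deviation, which is one possible choice among many.

```python
from statistics import mean, pstdev

def select_neurons(activations, threshold=0.5, margin_k=1.0):
    """Pick neurons to keep from recorded activations (hypothetical helper).

    activations: list of rows, one row per prompt, one float per neuron.
    Returns (core, margin): neurons whose mean |activation| clears the
    threshold, and borderline neurons kept "just in case".
    """
    n_neurons = len(activations[0])
    core, margin = [], []
    for j in range(n_neurons):
        mags = [abs(row[j]) for row in activations]
        mu, sigma = mean(mags), pstdev(mags)
        if mu >= threshold:
            core.append(j)      # clearly active on this task
        elif mu + margin_k * sigma >= threshold:
            margin.append(j)    # fires sometimes, keep as a safety margin
    return core, margin

# Toy activation table: 3 prompts x 4 neurons (made-up numbers).
acts = [
    [1.0, 0.0, 0.1, 0.9],
    [0.8, 0.0, 0.7, 0.0],
    [1.2, 0.0, 0.1, 0.0],
]
core, margin = select_neurons(acts)
kept = sorted(core + margin)
print("core:", core, "margin:", margin, "kept:", kept)
```

Here neuron 1 never fires and would be pruned, while neurons 2 and 3 only fire on some prompts and get kept by the margin rule. Whether the pruned model actually holds up on the task is something you would have to measure.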
A6000 being worse than 3090 doesn’t make any sense.