You may have to tell it to build with cuBLAS AND force the reinstall:
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
Otherwise pip may decide the package is already installed, or reuse a cached wheel, and you end up with the non-GPU build again.
Also, no idea about WSL; I've only tried this on actual Linux installs.
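If you want to check which build you actually ended up with after the reinstall, something like the rough sketch below might help. It assumes your installed llama-cpp-python exposes `llama_supports_gpu_offload` at the top level of the `llama_cpp` module; I believe recent releases do, but older ones may not, so the helper (`gpu_offload_status` is just a name I made up) falls back to "unknown" rather than guessing.

```python
import importlib


def gpu_offload_status(module_name: str = "llama_cpp") -> str:
    """Report whether the installed bindings claim GPU offload support.

    Returns one of: "not-installed", "unknown", "gpu", "cpu-only".
    """
    try:
        mod = importlib.import_module(module_name)
    except (ImportError, OSError, RuntimeError):
        # Package missing, or its shared library failed to load.
        return "not-installed"

    # Assumption: the bindings expose llama_supports_gpu_offload().
    # If this version does not, we cannot tell from here.
    check = getattr(mod, "llama_supports_gpu_offload", None)
    if check is None:
        return "unknown"
    return "gpu" if check() else "cpu-only"


if __name__ == "__main__":
    print(gpu_offload_status())
```

If it prints "cpu-only" right after the command above, the build probably didn't pick up the CMAKE_ARGS setting. Loading a model with `n_gpu_layers` set and reading the startup log is another way to tell.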
There is no doubt current-gen games will be made way more awesome with AI, but I think the real power of LLMs is opening up whole new types of game. Current games are (largely) static universes and static stories, with simple things like physics simulations (bullets, destruction). These are all things that are easy to program using traditional techniques.
LLMs will allow new genres: spying simulators where control of information flow has a significant impact. Political games where you actually have to negotiate with NPCs and convince them to go along with your actions. Managerial games with subordinates who can operate independently. RTSs where you operate as a general because AIs handle passing battlefront information up the command structure. And probably dozens more I haven't thought of yet...
Many board games rely on social aspects. These can now be made into single-player games.