I don't really follow your logic. How else would you propose shaping the audio in a way that isn't "just an effect"?
Your real-life analogy doesn't take into account that the audio source itself is moving, so there's an extra variable beyond just the stereo signal, and that is what spatial audio is modelling.
And your muffling example sounds a bit oversimplified, maybe? My understanding is that the spatial stuff is produced by slightly phase-shifting (delaying) the L and R signals relative to each other.
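To make the phase-shift idea concrete, here's a rough sketch of just the timing cue (interaural time difference, ITD). It assumes Woodworth's spherical-head approximation, ITD = (r/c)(θ + sin θ), with an assumed head radius of 8.75 cm and speed of sound of 343 m/s; `woodworth_itd` and `pan_with_itd` are made-up names for illustration, not any engine's API.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, air at roughly 20 °C (assumed)
HEAD_RADIUS = 0.0875    # m, assumed "average" head radius

def woodworth_itd(azimuth_rad: float) -> float:
    """ITD in seconds for a far-field source, using Woodworth's
    spherical-head approximation (valid for azimuth 0..pi/2)."""
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (azimuth_rad + math.sin(azimuth_rad))

def pan_with_itd(mono, azimuth_rad, sample_rate=44100):
    """Hypothetical helper: returns (left, right) where the ear farther
    from the source hears the signal later. Positive azimuth means the
    source is on the right, so the left ear lags. Delay is rounded to
    whole samples; both channels are padded to the same length."""
    delay = round(woodworth_itd(abs(azimuth_rad)) * sample_rate)
    delayed = [0.0] * delay + list(mono)
    direct = list(mono) + [0.0] * delay
    if azimuth_rad >= 0:
        return delayed, direct  # source on the right: left channel delayed
    return direct, delayed      # source on the left: right channel delayed

# Impulse from hard right: max ITD is ~0.66 ms, i.e. ~29 samples at 44.1 kHz
left, right = pan_with_itd([1.0], math.pi / 2)
print(left.index(1.0), right.index(1.0))
```

This only models the timing cue; HRTF-based spatializers typically also apply per-ear level and frequency-dependent filtering on top of it.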
Finally, why not go further? "I don't listen to speaker audio because it's all just effects and mirages made to sound like a real sound. What, the diaphragm can only be in 2^16 discrete positions?" :p
Devil's advocate: splatting, DLSS and neural codecs, to name a few, are things that will change the way we make games.