There are two things I'm talking about here. One is that I think the warring audio factions might be talking about two very different things (although the FR ppl seem to think there's only one thing?). The other is which one I think is more important. It's a wall of words, and in the end I'm not sure if I truly understand it myself so I'm probably gonna get torn to shreds for suggesting it.
I probably should use the word "timing" instead of "time domain".
I think I personally value the timing realm more than the frequency (pitch) realm. The audio engineers are right... you can only discern so much in terms of pitch. It's 20 - 20,000 Hz, and even that's generous considering 16,000 Hz is already the limit for lots of older listeners. They're also right that there are psychoacoustic things about sound. BUT I wonder if they forget about timing, because from what I can tell all 'measurements' in audio are related to the Frequency Response (pitch) and not timing. A visual equivalent might be: frequency is the "Color Spectrum" and timing is "Frames Per Second".
Maybe all the in-fighting over the topic comes down to this misunderstanding? On the one side you have the equivalent of FR people focusing on the 'color reproduction' saying "You can't even see infrared light!" or "If you adjust the color, then the two pictures are exactly the same". But then team "timing" is talking about resolution and motion fidelity, not necessarily color reproduction.
For example, how do we determine the location of sounds? By the difference in timing between when audio reaches the left and right ears. The detectable difference can be as small as 10 microseconds according to this article:
https://www.sciencefocus.com/science/why-is-there-left-and-right-on-headphones
Another article mentions that humans can detect even less than 10 microseconds (3 - 5 microseconds?) of timing difference:
https://phys.org/news/2013-02-human-fourier-uncertainty-principle.html
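To put the 10 microsecond figure in perspective, here is a rough sketch of how a left/right arrival-time difference maps to a source angle. It assumes a very simplified model that is not from either article: a distant source, sound travelling in straight lines, ears about 0.18 m apart, and a speed of sound of 343 m/s. The function names are made up for illustration, and real heads diffract sound, so treat the numbers as ballpark only.

```python
import math

SPEED_OF_SOUND_M_S = 343.0   # assumed: air at roughly room temperature
EAR_SPACING_M = 0.18         # assumed: rough distance between the two ears


def itd_us(angle_deg: float) -> float:
    """Left/right arrival-time difference in microseconds for a distant
    source angle_deg away from straight ahead (straight-path model)."""
    path_difference_m = EAR_SPACING_M * math.sin(math.radians(angle_deg))
    return path_difference_m / SPEED_OF_SOUND_M_S * 1e6


def angle_for_itd_deg(itd_microseconds: float) -> float:
    """Inverse of itd_us: the source angle that gives this time difference."""
    ratio = itd_microseconds * 1e-6 * SPEED_OF_SOUND_M_S / EAR_SPACING_M
    return math.degrees(math.asin(ratio))


print(f"10 us corresponds to {angle_for_itd_deg(10.0):.2f} degrees off center")
print(f"a source fully to one side gives about {itd_us(90.0):.0f} us")
```

Under these assumptions, 10 microseconds works out to a source only about one degree off center, while a source fully to one side gives roughly half a millisecond of difference.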
So many things can be explained by this. Spatial cues like staging and imaging, for one. Transients and textures depend on the speed of changes in frequency, not the frequencies themselves. I think those same things help in determining how detailed & resolving things seem and relate to micro and macro dynamics. It's known that if you compare a piano note to a guitar note... it's the brief attack characteristics, the pluck vs the hammer, that clue us into which sound comes from which instrument. I think all of the "life-like" things are mostly timing dependent rather than frequency or pitch dependent.
From what I can tell... the things that make Hi-Fi gear stand out from just the cheapest gear with good EQ applied are tied to the timing. I've been lucky enough to go to a CanJam before and listened to very expensive things and everything below that in terms of price. To my ears, there IS a difference, and it didn't matter what the price tag said. I wasn't gonna buy the expensive stuff anyway... I just wanted to hear the differences for myself.
I've listened to things that "measure perfectly", like the near perfect Dan Clark Stealth and Dan Clark Expanse. DC uses metamaterials to help dampen and "shape" the sound, and coincidentally they measure nearly perfectly against the Harman Curve. I've listened to many Chi-Fi DACs and AMPs that also measure perfectly (they all use mounds of negative feedback). And to my ears, those are some of the most boring and lifeless things to listen to.
So in my opinion, faithful reproduction of frequency is NOT the holy grail. You can EQ things any way you like and I agree that EQ is excellent! It changes the sound more than most things. But good FR performance is cheap in my opinion and that's great. What's not widely available are things that perform well in timing. From what I can tell, that's what people pay up for.
I'd be interested to see if one day the industry starts creating ways to measure time-domain performance. In my analogy above I use the metaphor of "Frames Per Second", but timing changes can also be represented in Hz. In the first article, humans can use timing cues as small as 10 microseconds (μs) to position a sound source, which equates to 100,000 Hz (1 / 10 μs). In the second article, humans can detect changes as small as 3 μs. The article mentions time difference detection 10x to 13x better than expected, so if 3 μs is on the extreme 13x side, the participants at the 10x figure were closer to 4 μs (3 μs × 13 / 10 ≈ 3.9 μs). Going by the 4 μs figure, that would equate to 250,000 Hz resolution. It's not about pitch, it's about changes in the audio.
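The arithmetic in the paragraph above can be checked with a few lines. This is only a sketch: the function names are made up for illustration, and the "Hz" values are plain reciprocals of the time interval, which is an assumption about how to express a timing figure as a rate and not a measured bandwidth.

```python
def interval_to_rate_hz(interval_us: float) -> float:
    """Reciprocal of a time interval given in microseconds, expressed in Hz."""
    return 1e6 / interval_us


def scale_interval_us(best_us: float, best_factor: float, other_factor: float) -> float:
    """If best_us was achieved at best_factor times better than expected,
    estimate the interval for someone at other_factor times better."""
    return best_us * best_factor / other_factor


# 3 us at the 13x end implies roughly this interval at the 10x end.
typical_us = scale_interval_us(3.0, 13.0, 10.0)
print(f"10x participants: about {typical_us:.1f} us")

for interval_us in (10.0, 4.0, 3.0):
    print(f"{interval_us:g} us -> {interval_to_rate_hz(interval_us):,.0f} Hz")
```

The 10 μs and 4 μs cases reproduce the 100,000 Hz and 250,000 Hz figures, and the 13x to 10x scaling lands at about 3.9 μs, which is where the rounded 4 μs comes from.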
By the asymmetry of your ears. Sound waves get diffracted and scattered differently when they are coming from different directions: front or back, top or bottom, etc. And you learn to distinguish between them using that same pair of ears.
I mean, I get that part… but how does the sound come from different directions when it's one flat driver? It mostly has a center and a surrounding area. How does a driver reproduce that?