There are two things I'm talking about here. One is that I think the warring audio factions might be talking about two very different things (although the frequency response (FR) people seem to think there's only one thing?). The other is which one I think is more important. It's a wall of words, and in the end I'm not sure I truly understand it myself, so I'm probably gonna get torn to shreds for suggesting it.
I probably should use the word "timing" instead of "time domain"
I think I personally value the timing realm more than the frequency (pitch) realm. The audio engineers are right... you can only discern so much in terms of pitch. It's 20 - 20,000 Hz, and even that's generous considering 16,000 Hz is already the limit for lots of older listeners. They're also right that there are psychoacoustic things about sound. BUT I wonder if they forget about timing when it comes to audio, because from what I can tell all 'measurements' when it comes to audio are related to the Frequency Response (pitch) and not timing. A visual equivalent might be: frequency is the "Color Spectrum" and timing is the "Frames Per Second".
Maybe all the in-fighting over the topic comes from this misunderstanding? On the one side you have the equivalent of FR people focusing on the 'color reproduction' saying "You can't even see infrared light!" or "If you adjust the color, then the two pictures are exactly the same". But then team "timing" is talking about resolution and motion fidelity, not necessarily color reproduction.
For example: how do we determine the location of sounds? By the difference in timing between when audio reaches the left and right ears. The detectable difference can be as low as 10 microseconds according to this article:
https://www.sciencefocus.com/science/why-is-there-left-and-right-on-headphones
Another article mentions that humans can detect even less than 10 microseconds (3 - 5 microseconds?) of timing difference:
https://phys.org/news/2013-02-human-fourier-uncertainty-principle.html
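To get a feel for how small these timing differences are, here's a rough sketch that converts an arrival-time difference between the two ears into the extra distance the sound has to travel. The speed of sound (343 m/s, roughly dry air at room temperature) and the function name are my own assumptions for illustration, not something taken from either article.

```python
# Assumption: speed of sound in dry air at about 20 degrees C.
SPEED_OF_SOUND_M_S = 343.0


def path_difference_mm(time_difference_us: float) -> float:
    """Extra path length (in mm) that corresponds to a given
    left/right arrival-time difference (in microseconds)."""
    seconds = time_difference_us * 1e-6
    meters = SPEED_OF_SOUND_M_S * seconds
    return meters * 1000.0


if __name__ == "__main__":
    # The timing figures quoted in the post.
    for us in (10, 5, 3):
        print(f"{us} us -> about {path_difference_mm(us):.2f} mm")
```

So a 10 microsecond difference works out to only a few millimeters of extra travel to one ear, under the speed-of-sound assumption above.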
So many things can be explained by this. Spatial cues like staging and imaging. Transients and textures depend on the speed of changes in frequency, not the frequencies themselves. I think those same things help in determining how detailed & resolving things seem, and relate to micro and macro dynamics. It's known that if you compare a piano note to a guitar note... it's the brief attack characteristics, the pluck vs the hammer, that clue us into which sound comes from which instrument. I think all of the "life-like" things mostly depend on timing rather than on frequency or pitch.
From what I can tell... the things that make Hi-Fi gear stand out from just the cheapest gear with good EQ applied are tied to the timing. I've been lucky enough to go to a CanJam before and listened to very expensive things and everything below that in terms of price. To my ears, there IS a difference, and it didn't matter what the price tag said. I wasn't gonna buy the expensive stuff anyways... I just wanted to hear the differences for myself.
I've listened to things that "measure perfectly", like the near perfect Dan Clark Stealth and Dan Clark Expanse. DC uses metamaterials to help dampen and "shape" the sound, and they coincidentally measure nearly perfectly against the Harman Curve. I've listened to many Chi-Fi DACs and AMPs that also measure perfectly (they all use mounds of negative feedback). And to my ears, those are some of the most boring and lifeless things to listen to.
So in my opinion, faithful reproduction of frequency is NOT the holy grail. You can EQ things any way you like, and I agree that EQ is excellent! It changes the sound more than most things. But good FR performance is cheap in my opinion, and that's great. What's not widely available are things that perform well in timing. From what I can tell, that's what people pay up for.
I'd be interested to see if one day the industry starts creating ways to measure time-domain performance. In my analogy above I use the metaphor of "Frames Per Second", but timing changes can also be represented in Hz by taking 1 divided by the time interval. In the first article, humans can use timing cues as small as 10 microseconds (μs) in order to position a sound source, which equates to 100,000 Hz. In the second article, humans can detect changes as small as 3 μs. The article mentions 13x to 10x better time difference detection than expected, so if 3 μs is on the extreme 13x side, that means the other participants were closer to 4 μs, or the 10x figure. Going by the 4 μs figure, that would equate to 250,000 Hz resolution. It's not about pitch, it's about changes in the audio.
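For anyone who wants to check the numbers, here's a tiny sketch of the conversion used above: the rate in Hz is just 1 divided by the time interval. The function name is made up for illustration, and this only reproduces the arithmetic from the post. It doesn't say anything about what bandwidth gear actually needs.

```python
def interval_to_rate_hz(interval_us: float) -> float:
    """Convert a time interval in microseconds into the
    equivalent rate in Hz (simple reciprocal)."""
    if interval_us <= 0:
        raise ValueError("interval must be positive")
    # 1 second = 1,000,000 microseconds
    return 1_000_000 / interval_us


if __name__ == "__main__":
    # Timing figures quoted in the post, in microseconds.
    for us in (10, 4, 3):
        print(f"{us} us -> {interval_to_rate_hz(us):,.0f} Hz")
```

The 10 μs and 4 μs cases give the 100,000 Hz and 250,000 Hz figures quoted above, and the 3 μs case comes out a bit over 333,000 Hz.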
While the best responses to this post have already been given, I just wanted to add one thing. The reason what you see on the graph for a given headphone "doesn't tell the whole story" is also in part because it's being measured in the condition of being on a particular 'head' - this is how we should think of measurement rigs.
Each rig has its own head-related transfer function (HRTF), as do you, and these are likely to be different to some degree. Think of the HRTF as the effect of the head and ears on incoming sound. That effect for your head and ears is bound to be different from the effect of the measurement head. That's not to say "we all just hear differently", since we all typically have heads and ears that are... head and ear shaped, but there are still going to be some differences that can be meaningful.
So, HRTF variation is one reason, but there is also another one, and that's the Headphone Transfer Function (HpTF). This is how the behavior of the headphone can change depending on the head that it's on. You mention the well-measuring DCA headphones not sounding very good. One likely explanation for this is that the headphone itself is behaving differently when it's being worn by you - and with respect to those headphones in particular I'd actually expect this to be the case (it was the same for me).
It doesn't mean the graph is wrong, or that categorically the product doesn't sound like the graph to SOME person. It just means it doesn't to you, because the condition of that headphone is different. Bottom line, HRTF and HpTF effects explain much of the whole "there's more than just FR" concept - at least in cases where all else is reasonably equivalent.
That’s awesome! I didn’t know that