None of these references even started to touch on the physics of “why?”
> GREY, J. M. Multidimensional perceptual scaling of musical
timbres. The Journal of the Acoustical Society of America,
Acoustical Society of America, v. 61, n. 5, p. 1270–1277, 1977.
Listeners rated the similarity of pairs of tones (of 13 orchestral wind instruments and 3 strings) on a scale of 1 to 30. The ratings were analysed with multidimensional scaling and a hierarchical clustering algorithm, and the scaling solution is plotted on three axes. Axis 1 measures narrow vs wide spectral bandwidth. Axis 2 measures synchronicity of the high harmonics together with spectral fluctuation through time. Axis 3 measures high vs low frequency energy during the attack phase.
> IVERSON, P.; KRUMHANSL, C. L. Isolating the dynamic
attributes of musical timbre. The Journal of the Acoustical
Society of America, Acoustical Society of America, v. 94, n. 5,
p. 2595–2603, 1993.
In separate experiments, subjects heard complete orchestral instrument tones, the onsets of those tones alone, and the tones with the onsets removed. The instruments were vibraphone, tubular bells, piano, violin, cello and 11 wind instruments.

> SCHUBERT, E.; WOLFE, J.; TARNOPOLSKY, A. Spectral
centroid and timbre in complex, multiple instrumental textures.
In: Proceedings of the International Conference on Music
Perception and Cognition, Northwestern University,
Illinois, 2004. p. 112–116.
No use: “This paper investigates the dependence of perceived timbral brightness on pitch and spectral centroid for single notes and pairs of simultaneous notes. In both cases, brightness is better correlated with the spectral centroid fc than with the ratio of fc to the pitches of the notes.”
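For orientation, the two quantities compared in that abstract can be sketched as follows. This is a minimal illustration, assuming the common amplitude-weighted definition of the spectral centroid; the paper's exact weighting may differ. The `harmonic_tone` helper and its `rolloff` parameter are hypothetical, invented here only to generate test spectra.

```python
def spectral_centroid(freqs_hz, amplitudes):
    """Amplitude-weighted mean frequency of a set of partials (assumed definition)."""
    total = sum(amplitudes)
    if total == 0:
        raise ValueError("all amplitudes are zero")
    return sum(f * a for f, a in zip(freqs_hz, amplitudes)) / total


def harmonic_tone(f0_hz, n_partials, rolloff):
    """Hypothetical tone: partial k has frequency k*f0 and amplitude 1/k**rolloff."""
    ks = range(1, n_partials + 1)
    freqs = [k * f0_hz for k in ks]
    amps = [1.0 / k ** rolloff for k in ks]
    return freqs, amps


if __name__ == "__main__":
    f0 = 220.0
    for name, rolloff in (("dull", 2.0), ("bright", 0.5)):
        freqs, amps = harmonic_tone(f0, 10, rolloff)
        fc = spectral_centroid(freqs, amps)
        # fc is the absolute centroid in Hz; fc / f0 is the pitch-normalised ratio.
        print(name, round(fc, 1), round(fc / f0, 2))
```

For tones with the same spectral shape, fc scales with pitch while fc / f0 stays constant, which is why the two measures can be separated experimentally.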
> BISMARCK, G. von. Timbre of steady sounds: A factorial
investigation of its verbal attributes. Acta Acustica united
with Acustica, S. Hirzel Verlag, v. 30, n. 3, p. 146–159, 1974.
Gets a 4-D result, but these are steady sounds and perceptual only. “An attempt was made to extract from the timbre percept those independent features which can be described in terms of verbal attributes. Pairs of opposite attributes, such as dark – bright or smooth – rough etc. Four orthogonal factors accounted for 90% of the variance. The factor carrying most of the variance (44%) was represented by the scale dull – sharp.”
> ELLIOTT, T. M.; HAMILTON, L. S.; THEUNISSEN, F. E.
Acoustic structure of the five perceptual dimensions of timbre in
orchestral instrument tones. The Journal of the Acoustical
Society of America, Acoustical Society of America, v. 133, n. 1,
p. 389–404, 2013.
Again only steady tones. Again only ordinary musical instruments. “Timbre space of sustained instrument tones occupies 5 dimensions.”

