stego-tech 10 hours ago [-]
I cannot hear the difference between 16/44.1 (and by extension, 16/48) and High-Res Content generally, be they HDCD, SACD, or just straight-up Masters from Qobuz. This is on multiple sets of equipment, ranging from El Cheapo earbuds all the way to HD800 cans and full-fledged tower speakers being bi-amped.
That’s not why I go for High-Res stuff, though.
It’s all about archival, at least for me. With a 24/192 Master in FLAC or ALAC, I can downsample to whatever the destination form factor is. I can transcode to a 320kbps MP3, or a 16/48 WAV stream for a smart speaker, or a 24/96 stream for the theater. The point isn’t that I can hear the difference, it’s the fear that I might lose something irrecoverable by sticking with lower-quality files for bulk storage. Once data has been discarded, it cannot be retrieved, and that influences my preference for storage (and is also why my BD/UHD rips are into MKVs, no re-encoding).
Now that being said, I will absolutely hem and haw and ABX different releases to determine if I opt for the 16/44.1 CD rip of an album from the 80s or the new 202X remaster in 24/192 (spoiler: almost always the former), and I absolutely prefer anything with classic instruments (Jazz, Classical) in higher-quality formats because of a subjective perception of a wider, clearer sound stage, though this is almost certainly a psychological effect from performing in concert bands and orchestras rather than physical or objective in nature.
Like I tell newcomers: if it sounds better enough to you to warrant the purchase price, then that’s all that really matters. Enjoy the hobby.
saltcured 8 hours ago [-]
Decades ago, I was treated to an ABX test in my brother's recording studio. I easily recognized and preferred a 24/192 master he played versus the 16/44.1 down-mix. I honestly don't know whether there was something wrong with the down-mix, but qualitatively it did feel like it was "muffled" and coming from speakers, while the master really felt like live performance. He was surprised that I could tell them apart.
I also spent a lot of time ripping my old CDs to FLAC and trying different MP3 and AAC encoder settings to get playback that felt transparent enough to me. I could never tolerate Sirius/XM radio streaming due to the horrid compression I heard with every futile attempt. I still seem to have more sensitive hearing than most people around me, but in my 50s I know it isn't what it once was.
I never had huge budgets, but did strive for hi-fi in my limited ways. I used things like toslink and HDMI to send raw PCM data from Linux to my Yamaha A/V receiver's DACs + amplifier to drive somewhat nice Polk tower speakers. But then COVID-19 happened, and this stuff was packed up to move house.
Nowadays, music playback is streaming with mundane "subwoofer + satellite" PC speakers or MP3 playback with a mini-SD card permanently parked in my car's infotainment system.
vor_ 5 hours ago [-]
> Decades ago, I was treated to an ABX test in my brother's recording studio. I easily recognized and preferred a 24/192 master he played versus the 16/44.1 down-mix. I honestly don't know whether there was something wrong with the down-mix, but qualitatively it did feel like it was "muffled" and coming from speakers, while the master really felt like live performance. He was surprised that I could tell them apart.
As referenced in the article, a common explanation for those audible differences is that the high-resolution version of the album is sourced from a different master.
saltcured 4 hours ago [-]
> As referenced in the article, a common explanation for those audible differences is that the high-resolution version of the album is sourced from a different master.
In this case, it was my brother's own 24/192 recording, down-mixed by him to CD format with the intent that it be transparent. I believe he said his software was supposed to be dithering, but this was ~25 years ago and I can't really confirm the details anymore.
TheOtherHobbes 4 hours ago [-]
This is easy to disprove by downsampling from a 24/192 source to 16/44.1. Even if the downsampling is (close to) ideal there are obvious differences.
In fact if you can't hear the difference between 24/192 and 16/44.1 you shouldn't be working in audio. (Doesn't apply to consumers. Does apply to musicians and engineers.)
It's like being colour blind.
And if you don't understand the math behind quantisation, you shouldn't be posting pseudo-scientific videos where you use an oscilloscope and a cheap spectrum analyser - both tools with very limited resolution - to "prove" your point.
16 bit isn't enough for hard, objective reasons. One is that the noise spectrum of quantisation is not simple. Most people assume it's something close to plain white noise, but it really isn't. It's actually a very complex spectrum with some prominent peaks at specific subdivisions of the sample rate. Those frequency peaks are significantly above audibility. 24-bit quantisation shrinks them below audibility.
The other is that most people can hear dither/noise-shaping at 16-bits. That adds a single bit of noise which should - if you're being very literal - be far below the threshold of audibility. But it clearly isn't.
These two facts are related.
The more complex reason is that listening is an active perceptual process. The brain does a huge amount of processing to separate sources and place them in a perceptual field which includes information about perceived object type, distance, and ambience cues. Some of those cues are very quiet, and we don't hear them linearly.
So using sine waves as some kind of perceptual reference for audibility is nonsensical. We hear much more complex signals in an active way, and if there's information missing in the quiet parts - which there is with limited quantisation - then the signal simply isn't accurate.
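For anyone who wants to poke at the quantisation numbers being argued about above, here is a minimal Python sketch. It rounds a 997 Hz sine to a given bit depth, with or without TPDF dither, and reports the RMS error in dB relative to full scale. The test tone, amplitude, sample rate and function names are all assumptions made for illustration. It measures total error power only and says nothing about the spectral shape of that error or about audibility.

```python
import math
import random

def quantize(x, bits, dither=False, rng=None):
    """Round x (nominally in [-1, 1)) onto a signed integer grid of `bits` bits."""
    scale = 2 ** (bits - 1)
    d = 0.0
    if dither:
        # TPDF dither: difference of two uniform values, spanning +/-1 LSB
        d = rng.random() - rng.random()
    return round(x * scale + d) / scale

def error_rms_db(bits, dither, n=48000, freq=997.0, fs=48000.0, amp=0.5, seed=1):
    """RMS of (quantized - original) for a sine, in dB relative to full scale."""
    rng = random.Random(seed)
    total = 0.0
    for i in range(n):
        x = amp * math.sin(2 * math.pi * freq * i / fs)
        e = quantize(x, bits, dither, rng) - x
        total += e * e
    return 10 * math.log10(total / n)

for bits, dither in [(16, False), (16, True), (24, False)]:
    print(bits, "bit", "dithered" if dither else "undithered",
          round(error_rms_db(bits, dither), 1), "dBFS")
```

Under these assumptions the undithered 16-bit error should land near the textbook LSB/sqrt(12) figure (about -101 dBFS), TPDF dither should raise it by roughly 5 dB, and 24 bits should push it about 48 dB further down.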
nullc 5 hours ago [-]
This is an extremely hard comparison to do well. I'll give a few examples as to why:
Small differences in gain are ABX able much more readily than differences in noise at the 16 vs 24 bit level. So if the signal chain gives even a small difference in gain between the samples that's what you'll track. A reasonable conversion path to 16 bits for mastering will also apply dithering and some kind of brickwall limiting (you have to limit after the dither or as part of the dither as dither can change levels!), and this can result in gain changes. The DAC may behave differently or have outright bugs for some configurations too.
This is particularly true wrt reconstruction filters for sample rate differences. And if you were comparing 44.1k and 192k then the physical DAC itself was likely running at a different rate and its _analog_ filters are probably better optimized for one vs the other (this is less true for 48k vs 192k, as the hardware likely runs at the same rate for both). So one answer to this comparison can be "on this particular hardware this rate is better than that rate"-- but that's an implementation property, not a property of format choice.
You might think, "okay I'll use a mathematically perfect down and up conversion process and run the DAC in the exact same configuration for all cases". But even then you run into issues like after reconstruction the _inter sample_ peak levels will be higher than the levels of the samples, so you have to handle that and in a way that doesn't produce a gain difference between the two configurations. (probably by running your perfect process and finding the gain level that results in no limiting, then making the gain of the original match).
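The inter-sample peak point can be illustrated with a textbook case, sketched here in Python. A tone at a quarter of the sample rate with a 45 degree phase offset has every sample at about 0.707, yet the band-limited waveform between the samples reaches 1.0. The reconstruction below is a truncated Whittaker-Shannon (sinc) sum; the window length, grid density and function names are assumptions, and a real DAC filter will behave somewhat differently.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def peaks(half_len=2000, steps=16):
    """Return (largest sample magnitude, largest reconstructed magnitude)."""
    ks = range(-half_len, half_len + 1)
    # Tone at fs/4, phase pi/4: every sample lands at +/-0.7071
    x = {k: math.sin(math.pi / 2 * k + math.pi / 4) for k in ks}
    sample_peak = max(abs(v) for v in x.values())
    # Evaluate the sinc reconstruction on a fine grid between samples 0 and 1
    true_peak = max(
        abs(sum(x[k] * sinc(t - k) for k in ks))
        for t in (i / steps for i in range(steps + 1))
    )
    return sample_peak, true_peak

sample_peak, true_peak = peaks()
print("sample peak:", round(sample_peak, 4))
print("reconstructed peak:", round(true_peak, 4))
print("overshoot dB:", round(20 * math.log10(true_peak / sample_peak), 2))
```

That is roughly 3 dB of level that the sample values alone do not show, which is why any limiting or gain matching has to account for it.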
And then for the high rate vs non-high rate you have to deal with the fact that most amplifiers are not particularly linear (compared to well constructed software at least!) and that any real speaker is very far from linear. This means that the presence or absence of ultrasonics will change the audio in the 0-20khz band. Before you think "well that could be a reason that high rate is better" observe that if there was some consistently good effect from the ultrasonics you could just bake it into the low rate sample.
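A rough Python sketch of how ultrasonics can land in the audible band through a nonlinear stage: two tones at 30 kHz and 33 kHz pass through a hypothetical transfer curve with a small second-order term, and a single-bin DFT measures what appears at the 3 kHz difference frequency. The tone frequencies, amplitudes and the `k2` coefficient are made-up illustration values, not measurements of any real amplifier or speaker.

```python
import math

def tone_level_db(signal, freq, fs):
    """Amplitude of one frequency via a single-bin DFT, in dB relative to 1.0."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * freq * i / fs) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq * i / fs) for i, s in enumerate(signal))
    amp = 2 * math.hypot(re, im) / n
    return 20 * math.log10(max(amp, 1e-12))  # clamp so silence does not blow up log10

def nonlinear(x, k2=0.05):
    """Hypothetical weakly nonlinear stage: linear term plus a small x^2 term."""
    return x + k2 * x * x

fs = 192000
n = 19200  # 0.1 s, so 3 kHz, 30 kHz and 33 kHz all complete whole cycles
ultrasonic = [
    0.4 * math.sin(2 * math.pi * 30000 * i / fs)
    + 0.4 * math.sin(2 * math.pi * 33000 * i / fs)
    for i in range(n)
]
before = tone_level_db(ultrasonic, 3000, fs)
after = tone_level_db([nonlinear(s) for s in ultrasonic], 3000, fs)
print("3 kHz level before:", round(before, 1), "dB")
print("3 kHz level after: ", round(after, 1), "dB")
```

The input has nothing at 3 kHz, while the output has a difference tone of amplitude k2 times the product of the two tone amplitudes. Band-limiting the source to 20 kHz would remove that product entirely.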
> but in my 50s I know
Yeah if you're in your 50's you're absolutely not hearing differences way up above 20khz (especially if you're male), I bet you can't even hear CRT flybacks from 100 yards anymore. :P Most people have no idea how much their high frequency hearing degrades as they age because it plays approximately no role in your life, but it's real, dramatic, and as far as I know happens to everyone.
I don't mean to discount your experience: I don't really doubt that it was real. But answering the general question of the necessity of low vs high rate probably takes a team of experts, armed with test gear and the designs of the HW/SW in question, to vet the test configuration. Testing a _particular_ configuration without the ability to distinguish its implementation quirks from format-fundamentals is much easier and that's what most attempts to test this question are actually testing.
By testing in a recording studio you were doing far better than most such comparisons. Usually people try comparing different files and they're comparing entirely different mastering processes. Files made for the "high res" market will often have much less compression and limiting than files made for commercial radio play / casual listening... and truly do sound obviously much better. Some of my favorite recordings are rips from vinyl. Vinyl is an awful format from the perspective of audio fidelity, but it's also pretty intolerant of excessive compression and limiting because the record will skip if the needle is bouncing off the rails. And more recently I suppose they also avoid over compression there because of the difference in target listener/environment.
saltcured 3 hours ago [-]
Yes, perhaps the amplitude was subtly different.
This was supposed to be running the DACs to match the source configuration, not resampling into some common format. I think that is an unavoidable part of the whole end-to-end ABX test concept.
Maybe it would be interesting to up-sample back into 24/192 and play both in that mode. But then people would argue about what type of up-sample to use.
I was in my mid 20s for this test. I understand my high-band hearing was better back then.
bigiain 4 hours ago [-]
> Small differences in gain are ABX able much more readily than differences in noise at the 16 vs 24 bit level.
This was common knowledge at least as far back as the mid 80s, when every hifi shop and salesguy knew to ensure the bit of gear with the highest profit margin got played an almost imperceptible bit louder than the gear the customer came in to buy during back to back testing.
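Level matching before a comparison is simple to sketch in Python. The helper names below are hypothetical, and RMS over the whole clip is only one possible way to define "same loudness", but it shows how small the offsets in question are.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def level_difference_db(a, b):
    """How much louder a is than b, in dB (RMS)."""
    return 20 * math.log10(rms(a) / rms(b))

def match_level(reference, other):
    """Scale `other` so its RMS level equals that of `reference`."""
    g = rms(reference) / rms(other)
    return [s * g for s in other]

# Example: the same tone, with one copy about 0.1 dB hotter
tone = [0.5 * math.sin(2 * math.pi * 440 * i / 48000) for i in range(48000)]
hotter = [s * 1.012 for s in tone]
print(round(level_difference_db(hotter, tone), 3), "dB before matching")
print(round(level_difference_db(match_level(tone, hotter), tone), 6), "dB after matching")
```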
nullc 4 hours ago [-]
It's also a reason why double-blind testing is important. If someone doing the setup is expecting one piece of kit to sound better, if it doesn't they'll check the configuration more, and difference in gain can come from many sources. So errors that result in higher gain in favor of the "better" candidate go uncorrected, while ones that favor the worse tend to be fixed.
Point being: it doesn't even require an unscrupulous sales person to get similar results to an unscrupulous sales person! :P
Applejinx 4 hours ago [-]
That would be how you'd go about telling, sure enough. You can't go by 'frequencies' or distortions or anything like that, these analog departures from convincing reality aren't how digital failings manifest.
You try to hear the brickwall by the muffled, enclosed quality and possibly by the weird pre-ring blurriness of the filter making things sound more vague than they have to be, and you hear the truncation not because it is audible 'distortion' as we know it, but because depth collapses and it sounds like it's coming from the speakers and not being a separate space behind/around the speakers. At no point will it be the most glaringly obvious thing but it'll never be 'distortions' as we imagine them, it's more a 'pod people' lack of personality thing.
Like a much subtler version of listening to AI music :)
I'm quite happy with 24/96 as suitable overkill for anything I might want to hear or do. Neil Young went hard on the proposition that 192 was necessary. Sold the Ponoplayer, I had one but it died on me, battery failed eventually. It really did sound awesome beyond just about any other listening device I've ever heard…
TheOtherHobbes 4 hours ago [-]
24 > 16 is not debatable. Sample rates are more complex because the higher the clock rate, the more you get distortions from jitter and the design of the DAC/ADC. Most converters introduce different artefacts at different sample rates, especially at the prosumer end, so you're not comparing like for like.
The last couple of generations of converters have gotten a lot better, so 192kHz today is likely to sound cleaner and smoother than it did ten years ago, where there was a good chance the clock was quite jittery.
Personally I don't think it's worth the extra bandwidth for playback, but I can understand why some people might want it.
Generally all of these "debates" come down to people who think math > circuitry. All real designs are imperfect trade-offs. They all have issues, and arguing as if converters are perfect when they never are, and the imperfections can be benched objectively, is... not very scientific.
empiricus 8 hours ago [-]
Even for PC, I recommend some cheap studio monitors.
Yeah I'm just lazy about dealing with the room. If I find the motivation, I'll pull the original equipment back out of storage.
dtgriscom 4 hours ago [-]
> I absolutely prefer anything with classic instruments (Jazz, Classical) in higher-quality formats
High-dynamic-range material benefits from lots of bits.
Cider9986 9 hours ago [-]
I can't hear the difference between 128 kbps opus and FLAC.
nullc 4 hours ago [-]
> I can't hear the difference between 128 kbps opus and FLAC.
A reasonable definition of transparency for high bitrate compressed audio is "Can the worst files be distinguished by a listener trained in what artifacts sound like". Maybe also add in having to use a high discrimination listening setup, including not running excessively loud (increases masking).
If that's not the test you're doing, it's unsurprising. At moderately high bitrates no one can reliably distinguish them on arbitrary samples: most inputs are easy.
If you test on known-difficult "killer samples" you'll probably easily distinguish them, even without first being shown what to look for, and certainly after.
During the development of Opus I created many 'trained listeners' and selected many killer samples, and I don't recall* ever encountering a tin ear that couldn't be taught to ABX any high rate samples, though some people are obviously much better at it.
I'm not sure I'd recommend it though: learning to identify artifacts has a frequent side effect of making low rate audio like the HE-AAC used in SiriusXM absolutely intolerable. I'm bothered by it even when I hear cars driving by using it. :)
[*] My memory for such things sucks, so I could be wrong-- but my point that it's not expected remains.
Cider9986 4 hours ago [-]
I did the ABX test extension in foobar2000 with Octopus's Garden. It was on nice headphones.
You're right it's just minor details.
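For reading ABX results, the usual question is how likely a score would be from pure guessing. A minimal Python sketch of that one-sided binomial calculation follows; the function name is made up for illustration, and the 16-trial example is arbitrary.

```python
from math import comb

def abx_p_value(correct, trials):
    """Probability of at least `correct` right answers in `trials` coin-flip guesses."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# Example: 12 correct out of 16 trials
print(round(abx_p_value(12, 16), 4))
```

A small value means guessing is an unlikely explanation for the score. It does not say what the listener was actually keying on (artifacts, level, or something else in the chain).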
stego-tech 7 hours ago [-]
And that's fine! I've got a flatmate who loves 320kbps MP3s on studio monitors, I've got musician friends who swear by CD-audio and Sennheiser HD200s, and others who love how vinyl uniquely degrades over time on big speakers.
The takeaway from these sorts of posts, at least in my opinion, should be two-fold:
* Understand the physical limits of human senses and perceptions to help inoculate yourself against outright scams and grifts
* Liberate you from the "tech grind" and allow you to enjoy what you like, how you like it.
Cider9986 4 hours ago [-]
The thing I didn't understand with higher quality music files is that it's not like the entire song is different and better when you go from 64 to 128 kbps opus, it's just these super minor details that get changed. It was enlightening doing an abx test, but I still use flacs because it's nice not worrying about the quality mattering.
dspillett 5 hours ago [-]
> Understand the physical limits of human senses and perceptions to help inoculate yourself against outright scams and grifts
Also understand that while there is an upper limit, we are all different within that. I can hear the difference between 128Kbps and FLAC, at least for some content, but not 256Kbps, maybe not 192. For some content (spoken word etc.), 64Kbps, sometimes less, is perfectly acceptable (to me). There was a time I could hear the difference between some encoders, but that was decades ago and anything in active use is pretty damn good (and my ears are not what they used to be) unless you really crank the bitrate down or tweak other options daftly.
sgarland 4 hours ago [-]
I’ve not tried encoding my own MP3s in at least a decade, but when I was doing so, 128 kbps was instantly distinguishable to me on anything with cymbals, especially hi-hat: it loses that shimmery sound. At 192 kbps I could tell if I really, really tried, but it was so minute I didn’t really care. I was never able to reliably tell the difference between 256 and 320 kbps rips.
PaulDavisThe1st 5 hours ago [-]
> I can hear the difference between 128Kbps and FLAC
You've established this with double-blind testing, correct?
z_open 10 hours ago [-]
As they say, most people listen to their music with equipment. Audiophiles listen to their equipment with music.
nntwozz 10 hours ago [-]
This is perfect, thank you this goes straight into my long-term memory bank.
On a tangent, whenever someone mentions LP sounding warmer or whatever I like to point out that I prefer wax cylinders (a.k.a. phonograph cylinders).
_kb 3 hours ago [-]
Those wax cylinders are a modern hack. The curved surface distorts the real artistic intent. The only way to appreciate the true beauty of sound is the purity of soot etchings on a phonautogram.
fecal_henge 9 hours ago [-]
You Edison shill.
pimeys 5 hours ago [-]
I might be somewhere in the middle. Yes, I did spend a hefty 5000 euros on my headphone setup. And yes it sounds absolutely magical and every day I'm happy listening to music with it.
But I also have a large multi-terabyte music collection, I follow new music, go to concerts, go to parties, talk about music with my friends in signal group chats.
It's a hobby, and when you get a bit older and start having some savings, if you love music treating yourself with a better system is not that crazy.
UmYeahNo 5 hours ago [-]
When I got old enough to finally afford those toys I discovered I couldn't hear above 16khz anymore.
pimeys 5 hours ago [-]
It is not only that. It's the spacing, how the bass sounds, separation of instruments. There's so many interesting headphones in the midrange to try out. Compare the Hifiman HE1000se to Heddphone 2 GT, or to Focal Clear MG and you'll understand.
Also with HEDD you get a handcrafted device made in Berlin. And if you go with nicer cables, they are very beautifully done and feel great. There is no difference in sound of course. Some people like jewelry, I can get similar enjoyment from beautiful audio equipment and cables.
az226 4 hours ago [-]
What’s the quality of this trove? As in bitrate or similar.
pimeys 4 hours ago [-]
Depends. I'm more into finding certain masters. And some of the albums are DSD tape transfers. DSD if that was the original recording format, if it was mixed and PCM was needed, DXD flac.
And so many CDs of course.
mingus88 10 hours ago [-]
That’s true, but I consider myself a collector. Think of how a comic book collector operates.
If I have an option to get a 16bit version of a recording or a high-res version, I choose the highest quality version every time.
Same with a physical copy. A limited edition, better quality vinyl LP is more attractive if you are going through the trouble of curating a collection.
I’ve been curating a music library of digital files since before the iPod was released and I will always go for the highest quality version out of principle. I can always downsample it to anything that makes sense.
rahimnathwani 10 hours ago [-]
The article says "I've run across a few articles and blog posts that declare the virtues of 24 bit or 96/192kHz by comparing a CD to an audio DVD (or SACD) of the 'same' recording. This comparison is invalid; the masters are usually different."
It may be simultaneously true that:
A) Humans cannot tell the difference between 44.1kHz/16-bit audio and any higher resolution, and
B) For a particular song, the best commercially available 44.1kHz/16-bit version may not be the best commercially available version
black_knight 4 hours ago [-]
I usually A/B test the different versions before choosing my canonical one. I will listen to the same sections in each version, flipping back and forth to hear the differences. It is incredible how much finding the right master improves the experience of listening to a track. Often times that means I end up with a hi-res version, but not always.
zamadatix 10 hours ago [-]
While 100% true, I'd phrase B) as:
"The quality of the particular mastering can still make a noticeable difference, regardless of the ability for the digital sampling rates to perfectly represent it perceptually"
Just to be clear that the statement applies to any releases meeting the A) criteria, not just 44.1 kHz @ 16-bit ones.
ycui7 4 hours ago [-]
24-bit was created so that microphone signals with a large dynamic range could be recorded without a gain-switching circuit.
96kHz was created to better reproduce high frequencies around 20kHz, so the digital noise shaping filter does not need to be super sharp right at the Nyquist frequency.
Both were introduced for sound technical reasons. Beyond that, most of it is marketing nonsense to cheat consumers.
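The filter-steepness argument comes down to how much room there is between the top of the audible band and the Nyquist frequency. A small Python sketch of that arithmetic, assuming a 20 kHz passband edge (the function name and that edge are illustration choices):

```python
import math

def transition_band(fs_hz, passband_edge_hz=20000.0):
    """Room between the passband edge and Nyquist, in Hz and in octaves."""
    nyquist = fs_hz / 2
    width_hz = nyquist - passband_edge_hz
    octaves = math.log2(nyquist / passband_edge_hz)
    return width_hz, octaves

for fs in (44100, 48000, 96000):
    width, octaves = transition_band(fs)
    print(fs, "Hz:", width, "Hz wide,", round(octaves, 2), "octaves")
```

At 44.1 kHz the filter has about 2 kHz (roughly 0.14 octave) to get from passband to stopband; at 96 kHz it has 28 kHz (about 1.26 octaves).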
geraldmcboing 5 hours ago [-]
The OP is a bit off with their description of why pro audio engineers work in higher bit rates and sample rates. We use 24bit to preserve low level sounds eg reverb, breaths etc and use 32bit float when recording as the headroom is so massive clipping is not an issue (other than of course still needing to avoid overloading microphones with max SPL - cleanly recorded distorted sound is still a fail). Unclipping 32bit float feels like voodoo - I did a test, recording fireworks & unclipping the 32bit float recordings.
I use microphones that can 'hear' up to 100kHz (Sanken CUX100K) and for film sound design playing 192kHz audio at half and quarter speed the results are very significant, and reveal there IS 'content' above human hearing. Irrelevant for general listening but very important for sound design.
PaulDavisThe1st 5 hours ago [-]
Have you ever actually checked the number of actual bits your ADC can use? Most 24 bit converters struggle to get to 18 bits.
Nobody uses 32 bit float for recording (to do so is just to capture at least 10 bits of noise, most of that being brownian); its strictly a format for mixing and processing. You don't get any more resolution from 32 bit floating point than you do from 24 bit integer formats, but the result of "clipping" is less dramatic, hence the appeal of the format.
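The resolution claim can be checked directly, since IEEE 754 single precision has a 24-bit significand. The Python sketch below round-trips values through float32 with the standard `struct` module. The normalisation (24-bit full scale mapped to 1.0) is an assumption for illustration and is not taken from any particular DAW or recorder.

```python
import struct

FULL_SCALE = 2 ** 23  # magnitude of 24-bit signed full scale

def through_float32(value):
    """Round-trip a Python float through IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

def int24_survives_float32(sample):
    """True if a 24-bit integer sample, normalised to +/-1.0, is exact in float32."""
    return through_float32(sample / FULL_SCALE) * FULL_SCALE == sample

# Every 24-bit code is exactly representable (a sparse sweep is checked here)
print(all(int24_survives_float32(s) for s in range(-FULL_SCALE, FULL_SCALE, 4099)))
# A 25-bit integer is not: the extra bit is rounded away
print(through_float32(float(2 ** 24 + 1)) == 2 ** 24 + 1)
# Values above full scale are stored rather than clipped
print(through_float32(4.0))
```

So float32 carries the same 24 bits of precision at any given level, plus an exponent that moves that window up or down, which is where the "clipping is less dramatic" behaviour comes from.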
While there is some evidence that non-auditory human sensory perception may be sensitive to ultrasonic acoustic waves, it's pretty weak right now, and somewhat in the "woo" zone. It may turn out to be significant, or it may not. I wouldn't base an audio production workflow that requires 4x the cpu power and 4x the disk space on such tentative claims, but you're welcome to.
geraldmcboing 4 hours ago [-]
"Nobody uses 32 bit float for recording" - you are just displaying total ignorance here.
PaulDavisThe1st 4 hours ago [-]
My comment should have been more emphatic that: nobody uses AD converters that generate 32 bit floating point values natively when recording, or anywhere close to the resolution that format implies.
I am extremely aware that as a data format in DAWs and other recorders, 32 bit floating point is completely common.
geraldmcboing 4 hours ago [-]
Dude I've been doing sound design on films using these techniques for years. There is zero 'woo' involved, it is ALL practical evidence based use. I've been using 32bit float multitrack field recorder by Sound Devices MixPre10-II professionally for many years now. The recorder has three preamps per mic input, each gain staged to provide optimum signal to the 32bit float AD. Read this to clarify your thinking:
https://www.sounddevices.com/32-bit-float-files-explained/
Surely you understand a recording made at 48kHz has a max freq response of 24kHz and played at half speed that max freq is 12kHz and at quarter speed only 6kHz. You can very clearly hear the filter cut off due to Nyquist. Record at 192kHz with mics capable of 100kHz capture and when played at quarter speed, the sound is full spectrum because there is no truncated frequency response. And when I load a 192kHz recording to izotope RX I can literally see the harmonics going up to 96kHz. (not with every sound of course)
I repeat, I am not talking about 'normal' listening. I am talking about an industry you have no knowledge or lived experience with, so spare me the incorrect claims about what can & can't be heard.
PaulDavisThe1st 4 hours ago [-]
> I am talking about an industry you have no knowledge or lived experience with
I'm the original/lead developer of Ardour, a cross-platform DAW, and have been working with digital audio for more than 25 years.
There are no 32 bit ADCs - your SD MixPre's are giving you (at best) 22 bits packaged as a 32 bit float value. The preamps make absolutely zero difference to the AD conversion (though they might sound real nice).
> Surely you understand a recording made at 48kHz has a max freq response of 24kHz and played at half speed that max freq is 12kHz
This is a very naive version of what "played at half speed" might actually mean. If properly and correctly resampled, this is not true.
> And when I load a 192kHz recording to izotope RX I can literally see the harmonics going up to 96kHz
Well, I'd certainly hope so! But the question is: what are the energy levels associated with the partials above Nyquist? If you recorded at 384kHz with sensitive enough equipment, you'd see partials above 96kHz - but at extremely low energies because ... well, that's just how physics works.
[EDITED to remove AD/DA confusion]
geraldmcboing 4 hours ago [-]
I do not use the DACs in the MixPre. It's a recording device. The field recordings & studio recordings are transferred as data and used in a 32bit float 192kHz Protools session. So the recorder's DAC is completely irrelevant.
The sounds are then used as source material, for processing and manipulation at 192k, 96k and 48k. There is no debate to be had. This is how film sound designers work & have worked for years now.
The half speed you call naive is again just showing your ignorance. Sound editors have been using this technique since the days of recording on a Nagra at 15ips and literally replaying at 7.5ips half speed, and at 3.75ips for quarter speed. There is nothing naive about it, it is a very well-known technique. To be able to achieve the same result digitally with full spectrum has impacted every feature film you have experienced in recent years. Again I speak from decades of lived experience.
PaulDavisThe1st 4 hours ago [-]
Running tape at half speed has almost nothing to do with digital resampling, which is what playing digital audio at half speed is generally all about.
My use of DAC was a thinko, I've edited at least one post to correct it since in the current context we're always talking about ADC. Apologies for that.
geraldmcboing 4 hours ago [-]
Wrong again. As a sound designer I can choose to import a 192kHz file into a 48kHz PT session in two ways, one as resampled audio which means pitch & duration stay the same, OR I can choose to import it without SR conversion, in which case the audio plays at quarter speed & pitch is 2 octaves lower. We use both techniques ALL the time, every day. It's a common technique every sound designer uses.
You are arguing about techniques you have no experience with.
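The two import modes being argued over differ only in whether the samples are resampled or simply relabelled with a new rate. The arithmetic of the relabelling case (the digital equivalent of running tape slower) can be sketched in a few lines of Python; the function name is hypothetical.

```python
import math

def reinterpret(source_rate, playback_rate, freq_hz):
    """Play samples captured at source_rate as if they were playback_rate, no resampling.

    Returns (speed factor, where freq_hz ends up, pitch shift in octaves).
    """
    speed = playback_rate / source_rate
    return speed, freq_hz * speed, math.log2(speed)

# 192 kHz file dropped into a 48 kHz session without sample rate conversion:
print(reinterpret(192000, 48000, 80000.0))  # an 80 kHz component
# 48 kHz recording slowed to quarter speed: where does its 24 kHz ceiling go?
print(reinterpret(48000, 12000, 24000.0))
```

In the first case content recorded at 80 kHz ends up at 20 kHz, two octaves down. In the second, the 24 kHz ceiling of the 48 kHz recording ends up at 6 kHz, which is the cutoff described earlier in the thread.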
PaulDavisThe1st 3 hours ago [-]
I wrote a DAW that does precisely what you describe. I've been doing it for 25 years.
nullc 4 hours ago [-]
> Nobody uses 32 bit float for recording
Yes they do, almost all high end field recorders used for film work are 32-bits now and have been for much of the last decade, often with some fancy preamp integration so that there is no expertise required for gain staging the recording. (I believe the implementations use a second matched 24bit ADC with 48 dB less gain in front of it).
The result obviously doesn't have a noise floor which is lower (as the noise of a room temperature _resistor_ gets in the way of that even at the 24-bit level) but they have more dynamic range so that your recording isn't ruined by hard clipping some unexpected loud sound.
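A back-of-envelope Python sketch of the resistor-noise point, using the Johnson-Nyquist formula v = sqrt(4kTRB). The 150 ohm source resistance, 20 kHz bandwidth, room temperature and 2 V RMS full-scale level are all assumed values chosen for illustration, not figures from any specific recorder.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K

def thermal_noise_vrms(resistance_ohm, bandwidth_hz, temp_k=293.15):
    """Johnson-Nyquist noise voltage of a resistor over a bandwidth."""
    return math.sqrt(4 * BOLTZMANN * temp_k * resistance_ohm * bandwidth_hz)

def ideal_dynamic_range_db(bits):
    """Textbook SNR of an ideal N-bit quantizer with a full-scale sine."""
    return 6.02 * bits + 1.76

noise = thermal_noise_vrms(150.0, 20000.0)   # assumed source resistance and bandwidth
full_scale_vrms = 2.0                        # assumed full-scale level
range_db = 20 * math.log10(full_scale_vrms / noise)
print("thermal noise:", round(noise * 1e6, 3), "uV RMS")
print("range above thermal floor:", round(range_db, 1), "dB")
print("ideal 24-bit:", round(ideal_dynamic_range_db(24), 1), "dB")
```

With these assumed values the thermal floor alone sits several dB above what an ideal 24-bit quantizer could resolve, before any amplifier or capsule noise is counted.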
It's a big improvement for practical usage, and also likely does improve SNR somewhat because you can run higher gains without as much fear that you'll ruin the recording. The reason it would pay off is that the SNR loss you get from splitting the signal is easily smaller than the SNR loss you would get from gain reduction to avoid clipping.
(maybe... capsule self noise is also limiting... at these levels, and usually people aren't using microphones designed for the lowest possible self noise unless they're doing something special)
PaulDavisThe1st 4 hours ago [-]
There are precisely zero 32 bit ADCs in existence.
There are ADCs that will provide 32 bits per sample but that's entirely different.
Current technology limits the bit depth to 18-22 bits and going beyond that you'd be very quickly recording brownian (atomic) noise anyway.
The point about 32 bit float is that it is a useful format for mixing, editing and general processing, so it is widely used in digital audio tools. But it is not a format that ADCs generate "natively" via their electronics - almost all of them generate a 24 bit integer or fixed point value and then just supply that as a 32 bit float value because the software asked for it (the software could have done it all by itself).
[EDITED: DAC->ADC since that is what I meant and what this is all about]
nok22kon 4 hours ago [-]
Rode NT1-A 5th gen microphone claims 32-bit float output, insisting it will not clip peaks
so maybe they do sample at 24 bit at a well chosen gain level and then convert to 32 bit float, with the max 24 bit value being above 1.0 float
or as GP said, use two separate ADCs at two different gains and combine their output
PaulDavisThe1st 4 hours ago [-]
> Rode NT1-A 5th gen microphone claims 32-bit float output, insisting it will not clip peaks
Of course it does! And that's what it does, of course. But that has absolutely nothing to do with the AD process itself, which is chip-limited to 24 bits and likely physics-limited to somewhat less than that.
You can't beat the physical limit of an AD circuit by doubling them up at different gains.
And .. you don't want to. Going beyond 22 bits gets you into brownian noise pretty quickly, which is completely pointless.
The best you can do (or could do) is get a very, very, very good AD that can really do 22 bits (likely not commercially available because of the expense), and then get the samples from it in whatever format works best for your purpose (24 bit integer, some fixed point value, or 32 bit floating point).
nok22kon 4 hours ago [-]
you have 22 bits for the typical audio voltage level, which you call 1.0 float
but what if you "allow" double that voltage and call it 2.0 float? a strong pressure into the microphone generates a stronger voltage
thermal noise limits you on the quiet signals, but not on the powerful ones
so 22 bit for typical -1.0 -> 1.0 range and you can add a few more bits on top of that for stronger audio pressures (voltages) which you would traditionally clip
PaulDavisThe1st 4 hours ago [-]
Sorry, but this is not how AD works. If your idea was valid, we'd have new generations of ADCs in our hands.
nok22kon 3 hours ago [-]
> In a 32-bit float recorder, you have two ADCs working in tandem to create a single audio file. One “low gain” ADC is optimized for high-level audio, and the other “high gain” ADC is optimized for low-level audio. If the high gain ADC clips due to loud sounds, the low gain ADC does not. And if sounds are too quiet for the low gain ADC to capture clearly above its noise floor, the high gain ADC still has plenty of headroom above its noise floor. Said another way, the low-level ADC handles the quieter sections, and the high-level ADC handles loud sections.
The first diagram in that article is pretty ironic in an HN comment thread about Monty @ Xiph's stuff. Have you never seen his takedown of the "stairstep" drawing?
nullc 3 hours ago [-]
It's discontinuous.
You have some low noise amplifier. There is a signal. You split it. The result on each side has >=1 bit worse noise floor, probably somewhat worse as we're not using superconductors :P-- as you expect: there is no free lunch.
Now: take one copy and attenuate it 48dB, further degrading its noise floor. Sample both. The attenuated copy is mostly useless, except when the input goes high enough that it would have hard clipped the other ADC.
So the tradeoff is that you lose a small amount of noise floor constantly-- out at the 20th bit, that you probably didn't care about (microphone self-noise is limiting you out there anyways at normal volume levels), in exchange for never clipping.
To turn this into a better ADC generally, you'd need the splitting stage to not hurt the noise floor, but it does.
The reason it's not the same as just lowering the gain so that you won't ever clip is that to get the same dynamic range you'd have to lower it by 48dB and now your ADC doesn't achieve its potential for typical signals. You could lower the gain by 3dB (or whatever the splitting cost you) and get the same results for the low gain signal and a little more headroom, but you would not get the massive headroom increase of this approach.
For this to work one must also have amplifiers with much wider dynamic range and SNR than ADCs, but we do.
The natural output for this approach is a float-- the most natural would be a weird float where instead of an exponent one bit tells you which ADC is in use and represents a factor of 256 or whatever, but in practice these recorders just output 32-bit floats. I haven't looked but I wouldn't be surprised if there were only two exponent values ever used in their output.
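A minimal sketch of the range-switching idea described above, with noise left out entirely. The 48 dB attenuation comes from the comment; the clip threshold, the normalization of full scale to 1.0, and all function names are assumptions for illustration, and real recorders presumably blend the two paths more carefully than this.

```python
ATTEN_DB = 48.0
ATTEN = 10 ** (-ATTEN_DB / 20)  # linear gain of the attenuated (low gain) path
CLIP_THRESHOLD = 0.98           # assumed: treat the high gain path as clipped above this


def simulate_adc(x: float) -> float:
    """Idealized converter: hard-clips to the range [-1.0, 1.0], no noise."""
    return max(-1.0, min(1.0, x))


def combine_ranges(high_gain, low_gain):
    """Use the high gain sample unless it is near clipping, else rescale the low gain one."""
    out = []
    for h, l in zip(high_gain, low_gain):
        if abs(h) < CLIP_THRESHOLD:
            out.append(h)          # normal case: best noise floor
        else:
            out.append(l / ATTEN)  # loud peak: undo the attenuation, value may exceed 1.0
    return out
```

The output naturally exceeds 1.0 on loud peaks, which is why a float container suits it. In this sketch the largest representable peak is about 1 / ATTEN, roughly 250 times the high gain path's full scale.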
PaulDavisThe1st 3 hours ago [-]
> So the tradeoff is that you lose a small amount of noise floor constantly-- out at the 20th bit
So, basically, no better than the best AD converters we already have?
My understanding of the fundamental limit to AD performance is that the brownian noise level is around the 22nd bit level. So even if you come up with techniques to successfully measure down to that level, you're basically picking up .. inevitable, irremovable, irrelevant noise.
Possibly there are gains to be made by not worrying about the noise floor and caring more about the lack of clipping, but I'm not seeing people screaming about that. The "noise" seems to be "N bits of dynamic range", not "slightly less dynamic range but it will never clip!"
nullc 3 hours ago [-]
Yeah people describe the benefits incompletely/inaccurately. This approach has a worse theoretical SNR, but an effect that improves the delivered SNR in real usage: Without the clipping protection the user would massively lower the gain, hurting the SNR.
A common experience for someone doing field recording of performers (my experience is music) is you twiddle your setup to get the gains reasonably high to get good SNR even for quiet parts. ... and then you record the actual performance, and you find that the tuba player really got into it for the real performance and the new peaks are 10dB over where they were in the practice. And now your recording is screwed up with a bunch of hard clipping you have to deal with. So then experience tells you in the future to take whatever you thought was safe and lower gains another dozen db.
The multi-ranged recorders eliminate that problem and the result is that you don't need to use precautionary gains, and you get a better SNR in your recordings. You probably don't need to adjust gains at all: The gain can be whatever makes the self-noise of the microphone dominate the SNR of the process, ... which would be too high for the loudest samples, but the clipping handling deals with that.
The samples that need to use the extended range have worse SNR (and probably poor linearity due to mismatches between the converters), but human hearing is much less critical to noise with loud signals anyways.
geraldmcboing 4 hours ago [-]
Why are you obsessed with DAC? It's the ADC that is WHY we capture 32/192.
PaulDavisThe1st 4 hours ago [-]
If I said DAC, it was a mistyping. I am (in this context) always talking about the ADC.
nullc 4 hours ago [-]
I didn't say anything about DACs! I'm correcting a specific claim you made
> Nobody uses 32 bit float for recording (to do so is just to capture at least 10 bits of noise, most of that being brownian);
This is not true and not true for a good and important reason!
One which has no bearing on the kind of DACs that exist.
Modern field recorders allow gains set at a 'reasonable' level that maximizes SNR for recordings but still won't clip when there are much louder peaks. Not so dissimilar to how a 6-digit multimeter can achieve its advertised performance both on a 0-5v range and a 0-300v range but cannot give more than 6 digits at the higher range.
PaulDavisThe1st 4 hours ago [-]
When I said "nobody uses 32 bit float for recording", I am referring to the result of the DA process that generates sample values used by a recorder.
Obviously, everyone and their mother uses 32 bit float as an internal sample format because of its fitness for purpose (except the folks who think they need 64 or 80 bit floating point, of course). But they are not using "32 bit floating point samples" - the samples come from an (at best) 18-22 bit integer conversion.
Tsarp 11 hours ago [-]
This really is driving a muscle/super car, or drinking expensive wine. At the end none of the specs or tests matter. It is a form of art. If it makes the listener feel better (even if it's just psychological) then it's probably worth it.
munchler 10 hours ago [-]
To expand on this a bit, I appreciate some audio overkill because, if I do hear sizzle or distortion, it eliminates one possible reason and helps me figure out what’s actually happening.
It’s like having gigabit internet to my house: I don’t actually need it, but when a website is slow, I know the problem isn’t in my internet connection.
ubercow13 5 hours ago [-]
Would 192 kHz audio result in less sizzle and distortion? Or more audible-band IMD from the sound >22 kHz?
smilekzs 10 hours ago [-]
Well, at least there are objective performance benchmarks on cars, and some of them are okay proxies of performance in motorsports.
Correct. I've paid for Tidal for a decade because I just like the peace of mind that it's closer to the original recording. I'm sure it's mostly placebo, but I like it.
handedness 4 hours ago [-]
I tried Tidal nearly a decade ago, and the audible fluttering effect caused by their audio watermarking totally ruined certain types of music, like choral recordings, strings and such. It was obviously apparent on $20 ear buds driven by any device, far beyond the more stereotypical audiophile gripes.
I opened a support ticket but they never responded. After that it was difficult to take their lossless claims seriously when the labels were providing such garbage source material. Their whole value prop was totally hollowed out.
I don't know whether the labels still impose such horrible practices, but I largely gave up on streaming services after that experience and now focus on keeping good digital archives of my physical library.
PaulDavisThe1st 5 hours ago [-]
The original recording of almost all music on Tidal was done with equipment that was very, very far from the 192kHz "fidelity" it claims.
yellowapple 10 hours ago [-]
It's also sort of an inverted “Van Halen demanding a bowl of M&Ms with the brown ones removed” thing for me, too. The vast majority of my Tidal listening happens over Bluetooth, so that 24bit/192kHz FLAC stream is just gonna get downsampled to 16bit/48kHz anyway because that's all any Bluetooth speaker or headset is capable of doing — but the fact that it's an option in the first place signals that other things are being done right, too (namely: that Tidal's whole “we're the streaming service that pays artists the most per listen” premise actually has some semblance of merit rather than being complete marketing bullshit; while recording quality ain't the strongest signal possible for that, it's certainly a good sign when musicians/publishers are willing to send over the highest-bitrate lossless recordings they've got and not just the same ol' compressed-to-shit MPEG audio you can yank off YouTube for free).
wat10000 10 hours ago [-]
I'd distinguish between differences that anyone can detect but some may not care about, and differences that may not be objectively detectable at all. Muscle cars, at least, are different in a way that anyone can see. Push that pedal to the floor and it feels different from a Honda Civic or whatever. Whether that difference is actually interesting or good is, of course, a matter of taste. Whereas audiophile nonsense is often indistinguishable even to the connoisseur and depends entirely on some form of self-deception. Still could be worth it, depending on what one considers worthy.
mock-possum 10 hours ago [-]
That’s actually a really good comparison, especially because - yes I can hear the difference between an excruciatingly lossless digitization of a piece of music that I’m intimately familiar with, played back on expertly configured hardware… but the difference is so little, that most of the time, I’m fine just listening to it at medium high quality streaming on a pair of <$50 headphones.
I’ve played with the nice toys, and they are nice, but for 100x the price, they barely deliver 1.5x the experience.
jerf 10 hours ago [-]
If you can't hear the squeals of the plants [1] in the studio's reception area, are you really getting the full experience of a piece of music?
Oh great. And here I thought that fantasy literature where forest elves could hear the screams of the plants they stepped on when they walked was just that -- fantasy.
SketchySeaBeast 10 hours ago [-]
Triffid music.
hackingonempty 10 hours ago [-]
@xiphmont also made an amazing video response to the many responses he received to this article. Using analog equipment he busts a bunch of myths and demonstrates what really happens with digital audio.
Thank you for posting this. I thought I knew a bit about what was going on with audio sampling and reproduction, but I learned a surprising amount from this well presented introduction
WarmWash 10 hours ago [-]
Foobar2000 has an extension that allows you to blindly test whether you can tell the difference between two tracks.[1] The prime use is to compare different encodings of the same song from the same lossless master.
It kind of changed me a bit when I ran through 20 lossless tracks I had re-encoded to various mp3 bitrates and realized that even on a fancy system, it can be really hard if not impossible to discern even moderate lossy from lossless.
If you are an audiophile geek, really think about if you want to try this, the reality check might crack your foundations.
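For anyone running such a test, a minimal sketch of how to judge the score: the chance of getting at least that many trials right by pure guessing, using a one-sided binomial tail. The function name is made up for illustration, and this is not taken from the foobar2000 component itself.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Probability of scoring at least `correct` out of `trials` by coin-flip guessing."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials
```

For example, 12 correct out of 16 corresponds to a probability of roughly 0.038 under guessing, while 10 out of 16 is far from convincing. A small number of trials rarely settles anything.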
But try streaming that mp3 from your home server at a lower bitrate to save data, e.g. as Opus. And now you suddenly hear the lossy encoding.
We store files in the highest quality because it gives us the option to encode the music without audible loss of quality.
ryankrage77 4 hours ago [-]
I decided to test for myself, downloaded Lacinato ABX and tested a 32-bit 352.8Khz flac I had lying around, to the same file downsampled to 16-bit 44.1KHz. I couldn't tell any difference. Then I tried 192k mp3... still no difference. Couldn't reliably differentiate 128 or 64kbps mp3 either. I had to go down to 32k before I could be certain which was which, and even then I still had to listen carefully.
Think I need to get my ears checked. I know I can't hear much above 15-16KHz but I didn't think it was this bad.
jnaina 3 hours ago [-]
same here. seems all the years of q-tip use is saving me money by not needing to buy expensive Hi-End Audio gear.
DiabloD3 4 hours ago [-]
Yeaaaaaaaaaahhhh.... you might be a tiny bit deaf.
OTOH, we know nothing of your audio equipment nor how it's set up.
ryankrage77 2 hours ago [-]
HiFiMan Sundara headphones, focusrite scarlett 2i2 interface. Although, it turns out the Scarlett is set to 48KHz anyway, and I can't seem to change it easily under linux. Not that it seems to matter for my ears, lol.
EDIT: Did some more ABX testing with a CD-quality track that I'm much more familiar with ('Introduction' from the Mirrors Edge soundtrack, which has been my go-to for comparing audio gear for the last decade). I could sometimes distinguish 128k mp3 this time, though interestingly, I got it consistently wrong rather than right. For some reason the compressed version seems to be my preference. Dropping to 96k mp3, I got it right 100% of the time - though only because there was a very noticeable difference in the stereo positioning of the first sound, rather than a difference in the quality of the sound. I think if it were mono I would still be unable to tell.
kstenerud 3 hours ago [-]
What's really interesting to do with all of these people arguing over audio formats (as always happens on HN) is to point a frontier model at this thread.
In a nutshell: nullc, rahimnathwani, zamadatix and vor_ know their shit, and geraldmcboing and PaulDavis are technically correct but talking past each other. speak_on and TheOtherHobbes are confidently wrong.
And also: 44.1 kHz captures the entire human audible spectrum with room to spare, and 16-bit already goes beyond anything useful for listening. The higher resolution / sample size format is useful for production or archival purposes only.
The two main reasons why you hear a difference between the two formats: (1) it's likely a different master, (2) tiny gain differences in the signal (salesmen use this trick, but it's also easy to do it by mistake).
manoDev 5 hours ago [-]
They make sense for so called audiophiles who don’t understand Nyquist frequency theory.
It’s like photographers who are confused about the difference between raw and bitmap (jpeg), videographers confused about the difference between linear raw vs log vs gamma encoded, etc.
Just because a data format with higher bit depth/sampling frequency/whatever exists for editing purposes, doesn’t mean it’s “better” or makes sense as a consumption format for a finished work.
casion 5 hours ago [-]
They make sense for sound designers and derivative artists (e.g. sampling, which is a real artform).
Forms of manipulation bring inaudible content into the audible range.
Of course that doesn't mean audiophiles aren't being audiofooled by it, but there is legitimate usage.
sholladay 8 hours ago [-]
Music producer here. High resolution audio is useful for editing and anywhere there might be downstream processing or format conversion that may or may not be high quality, let alone lossless. The article covers that pretty well.
However, the article claims that the final distribution doesn’t need to have a bit depth of more than 16. That does not match my experience. I can tell the difference between my renders that are 16 bit vs 24 bit. I cannot tell the difference between 44.1 kHz and higher sample rates, and that’s consistent with the math (Nyquist-Shannon), but bit depth is a different matter. Would be fun to participate in a double-blind test that includes my own tracks and others.
PaulDavisThe1st 5 hours ago [-]
> I can tell the difference between my renders that are 16 bit vs 24 bit.
established using double blind testing, I assume?
nok22kon 5 hours ago [-]
thermal noise allows about 18-22 bits of real precision at audio level voltages, so it's plausible that 16 bit is somewhat limiting
PaulDavisThe1st 5 hours ago [-]
16 bit may limit it on the input side, but the question is more about human hearing's sensitivity on the "output" side ...
cozzyd 11 hours ago [-]
What a human centric view. I like my music to scare neighbor's pets.
Just get one of those "hi fi" valve amplifiers from Amazon you see under $100. The valve already distorts the sound, so you don't need to bother paying more for low distortion anywhere else in the audio chain. Saved you thousands of dollars, done!
PaulDavisThe1st 5 hours ago [-]
Distortion is why people love the sound of vinyl.
And it's all good! It's perfectly fine to say "I prefer the sound when the whole mix (or just that guitar) ends up being subject to interesting and possibly harmonically relevant distortion at low levels".
Just don't say "The version with the distortion is more accurate than the one without", because that's a lie.
hobonation 10 hours ago [-]
Counter: An ultra high bit rate solves the problem and you can stop worrying if it's the weakest link.
You can then focus on other things.
Example: I Bought the best skis possible. Now I know I need to just focus on my skills and not blame the equipment.
RijilV 8 hours ago [-]
I hate to be the one to break it to you, but high end skis make tradeoffs which are harmful to beginner or intermediate level skiers... also there's sorta no thing as "best ski". what you'd want for high speed bombing double blacks is going to be different from off piste or moguls or snow park fun.... double also, skis wear out. Depending on who you want to believe it's as low as 20-30 days. Which, granted the average skier is at something like 5 days a year. but if that's you... triple also?
As for how this relates to audio compression, in particular in the context of 2012. you are making a tradeoff of storage size and decompression cost. Maybe that doesn't matter to you, but maybe it either did in 2012 or still does.
hackingonempty 10 hours ago [-]
The point of this article and video is there is no problem with 16-bit 44.1 kHz PCM. It thoroughly covers the audible range and there is absolutely no need for more when distributing music for humans to listen to.
The problem is the people spreading myths and disinformation out of ignorance or to promote their enterprise.
The weak links are producers/mastering-engineers, speakers/headphones and the room when using speakers.
me551ah 10 hours ago [-]
Nobody downloads music these days and everybody just streams. Audio at 24 bit still takes a small fraction of the bandwidth that 1080p video takes, so I don’t understand the hate for it.
I use a DAC by focusrite which can do 24-bit, and if I want to listen to higher fidelity audio on my planar headphones then I should be able to. Why should I limit myself to 16-bit?
mingus88 10 hours ago [-]
Counterpoint: bandcamp is doing well. Vinyl sales are doing well.
If I like an artist that I find on streaming, I buy an LP and get a lossless download for free. I still have a music library and I will never rent my favorite music.
Artists prefer to connect directly with their fans and BC is probably the best platform for people who care to pay and support acts directly. They have high res downloads and I import them.
zamadatix 9 hours ago [-]
I don't think the hate is about people who know it doesn't actually sound different if the audio file is 16 bit or 24 bit or necessarily about receiving a few more bytes than they need, it's about the pushes by these types of streaming services/offerings or people insisting that it's supposed to be any better for listening when it's not.
Also the playback rate and the file rate are different topics. The former can get into scenarios more like the audio processing section of the article e.g. I had this one shitty headset for work which required me to set the volume to 1-2 (out of 100) on the computer and I could actually blind test tell when it was in 16 bit or 24 bit mode because it was cutting and boosting it so much it effectively lost precision in 16 bit mode.
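A rough way to see why that headset case is plausible: digital attenuation before the converter throws away about one bit of resolution per halving of amplitude. The sketch below assumes the volume setting maps linearly to amplitude, which many OS mixers do not do, so treat the numbers as an illustration only. The function name is hypothetical.

```python
from math import log2


def effective_bits(bit_depth: int, gain: float) -> float:
    """Bits of resolution left after scaling samples by `gain` (0 < gain <= 1)."""
    return bit_depth + log2(gain)
```

With a linear gain of 0.02, a 16-bit path is left with about 10 bits while a 24-bit path still has about 18, so only the former is likely to become audible once the headset boosts the signal back up.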
pimeys 5 hours ago [-]
Wait, what? I do download everything I listen to. And Roon is quite popular in the music communities. How else can you make sure you have that correct mastering of your favorite album?
dlcarrier 8 hours ago [-]
There is a good reason to distribute it though, and compressed it doesn't really change the file size.
There's multiple YouTube channels that I listen to as podcasts, that are professionally created and the creators presume that exported audio works like studio audio, so what you end up with is really quiet audio that can't be turned up without pre-processing.
If we distributed audio the same way we work with it in a studio, we could forgo a lot of problems.
Also, the human ear does have enough dynamic range to make 24 bits worthwhile, though that much dynamic range is rarely used in recordings, and that high of a bit depth provides no benefits within a small dynamic range. A 192 kHz sample rate, on the other hand, is always useless.
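For reference, a minimal sketch of the textbook relationship between bit depth and dynamic range: the SNR of a full-scale sine against uniform quantization noise, about 6.02 dB per bit plus 1.76 dB. It ignores dither and noise shaping, and the function name is an assumption for illustration.

```python
from math import log10


def quantization_snr_db(bits: int) -> float:
    """Ideal SNR in dB of a full-scale sine quantized to `bits` bits, no dither."""
    return 20 * log10(2 ** bits) + 10 * log10(1.5)
```

This gives roughly 98 dB for 16 bits and roughly 146 dB for 24 bits. Whether a playback chain or a listening room can deliver either figure is a separate question.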
PcChip 10 hours ago [-]
I'm curious if the audio was being sent bit-perfect to the DAC for all of these tests (ALSA direct), or if it was being run through the audio mixer and being resampled
I can always tell if my 44.1 songs are being resampled to 48 because they're being run through the OS mixer
dist-epoch 10 hours ago [-]
Proper audio resampling should not be identifiable. Of course, the OS mixer probably doesn't do proper (CPU expensive) resampling.
But a quality audio player should account for this and do its own.
It is an incredible resource to see the quality of the resampling algorithms used by the actual production software likely used in any digital audio workflow.
You will see that while the best are indeed almost 100% transparent, many are not.
Currently giving me: "Internal Server error reading database"
nok22kon 4 hours ago [-]
I remember using Adobe Audition for resampling audio, this site shows I had good intuition
your software is among the best, but not pitch black best :)
PaulDavisThe1st 4 hours ago [-]
Yeah, we use Secret Rabbit Code for ours, though we have access to the sox code now and that is "perfect". We might change to that as the default sometime this year.
PcChip 8 hours ago [-]
I'm also one of those audiophile crazies that obsesses over which metals to use in cabling, power filtering, swapping opamps, and builds their own DACs, amps, and speakers
rasz 9 hours ago [-]
"proper" resampling was expensive in 1997 when Intel was introducing fixed sampling AC'97, but was below noise floor of CPU load meter in 2007 when Microsoft released Vista killing hardware mixing.
rz2k 9 hours ago [-]
My good enough amplifier and DAC combo claims up to 24bit/192kHz, I use a cheap optical interface from my computer that claims up to 32bit/192kHz, and the streaming service I use serves most albums at 24bit/44.1kHz.
It would have cost the same for the entire stack to be 16bit/44.1kHz at every step, but with excessive resolution I can control the volume anywhere. The bits right before the analog conversion at the end are essentially the same whether I turn down the volume in the software player, the operating system, or the DAC/amplifier.
PcChip 8 hours ago [-]
you might want to see if your DAC re-clocks incoming optical, if not then it's relying on the cheap clock generator from your computer
rz2k 7 hours ago [-]
Some people have claimed to hear an improvement with an external clock on a Wiim Ultra, but I do not think it is possible to re-clock the WiiM Amp Ultra with an outboard clock.
When I play from the computer, I'm not sure whether it is using the clock on my Mac, the clock on the optical interface, or the WiiM's clock. However, I do not notice any difference in fidelity when I use the Qobuz software player on my Mac or use Qobuz Connect to allow the player to directly stream from the source, so either it isn't a difference that I can hear, or the WiiM's internal clock is used for both sources.
codedokode 5 hours ago [-]
192 kHz vs 48 kHz can make a difference if you slow down the audio. If you pitch shift down 2 octaves, the ultrasonic range 20-80 kHz turns into 5-20 kHz and there will be a large difference between 192 kHz and 48 kHz sources. However, I do not know if it would sound good because the mixing engineer cannot hear those frequencies and mix them properly, or the microphone might not catch it or some of the material could be recorded with lower quality.
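The arithmetic behind the pitch-shift example can be sketched as follows. The helper names are made up for illustration, and this only tracks frequency ranges, not any actual resampling.

```python
def shifted_band(low_hz: float, high_hz: float, octaves_down: int):
    """Frequency band after shifting down by a whole number of octaves."""
    factor = 2 ** octaves_down
    return low_hz / factor, high_hz / factor


def nyquist(sample_rate_hz: float) -> float:
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2
```

A 48 kHz source holds nothing above 24 kHz, so after a two-octave shift it is empty above 6 kHz. A 192 kHz source can hold content up to 96 kHz, which lands at 24 kHz and fills the audible band, provided the microphone and the rest of the chain captured anything up there.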
Also, sadly consumers are getting used to low quality audio nowadays - they often listen to lossily compressed audio on social media (sometimes decompressed and re-compressed several times) which is then re-compressed to send to bluetooth headphones, or played back on awful smartphone speakers. Streaming services also use compressed audio.
hgoel 5 hours ago [-]
I still insist on the higher bitrate stuff. I don't expect to notice the difference, I just think that music where the artists have bothered to prepare those files is probably recorded with more care than otherwise. I'm not generally listening to big artists where this can just be expected, and while I don't have any evidence to support my belief, I choose to continue believing it.
I'm not interested in finetuning everything in my life for efficiency.
speak_on 10 hours ago [-]
At a minimum, anything above 16/44.1 requires far more than just files: monitors, a treated room, listening position, DAC, etc... but most importantly - a trained ear. That last one is the most uncomfortable truth.
Blackthorn 10 hours ago [-]
Are you, per chance, a dog posting on the internet? Since 44.1khz sample rate is already past the range of the human ear, regardless of training.
MertsA 10 hours ago [-]
You need at least twice the frequency range for sample rate in order to represent the original signal. That's slightly misleading though, that's from the Nyquist-Shannon sampling theorem and it's a mathematical fact but that is true for exact numerical samples, once you add in quantization that muddies the water a bit. Taken at the extreme, it's straightforward to see why a 1 bit quantization per sample at 44.1 kHz would not capture a perfect representation of some analog signal even if there's only a 1 kHz frequency component to the signal. If we instead decide to sample at 10 MHz but still one bit quantization, now that 1 kHz frequency component can be much more accurately represented even though we're still using the worst quantization possible. Don't think of quantization like a square wave or a step pattern, think of it as "the signal is closer to here than any other discrete value".
Now in terms of realistic audio encoding, 16 bit at 44.1 kHz is designed to be a faithful representation as far as human hearing is concerned. Can someone with a trained ear potentially tell the difference between that and 24 bit at 192 kHz? In a studio environment it's possible. Most audiophile claims are dubious and a blind A/B test catches them out on most of it but the Nyquist-Shannon sampling theorem does not directly apply to quantized samples, it's about exact samples and with quantization, sampling rate is intertwined somewhat with the quantization depth.
speak_on 6 hours ago [-]
As I responded below, you are confusing math with physical reality. A true 44.1 kHz converter can't realistically capture frequencies ~18-20 kHz due to the limitations of filters used in the process. A perfect lowpass brick-wall filter just does not exist - they all introduce artifacts, which a trained ear can identify. You don't need to be a dog to hear the difference, just someone who does not assume that Nyquist theorem can be magically applied in the real world (and, ideally, someone who utilizes high quality converters with oversampling).
Blackthorn 5 hours ago [-]
That extra 4.1 khz sample rate is for headroom for a low pass filter (and not necessarily a brick wall one). Leftovers or any such artifacts are below the noise floor, which is also an important part of the physical reality.
Would be happy to see an actual, real study to prove that humans can notice, but to my knowledge none exist that confirm they can. Not even any on teenagers or younger (the only group that can even hear close to 20 kHz).
vor_ 5 hours ago [-]
Is there evidence that a trained ear can reliably perceive these artifacts in a blind test of converters? I'd be interested in reading those links since converters typically oversample into the MHz range. At 11.29 MHz (256x 44.1 kHz), Nyquist will be at 5.64 MHz. Even the cheapest consumer converters are performing this type of oversampling.
To draw a design parallel: pixel-perfect design isn't something we are born with, noticing tiny details is a developed skill.
And yes, you are on point: oversampling is used extensively, but this just points at the exact issue: Nyquist theorem gave us a math algorithm, we still need to account for the electronic component imperfections. And then we are entering a different space of quality/precision/psychoacoustics/perception/etc. Meaning, not all converters, not all pre-amps, not all mics "sound" the same, even when they use same types of components on paper.
vor_ 4 hours ago [-]
Oh, dear, that AES 2014 paper from Meridian (which was trying to push its controversial proprietary MQA audiophile system the same year) was widely criticized on audio forums when it came out, ranging from the rectangular dithering method to the use of a hard metal tweeter that could cause IM.
Do you have more convincing sources?
speak_on 4 hours ago [-]
I don't. Do you? I am not a researcher. Saying that, do you have a double-blind study handy on MP3 256 vs 320 actual audible differences? If not, can you yourself hear the difference? If you can - it might be an illusion.
move-on-by 10 hours ago [-]
I don’t have great hearing, so I’m not sure I can really weigh in here (thanks punk concerts in my teens). I remember similar arguments around screens and 60Hz vs ‘the human eye’. I think a lot of people, myself included, can easily perceive the difference between 60Hz and something higher- given the right conditions. I would not be so quick to disregard claims of more sensitive hearing.
speak_on 6 hours ago [-]
(I commented on this topic above/below in more detail.) Even with not-so-great hearing you would still be able to identify the difference (ie artifacts are pushed down, not up). Look up articles on the practical limitations of AD/DA converters and why the seemingly counter-intuitive claim that the difference between 44.1 kHz and above is noticeable, is actually a fully industry-accepted practical reality: aliasing, AD/DA lowpass filters, etc.
labcomputer 7 hours ago [-]
I would. It’s really simple.
The human threshold-of-hearing curve intersects the threshold-of-pain curve at about 20 kHz.
Above that frequency (or thereabouts) the sound has to be so loud that it will literally instantly damage your hearing before you can hear it.
This has been replicated across many studies for more than 100 years.
Flicker threshold is completely different. You can’t damage your vision by increasing the FPS, and it has always been commercially desirable to use a lower frequency because that is cheaper.
speak_on 5 hours ago [-]
Would you agree that a trained human could identify artifacts produced by an imperfect conversion process? If you lean "yes", then that's your answer: AD/DA is not a Rust function perfectly implementing the Nyquist theorem, it's a collection of physical components many of which introduce artifacts into the audio path. This thread is not about the theory of human hearing, the electronic components are literally imperfect.
PaulDavisThe1st 5 hours ago [-]
They're no more imperfect than the pickups on an electric guitar, the assembly inside the microphone, the circuit in the compressor and everything else in the analog signal chain that exists long before AD happens.
speak_on 4 hours ago [-]
Absolutely! All these examples have imperfect audio paths - that is the point.
PaulDavisThe1st 4 hours ago [-]
But the central point is that there's no reason to pick on the digital elements in any particular way. Recorded music in 2026 is a pretty good recreation of the original acoustic pressure waves when it is intended to be, but (a) not perfect, even in the pure analog domain and (b) it is frequently not intended to be.
speak_on 4 hours ago [-]
The central point is that AD conversion can and will introduce artifacts. The DA process will introduce more artifacts. "Imperfect" covers a huge range, and AD/DA converters play a role in that. We are not talking about "golden cables" bs here, conversion does introduce measurable artifacts in the audio path. The more tracks you record the more artifacts you have. Can everyone hear them? Definitely no. Can they be heard - yes, I can hear the difference between an old Digidesign interface and Grace Design interface.
PaulDavisThe1st 4 hours ago [-]
No, the central point is that the analog signal handling before AD introduces vastly more "artifacts" than the AD or DA does.
In addition, nobody cares about "measurable" artifacts (or rather, they should not). What matters are "audible" artifacts. We have measuring equipment that is vastly more sensitive than human ears (e.g. your recording equipment that can pick up signals far above 22kHz). What's measurable is not particularly interesting - what's audible is.
Artifacts do not sum linearly, because they do not originate from correlated sources (unless you're doing something rather unusual).
Glad you can hear the difference between two converters, but I trust you've tested it in a double blind setting?
speak_on 3 hours ago [-]
Hm, no. The discussion was never about analog artifacts vs AD conversion artifacts. Both are present. And not sure why you use "artifacts", do you not believe the artifacts are real? How can the lowpass filter not introduce artifacts?
And absolutely - I blind tested converters extensively. Mbox2, Black Lion Audio upgraded converters, UA, Prism.
PaulDavisThe1st 3 hours ago [-]
Note: blind testing is not double blind testing. Scientists evolved double blind for a reason: blind testing doesn't remove bias.
Yes, the discussion was "never about analog vs AD". But my point is that I see little point wasting time on one set of artifacts (in the digital realm) that are tiny compared to those introduced in the analog realm. If there's a mouse and an elephant about to enter your home, you focus on the elephant, no?
The big difference, of course, is that "everyone" has convinced themselves that most/all of the analog artifacts, as big as they are, are somehow "tasteful" or "artistic", whereas the digital ones are just "math errors". I don't think this is too helpful.
And look, if lots of people could get through double blind tests and still show they can hear aliasing or whatever the digital artifact du jour is, then I'd say "yes, absolutely, we need to be very aware of this and do everything we can to reduce or eliminate it". But as far as I can tell, this just isn't the case.
speak_on 2 hours ago [-]
This is a more philosophical take… And I totally agree with you. I mix at 16/44.1 just for the record. I do not buy into the idea of gold plated connectors or 96 kHz mixing. My point was never about quality - I can hear the difference (the point!), doesn't mean for me personally > 44.1 is "better" or "worse".
To your main point: yes, all artifacts are just our learned, cultural, developed preferences. In the exact same way major/minor thirds were considered dissonant just a few hundred years ago - it's all a learned perception, not an absolute judgment.
I would go even further, doesn't matter whether people perceive aliasing as a major issue, it's no different from the U47 "warmth". You can't afford this, probably, as a software developer in a way, but at the most fundamental level any sound's - or artifact's - judgment is based on our current diagram of "sounds nice" vs "sounds bad".
ses1984 4 hours ago [-]
Can you give any examples of people identifying these artifacts in a/b tests?
I know from my 20-ish year mixing experience that I can hear the difference when mixing. Is it good evidence? No. So we can agree to disagree then.
clawlor 10 hours ago [-]
Max representable frequency is half the sampling rate (nyquist-shannon theorem), which is still a bit above normal but IIRC the extra headroom has something to do with eliminating aliasing
Blackthorn 10 hours ago [-]
Indeed. And what is the max frequency that a human can hear?
speak_on 6 hours ago [-]
The artifacts produced by pure 44.1 kHz conversion are aliased back down to lower frequencies. It's not about a theoretical human ear, it's about the actual physics of AD/DA conversion.
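For anyone who wants to see where a tone above Nyquist would land, here is a minimal Python sketch of the folding arithmetic. It assumes an ideal sampler with no anti-aliasing filter at all, which is a worst case and not how real converters behave; `alias_frequency` is a name made up for this example.

```python
def alias_frequency(f_hz, fs_hz):
    """Frequency at which a tone of f_hz appears after ideal sampling at fs_hz.

    Assumes no anti-aliasing filter (hypothetical worst case).
    """
    f_mod = f_hz % fs_hz          # sampling cannot tell f from f + k*fs
    nyquist = fs_hz / 2
    # Anything in the upper half of the band folds back around Nyquist.
    return fs_hz - f_mod if f_mod > nyquist else f_mod


if __name__ == "__main__":
    for tone in (10_000, 21_000, 25_000, 30_000):
        print(tone, "Hz ->", alias_frequency(tone, 44_100), "Hz")
```

Under these assumptions a 25 kHz component folds to 19.1 kHz at a 44.1 kHz rate. How much energy is actually up there, and whether the result is audible, is the part being argued about.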
PaulDavisThe1st 5 hours ago [-]
But the energies of the signal present above the Nyquist frequency (22050Hz in this case) are almost always incredibly weak, and double blind testing rarely shows any indication that humans can actually hear the aliasing.
speak_on 4 hours ago [-]
Mixing process often involves hundreds of tracks, and if each introduces aliasing, this can become a problem. Some engineers do swear by "the final mix is 16/44.1 so why mix at a different resolution?" mantra - that's fine too.
PaulDavisThe1st 4 hours ago [-]
This is false. Aliasing is not additive in any meaningful way.
speak_on 4 hours ago [-]
Ok dude, you obviously never recorded anything. Twelve mics on a drum kit, 60 tracks of rhythm guitars, several bass guitar layers, vocals, backing vocals, electric organ, percussions, saxophone solo. Do you think recording them at 44.1 somehow creates a shared "cloud-based" aliasing artifact that I store in S3?
Firstly, it's an amazing experience to randomly interact with people like you - I love and use your software. Hats off and thanks for what you offered to the industry!
But secondly, your statement makes even less sense to me: obviously artifacts do add up. Yes, not linearly, like any complex audio in general. But the more tracks with artifacts I have, the more artifacts I have overall. It's not like they cancel each other (outside of normal frequency cancellation).
Rotundo 10 hours ago [-]
Depends on age of the listener, on average, 30 to 50 year olds hear a maximum frequency of 14 to 16 kHz.
Blackthorn 10 hours ago [-]
Right. Which is well below 1/2 of 44.1k!
OkayPhysicist 6 hours ago [-]
Sure, but those are averages. I'm 30-ish, and my hearing doesn't cut out until somewhere in the 21kHz range. When I was younger, it was even higher. One of my roommates in college had one of those anti-rodent high-frequency noise generators, we almost came to blows over it.
UtopiaPunk 10 hours ago [-]
If you want to hear the difference between an audio file recorded at 44.1 and 88.2 kHz, then you need to slow the audio playback down. Otherwise, a trained ear cannot physically hear the difference.
speak_on 6 hours ago [-]
44.1 is "enough" only in theory. This assumes a physically impossible steep filter. Realistically, frequencies around 20 kHz will create audible artifacts (aliasing). So yes, a trained ear can tell the difference between 44.1 and even 48 kHz. Like many other commenters in this thread, you are mixing up math theory with physical limitations of AD/DA converters. Oversampling is a common way to address this limitation, but strictly speaking 44.1 kHz is not as obviously "enough" as it seems.
PaulDavisThe1st 4 hours ago [-]
> Realistically, frequencies around 20 kHz will create audible artifacts (aliasing)
The energy of the signal components above the Nyquist is generally very low, and very few double blind tests have given any indication that humans can detect the resulting aliasing (even though many people claim to be able to do, almost always in non-double-blind environments).
Badly written digital synthesis can generate high energy signal components above 22kHz, but that's because they're badly written, not because the theory is wrong.
speak_on 4 hours ago [-]
Generally very low for a single track? What about 200 tracks? Badly written synthesis, or badly recorded live instruments, or bounced and re-bounced dozens of times... we are not talking about the quality-defining aspect here. You can produce an excellent mix on KRKs connected directly to a MacBook.
This space is not driven by a single precise formula. 48/96 kHz helps some engineers to produce better sounding mixes. Can everyone hear the extended range of Adam tweeters? Probably not. But some can, and they benefit from that. Even if there is no double-blind study to prove this in absolute terms.
PaulDavisThe1st 4 hours ago [-]
If you recorded 200 tracks of the same instrument, so that the partials above Nyquist were all broadly the same, then sure, summing the tracks would include summing 200 copies of the aliasing results too.
But very little music is like that, and the energy profile above Nyquist will differ dramatically. Consequently, you're not summing a set of identical aliasing results, and in general, the results will still be undetectable to almost everyone.
Jacob Collier routinely works with 300+ tracks in Logic. He doesn't worry about this sort of thing, and neither do the Grammy voters who love what he does.
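The identical-versus-different distinction above can be put into numbers with a small Python sketch. It compares the two idealised limits only: fully correlated sources, where amplitudes add, and fully uncorrelated sources, where powers add. The per-track level of -100 dBFS is a made-up placeholder, and real sessions need not match either limit.

```python
import math


def summed_level_db(level_db, n_sources, correlated):
    """Level of n equal-level sources summed together, in dB.

    correlated=True  : identical signals, amplitudes add (20*log10(n)).
    correlated=False : independent signals, powers add (10*log10(n)).
    Both are idealised limits, not a model of any particular mix.
    """
    gain_db = (20 if correlated else 10) * math.log10(n_sources)
    return level_db + gain_db


if __name__ == "__main__":
    per_track = -100.0  # hypothetical aliasing level per track, dBFS
    print(round(summed_level_db(per_track, 200, correlated=True), 1))
    print(round(summed_level_db(per_track, 200, correlated=False), 1))
```

With 200 tracks the identical case gains about 46 dB and the independent case about 23 dB, so the same track count gives very different totals depending on how alike the artifacts are.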
speak_on 4 hours ago [-]
Got it. Grammy voters love Collier's mixes. What about Tony Maserati? He can clearly tell the difference between 44.1 and 88.2. If your argument is that these engineers can't hear the difference - you are going to be disappointed. They can. Even Dave Pensado who mixes at 16/44.1, does that because he rejects the idea, he can hear the difference according to him.
PaulDavisThe1st 4 hours ago [-]
I can find no evidence that Maserati (or 99% of any other mix or mastering engineers) has ever tested his "appreciation" of the "crunch" at 44.1 in a double blind environment.
It is always amazing how much that is claimed about what people can hear fails to show up when tested in this, the only acceptable scientific way.
Perhaps Maserati has done this, and could still tell the difference. In which case, he should carry on! But he should carry on anyway! People should do what brings them joy, and if he likes working at 44.1kHz or whatever, he should absolutely do that.
What people should not do is lecture about stuff that isn't true and/or isn't demonstrable in proper test settings, and most (not all, but most) of the SR stuff fits into one or other or both of those categories.
vor_ 5 hours ago [-]
Do you have citations for this claim? The "golden ears" argument is often employed by audiophiles, but even the cheapest converters oversample by up to several hundred times as well as employ antialiasing filters.
scns 10 hours ago [-]
A treated room would be the most impactful, DACs the least.
speak_on 7 hours ago [-]
The most impactful for noticing the difference? Again, I would argue it's the trained ear. If you have plenty of mixing experience then all these details add up, and a treated room becomes the most critical - agree with that.
vor_ 4 hours ago [-]
So far, there isn't sufficient evidence that anyone has such reliably golden ears.
speak_on 4 hours ago [-]
Other than the top engineers in the industry. This is a discussion that always ends up in the "double-blind study" vs actual real engineers working in the industry.
yellowapple 10 hours ago [-]
The DAC is pretty impactful if it's outright incapable of outputting anything beyond the usual 48kHz :)
vor_ 4 hours ago [-]
Even the cheapest consumer DACs oversample into the megahertz range.
LarsAlereon 9 hours ago [-]
The main benefit for me is that digital watermarking becomes completely inaudible with high-res audio, but I can sometimes clearly hear it in standard resolution.
dist-epoch 11 hours ago [-]
The whole audiophile industry is built on stuff which doesn't make any sense
My favourite: "audiophile-grade" audio players which allocate a single continuous buffer of RAM into which they load/decode the whole .WAV/.FLAC file, because supposedly the CPU "jumping" between "fragmented audio" causes audible "jitter".
Of course, they don't know that what looks like continuous memory to user-code is probably discontinuous in kernel/physical RAM.
Didn't check in many years, I wonder if they created kernel level players to account for that, to have "true continuous memory"
platinumrad 10 hours ago [-]
Don't forget: "most players use malloc to get memory while new is the c++ method and sounds better."[1]
audiophiles (https://forums.stevehoffman.tv/threads/turntables-with-pace....) also claim that turntables can be rated on "timing, rhythm, and pace" in which supposedly the timing of the music can be affected by the turntable's mass and other properties.
How this would occur without also producing grossly audible pitch distortion never seems to be discussed.
lmc 10 hours ago [-]
> My favourite: "audiophile-grade" audio players which allocate a single continuous buffer of RAM into which they load/decode the whole .WAV/.FLAC file, because supposedly the CPU "jumping" between "fragmented memory" causes audible "jitter".
Thanks for the laugh... this is absolutely bonkers. In case anyone is wondering, before sound hits our ears it has to go through a digital to analog conversion, which takes place on hardware independent of the CPU, operating with its own clock and buffers etc.
justsomehnguy 10 hours ago [-]
Am486DX/100 was enough to decode and listen to an MP3 at 22 kHz (and maybe mono?) and was more than enough to listen to 44/16/2 PCM. It's 31 y.o. today.
Sohcahtoa82 5 hours ago [-]
I remember playing 44khz 16-bit stereo MP3s encoded at 128 kbit/sec on a 133 Mhz 486.
It gobbled like 90% of the CPU and I had to make sure I gave it a pretty large buffer so it didn't stutter when an app claimed CPU for more than a second, but it worked.
wat10000 10 hours ago [-]
In addition to that, while it is possible to hit a delay and run out of buffer because memory access is slow (the most obvious case would be if the input got swapped to disk at an inopportune moment), the audible effect is really obvious. This isn't some subtle "oh my music sounds ineffably worse" effect, it's "my computer is glitching and my music is unlistenable."
billyjobob 10 hours ago [-]
I can tell when my CPU usage spikes because it causes a hum through my speakers, so this does not seem that far-fetched.
justsomehnguy 10 hours ago [-]
It just means you have a shitty audio path with not enough shielding. Move to SPDIF/TOSLINK.
codedokode 5 hours ago [-]
I have an external audio card, if I put it on a laptop I can hear the modem-like sounds. I wonder why it is so sensitive, should not DAC produce strong signal that cannot be easily affected by radio waves?
Also my headphones are extremely sensitive. I can touch the ring and sleeve of a jack with a finger, and touch a metal bed frame with a tip and I hear quiet clicks as I move the tip along the metal. Sometimes I do not even need to touch the jack with a finger. It doesn't work with small objects like a knife though.
PaulDavisThe1st 4 hours ago [-]
Bad grounding everywhere. This is insanely basic stuff.
nok22kon 5 hours ago [-]
the radio waves could be interfering with the signal before it gets amplified
bellowsgulch 10 hours ago [-]
The latter is probably true, but the former does actually happen, and it's easy to accidentally do--lossless or not.
dijit 11 hours ago [-]
huh...
So I guess the programmer equivalent is distributing .pdb's (or, symbols)
Blackthorn 10 hours ago [-]
Pretty good analogy. Thing is though, the person who receives the 16-bit, 44.1khz music file can always upsample it to 192khz and not lose anything in the process (heck, lots of audio stuff oversamples internally to this level or beyond, for extra aliasing headroom!). I'm not sure about expansion from 16bit to 24bit though, downward expansion isn't necessarily perfect.
gizajob 10 hours ago [-]
You’d be adding 150khz and 8bits of nothing.
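On the 16-bit to 24-bit question above, a minimal Python sketch, assuming plain signed integer PCM and widening by zero-padding the low bits (one common way to do it, not the only one). The helper names are invented for this example. It only shows that the padding is exactly reversible, i.e. nothing is lost and nothing is gained.

```python
def pad_16_to_24(sample16):
    """Widen a signed 16-bit PCM sample to 24 bits by appending 8 zero bits."""
    return sample16 << 8


def truncate_24_to_16(sample24):
    """Drop the low 8 bits again (arithmetic shift keeps the sign)."""
    return sample24 >> 8


if __name__ == "__main__":
    for s in (-32768, -1, 0, 1, 12345, 32767):
        assert truncate_24_to_16(pad_16_to_24(s)) == s
    print("round trip is exact")
```

The extra 8 bits carry no new information here, which matches the "8 bits of nothing" remark.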
viccis 11 hours ago [-]
If you try to use empiricism when it comes to certain groups of audiophiles, you are going to be sorely reminded that it's basically the equivalent of healing crystals for a different type of person. 24/192 is useful for mixing/mastering, but completely unnecessary for the end product to distribute for listening.
evo 11 hours ago [-]
24/192 is also great for digital synthesizers--if you're generating a waveform like a sawtooth that has theoretically instantaneous transitions, they can eat as much frequency as you can give them. Running at 44khz loses noticeable high-end content.
Most modern digital synths have already caught onto this and run internally at much higher sampling rates even if their output gets downsampled, but sometimes you run across a vintage plugin that runs at the host audio rate and working in a higher sampling rate is audible.
Blackthorn 10 hours ago [-]
You can generate perfect band-limited sawtooth waves at 44.1khz, there are multiple techniques for doing this and most production digital synthesizers use them.
Oversampling gives you headroom for aliases for the rest of the synth that is more vulnerable to it.
evo 10 hours ago [-]
Yeah, I was oversimplifying a blit, the raw waveforms are usually okay, but I distinctly remember old-school VSTs where you couldn't achieve a nice saw lead at 44.1.
Blackthorn 10 hours ago [-]
It's tough to tell without specific names, but I imagine a lot of particularly old* VSTs were written to use naive sawtooths rather than perfect band-limited ones, which would have terrible aliasing at 44.1 khz. Oversampling those would help a lot!
* Some people are still making this mistake, despite information on the (many) ways to do it the right way being widely and freely available!
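To make the naive versus band-limited distinction concrete, here is a rough Python sketch. It uses additive synthesis, which is only one of several band-limiting techniques and is picked here because it is the easiest to read, not because any particular synth uses it. All function names are invented for this example.

```python
import math


def harmonic_count(f0_hz, fs_hz):
    """Number of harmonics of f0 that sit strictly below Nyquist."""
    k = int((fs_hz / 2) // f0_hz)
    if k * f0_hz >= fs_hz / 2:   # exclude a harmonic landing exactly on Nyquist
        k -= 1
    return k


def bandlimited_saw(f0_hz, fs_hz, n_samples):
    """Additive-synthesis sawtooth using only harmonics below Nyquist."""
    k_max = harmonic_count(f0_hz, fs_hz)
    out = []
    for n in range(n_samples):
        phase = 2 * math.pi * f0_hz * n / fs_hz
        # Fourier series of a ramp: (2/pi) * sum((-1)^(k+1) * sin(k*phase) / k)
        acc = sum((-1) ** (k + 1) * math.sin(k * phase) / k
                  for k in range(1, k_max + 1))
        out.append(2 / math.pi * acc)
    return out


def naive_saw(f0_hz, fs_hz, n_samples):
    """Ramp drawn directly in PCM; its harmonics above Nyquist alias."""
    return [2 * ((f0_hz * n / fs_hz + 0.5) % 1.0) - 1 for n in range(n_samples)]
```

At 44.1 kHz a 440 Hz saw keeps 50 harmonics in this sketch; the naive version implicitly contains all of them, and the ones above Nyquist fold back into the audible band.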
evo 10 hours ago [-]
I wonder if there's also distortion or ring modulation stages where some of the energy above hearing range might spill into audible sidebands if they're not nyquist-limited first.
Blackthorn 10 hours ago [-]
Yeah, that's the "rest of the synth" part that's more vulnerable to aliasing.
There's some ways to do band-limited distortion but...they aren't nearly as widespread, easy, or universal as band-limited oscillators.
Ring modulation is funny though because you'd ideally want the sidebands to modulate down by default rather than filter them out, that's why you're using it.
nullc 4 hours ago [-]
> 24/192 is also great for digital synthesizers--if you're generating a waveform like a sawtooth that has theoretically instantaneous transitions, they can eat as much frequency as you can give them.
So if your synthesizers do not use proper band-limited oscillators then 192KHz is _FAR_ too slow. You'd want to be running at hundreds of KHz, perhaps a few MHz.
In reality synth software that doesn't sound like crap uses band limited oscillators and should work okay at 48KHz too. That said, even if the oscillators are band limited it may be the case the various modulations aren't band limited properly, as getting those wrong won't sound instantly wrong (in particular because you have to modulate to make it wrong, and the underlying change of the modulation may make it harder to tell it's wrong).
Though also in those cases if you're not counting on every step being properly band limited then 192KHz may be an improvement but you're still probably getting some meaningful aliasing. I think given how fast computers have become relative to digital audio there is probably a good case to just make any "modular synth" run at 32-bit 480KHz or even 4.8MHz through every stage that could process the audio.
Maybe 192KHz really is enough to suppress the aliasing artifacts but I think to be convinced of that I'd want to see a system that supported both and validate that the difference between a downsampled 48KHz output from the two modes was below -90dB or something.
Or otherwise you can just declare that the aliasing is part of the sound and then there are no right choices... 24khz sampling, 48k, 192k ... who cares, use what you like best. :)
Applejinx 4 hours ago [-]
Hydrasynth aliases like a mad thing. My flagship synth ended up being Summit, and its oscillators are digital but run at a crazy high sample rate. Did likewise with some Chord Organ modules: that Teensy board it was built on could do chord audio at 300k and over a megahertz if you were just generating one wave as simply as possible. The freedom from aliasing really helped the sound, for all that it's a 12 bit analog output. A squarewave is a 1 bit signal…
dist-epoch 10 hours ago [-]
No synth generates sawtooths by literally drawing a saw tooth in PCM. The distortion you get if you do that is not subtle at all.
colmmacc 10 hours ago [-]
32-bits are great for recording too because they do an incredible job of capturing the dynamic range without having to be precise on the preamp settings. It removes an entire job from the recording workflow.
192 for mixing and mastering can be useful especially if you're doing a lot of effects, especially anything that pitch shifts. But I've seen low quality phone-microphone recordings make it to the master; if you capture lightning in a bottle, it hardly matters what the settings were, what the microphone was, or anything else.
PaulDavisThe1st 4 hours ago [-]
The limit on current DACs is 18-22 bits. The rest is just brownian noise. Literally.
Aldipower 10 hours ago [-]
Even with mixing/mastering 96khz is enough for persisting to files. But as another commenter said, 192 is useful, if you bend and stretch samples!
tshaddox 10 hours ago [-]
They literally sell actual crystals that you’re supposed to place on top of speakers and amplifiers to make them sound better.
Blackthorn 10 hours ago [-]
We had a really nice crystal decoration that I happened to put on top of one of my TV speakers and, wouldn't you know it, it had this resonant frequency somewhere around specific human speech frequencies that drove us absolutely bonkers until I figured out the cause and moved it.
teach 11 hours ago [-]
(2012)
lokar 11 hours ago [-]
I wonder how many people think that 24 bit audio encodes 50% “more”
recursive 11 hours ago [-]
It is 50% more headroom above the noise floor in logarithmic decibels.
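The "50% more in decibels" figure can be checked with the textbook formula for an ideal quantiser, sketched below in Python. It assumes a full-scale sine and a perfect converter (the usual 6.02*N + 1.76 dB rule); real hardware delivers less, and `pcm_dynamic_range_db` is just a name chosen for this example.

```python
import math


def pcm_dynamic_range_db(bits):
    """Theoretical SNR of an ideal N-bit quantiser for a full-scale sine.

    This is the textbook 6.02*N + 1.76 dB figure; real converters do worse.
    """
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)


if __name__ == "__main__":
    dr16 = pcm_dynamic_range_db(16)
    dr24 = pcm_dynamic_range_db(24)
    print(round(dr16, 1), round(dr24, 1), round(dr24 / dr16, 2))
```

That comes out at roughly 98 dB versus 146 dB, a ratio of about 1.49 on the decibel scale, even though the number of representable levels grows by a factor of 256.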
I completely accept that human audition has limits that are easy to determine by playing a pure sound. But is it the same with music, where multiple frequencies are played and interfere with each other? Aren't some harmonics or effects created by these "inaudible" frequencies?
To try to imagine something similar: the human eye is unable to see UV light, yet fluorescent paint has a visible quality of its own compared to "normal" pigments.
nok22kon 5 hours ago [-]
when beams of ultrasound interact they can produce audible frequencies
24 bits is now ubiquitous and 32 bit is becoming the norm in recording studios.
evo 10 hours ago [-]
32-bit float has become popular in filmmaking/field recording equipment lately because, with a microphone preamp that supports it, you can capture the entire dynamic range of the microphone--there's no accidental clipping if you drive the gain stage too hard.
It's a bit redundant for a skilled technician, they're already used to setting the gain staging, inbound compression, and feathering the mics to avoid this in 24-bit, but if you're handing a boom mic to a novice and have a scene where e.g. someone's whispering and another person's screaming, it can be nice to not have to worry about it.
lysace 11 hours ago [-]
That use case is literally addressed in the first sentence.
metalman 11 hours ago [-]
sheeesh , measly 24-bit/192kHz
of course it makes no sense, unless it is downloaded through low oxygen wire, which somehow and unfathomably, must have been omitted or forgotten.
b3orn 11 hours ago [-]
If it has been transmitted via hollow-core fibres it will obviously sound hollow.
waffletower 10 hours ago [-]
For typical listening (though humans can perceive bone-conducted vibrations up to 100 kHz or even 120 kHz) 16-bit-fixed/44.1kHz is a high-fidelity transport format. As a DSP researcher, I prefer 32-bit-float/44.1kHz as a transport format. I often upsample to 32-bit-float/188.2kHz or even 32-bit-float/192kHz for signal processing applications such as high-fidelity reverberation via direct and FFT convolution. While the author advocates for the transport to ear use case, I would argue that 24-bit/192kHz provides greater fidelity and resolution for sound processing. I found the pedantic arrogance of the author to be annoying. But yes, the sampling theory is an important consideration -- but so is the quality of the actual digital filters used in the DAC->ADC pipeline. They are much more forgiving and less lossy at 192kHz.
Aldipower 11 hours ago [-]
[dead]
haunter 10 hours ago [-]
The more the bits the better the music, easy as one two three
Don't forget to buy the new low oxygen platinum plated HDMI cables for the better experience!
I also spent a lot of time ripping my old CDs to FLAC and trying different MP3 and AAC encoder settings to get playback that felt transparent enough to me. I could never tolerate Sirius/XM radio streaming due to the horrid compression I heard with every futile attempt. I still seem to have more sensitive hearing than most people around me, but in my 50s I know it isn't what it once was.
I never had huge budgets, but did strive for hi-fi in my limited ways. I used things like toslink and HDMI to send raw PCM data from Linux to my Yamaha A/V receiver's DACs + amplifier to drive somewhat nice Polk tower speakers. But then COVID-19 happened, and this stuff was packed up to move house.
Nowadays, music playback is streaming with mundane "subwoofer + satellite" PC speakers or MP3 playback with a mini-SD card permanently parked in my car's infotainment system.
As referenced in the article, a common explanation for those audible differences is that the high-resolution version of the album is sourced from a different master.
In this case, it was my brother's own 24/192 recording, down-mixed by him to CD format with the intent that it be transparent. I believe he said his software was supposed to be dithering, but this was ~25 years ago and I can't really confirm the details anymore.
In fact if you can't hear the difference between 24/192 and 16/44.1 you shouldn't be working in audio. (Doesn't apply to consumers. Does apply to musicians and engineers.)
It's like being colour blind.
And if you don't understand the math behind quantisation, you shouldn't be posting pseudo-scientific videos where you use an oscilloscope and a cheap spectrum analyser - both tools with very limited resolution - to "prove" your point.
16 bit isn't enough for hard, objective reasons. One is that the noise spectrum of quantisation is not simple. Most people assume it's something close to plain white noise, but it really isn't. It's actually a very complex spectrum with some prominent peaks at specific subdivisions of the sample rate. Those frequency peaks are significantly above audibility. 24-bit quantisation shrinks them below audibility.
The other is that most people can hear dither/noise-shaping at 16-bits. That adds a single bit of noise which should - if you're being very literal - be far below the threshold of audibility. But it clearly isn't.
These two facts are related.
The more complex reason is that listening is an active perceptual process. The brain does a huge amount of processing to separate sources and place them in a perceptual field which includes information about perceived object type, distance, and ambience cues. Some of those cues are very quiet, and we don't hear them linearly.
So using sine waves as some kind of perceptual reference for audibility is nonsensical. We hear much more complex signals in an active way, and if there's information missing in the quiet parts - which there is with limited quantisation - then the signal simply isn't accurate.
Small differences in gain are ABX able much more readily than differences in noise at the 16 vs 24 bit level. So if the signal chain gives even a small difference in gain between the samples that's what you'll track. A reasonable conversion path to 16 bits for mastering will also apply dithering and some kind of brickwall limiting (you have to limit after the dither or as part of the dither as dither can change levels!), and this can result in gain changes. The DAC may behave differently or have outright bugs for some configurations too.
This is particularly true wrt reconstruction filters for sample rate differences. And if you were comparing 44.1k and 192k then the physical DAC itself was likely running at a different rate and its _analog_ filters are probably better optimized for one vs the other (this is less true for 48k vs 192k, as the hardware likely runs at the same rate for both). So one answer to this comparison can be "on this particular hardware this rate is better than that rate"-- but that's an implementation property, not a property of format choice.
You might think, "okay I'll use a mathematically perfect down and up conversion process and run the DAC in the exact same configuration for all cases". But even then you run into issues like after reconstruction the _inter sample_ peak levels will be higher than the levels of the samples, so you have to handle that and in a way that doesn't produce a gain difference between the two configurations. (probably by running your perfect process and finding the gain level that results in no limiting, then making the gain of the original match).
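The inter-sample peak issue is easy to reproduce with a contrived case, sketched here in Python. The tone at a quarter of the sample rate with a 45 degree phase offset is a deliberately chosen worst-ish example, not typical program material, and the helper names are invented for this illustration.

```python
import math


def sampled_sine(freq_hz, fs_hz, phase_rad, n_samples):
    """Samples of a unit-amplitude sine (true peak is exactly 1.0)."""
    return [math.sin(2 * math.pi * freq_hz * n / fs_hz + phase_rad)
            for n in range(n_samples)]


def intersample_over_db(samples, true_peak=1.0):
    """How far the continuous waveform's peak exceeds the largest sample."""
    sample_peak = max(abs(s) for s in samples)
    return 20 * math.log10(true_peak / sample_peak)


if __name__ == "__main__":
    # A tone at fs/4 with a 45 degree phase offset: every sample lands at
    # about 0.707, so the sample peak understates the real peak.
    fs = 44_100
    samples = sampled_sine(fs / 4, fs, math.pi / 4, 16)
    print(round(intersample_over_db(samples), 2), "dB")
```

In this case the reconstructed waveform peaks about 3 dB above the highest stored sample, which is why matching levels by sample peak alone can leave a gain difference between two versions.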
And then for the high rate vs non-high rate you have to deal with the fact that most amplifiers are not particularly linear (compared to well constructed software at least!) and that any real speaker is very far from linear. This means that the presence or absence of ultrasonics will change the audio in the 0-20khz band.. Before you think "well that could be a reason that high rate is better" observe that if there was some consistently good effect from the ultrasonics you could just bake it into the low rate sample.
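A toy Python sketch of how nonlinearity can pull ultrasonic content into the audible band. It assumes the simplest possible model, y = x + a*x**2, and two made-up ultrasonic tones; real amplifiers and speakers have more complicated behaviour, and the function names are hypothetical. The squared term also produces a DC offset, which is ignored here.

```python
def second_order_products(f1_hz, f2_hz):
    """Frequencies produced by an x**2 term acting on two tones.

    Models the device as y = x + a*x**2 (an assumed, simplified nonlinearity).
    """
    return sorted({abs(f1_hz - f2_hz), f1_hz + f2_hz, 2 * f1_hz, 2 * f2_hz})


def audible(freqs_hz, limit_hz=20_000):
    """Keep only the products that fall inside the nominal hearing band."""
    return [f for f in freqs_hz if 0 < f <= limit_hz]


if __name__ == "__main__":
    products = second_order_products(24_000, 27_000)
    print(products, "->", audible(products))
```

With tones at 24 kHz and 27 kHz the difference product sits at 3 kHz, well inside the 0-20 kHz band, even though neither input tone is audible on its own.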
> but in my 50s I know
Yeah if you're in your 50's you're absolutely not hearing differences way up above 20khz (especially if you're male), I bet you can't even hear CRT flybacks from 100 yards anymore. :P Most people have no idea how much their high frequency hearing degrades as they age because it plays approximately no role in your life, but it's real, dramatic, and as far as I know happens to everyone.
I don't mean to discount your experience: I don't really doubt that it was real. But answering the general question of the necessity of low vs high rate probably takes a team of experts, armed with test gear and the designs of the HW/SW in question, to vet the test configuration. Testing a _particular_ configuration without the ability to distinguish its implementation quirks from format-fundamentals is much easier and that's what most attempts to test this question are actually testing.
By testing in a recording studio you were doing far better than most such comparisons. Usually people try comparing different files and they're comparing entirely different mastering processes. Files made for the "high res" market will often have much less compression and limiting than files made for commercial radio play / casual listening... and truly do sound obviously much better. Some of my favorite recordings are rips from vinyl. Vinyl is an awful format from the perspective of audio fidelity, but it's also pretty intolerant of excessive compression and limiting because the record will skip if the needle is bouncing off the rails. And more recently I suppose they also avoid over compression there because of the difference in target listener/environment.
This was supposed to be running the DACs to match the source configuration, not resampling into some common format. I think that is an unavoidable part of the whole end-to-end ABX test concept.
Maybe it would be interesting to up-sample back into 24/192 and play both in that mode. But then people would argue about what type of up-sample to use.
I was in my mid 20s for this test. I understand my high-band hearing was better back then.
This was common knowledge at least as far back as the mid 80s, when every hifi shop and salesguy knew to ensure the bit of gear with the highest profit margin got played an almost imperceptible bit louder than the gear the customer came in to buy during back to back testing.
Point being: it doesn't even require an unscrupulous sales person to get similar results to an unscrupulous sales person! :P
You try to hear the brickwall by the muffled, enclosed quality and possibly by the weird pre-ring blurriness of the filter making things sound more vague than they have to be, and you hear the truncation not because it is audible 'distortion' as we know it, but because depth collapses and it sounds like it's coming from the speakers and not being a separate space behind/around the speakers. At no point will it be the most glaringly obvious thing but it'll never be 'distortions' as we imagine them, it's more a 'pod people' lack of personality thing.
Like a much subtler version of listening to AI music :)
I'm quite happy with 24/96 as suitable overkill for anything I might want to hear or do. Neil Young went hard on the proposition that 192 was necessary. Sold the Ponoplayer, I had one but it died on me, battery failed eventually. It really did sound awesome beyond just about any other listening device I've ever heard…
The last couple of generations of converters have gotten a lot better, so 192kHz today is likely to sound cleaner and smoother than it did ten years ago, where there was a good chance the clock was quite jittery.
Personally I don't think it's worth the extra bandwidth for playback, but I can understand why some people might want it.
Generally all of these "debates" come down to people who think math > circuitry. All real designs are imperfect trade-offs. They all have issues, and arguing as if converters are perfect when they never are, and the imperfections can be benched objectively, is... not very scientific.
High-dynamic-range material benefits from lots of bits.
A reasonable definition of transparency for high bitrate compressed audio is "Can the worst files be distinguished by a listener trained in what artifacts sound like". Maybe also add in having to use a high discrimination listening setup, including not running excessively loud (increases masking).
If that's not the test you're doing, it's unsurprising. At moderately high bitrates no one can reliably distinguish them on arbitrary samples: most inputs are easy.
If you test on known-difficult "killer samples" you'll probably easily distinguish them, even without first being shown what to look for, and certainly after.
During the development of Opus I created many 'trained listeners' and selected many killer samples, and I don't recall* ever encountering a tin ear that couldn't be taught to ABX any high rate samples, though some people are obviously much better at it.
I'm not sure I'd recommend it though: learning to identify artifacts has a frequent side effect of making low rate audio like the HE-AAC used in SiriusXM absolutely intolerable. I'm bothered by it even when I hear cars driving by using it. :)
[*] My memory for such things sucks, so I could be wrong-- but my point that it's not expected remains.
You're right it's just minor details.
The takeaway from these sorts of posts, at least in my opinion, should be two-fold:
* Understand the physical limits of human senses and perceptions to help inoculate yourself against outright scams and grifts
* Liberate yourself from the "tech grind" and allow yourself to enjoy what you like, how you like it.
Also understand that while there is an upper limit, we are all different within that. I can hear the difference between 128Kbps and FLAC, at least for some content, but not 256Kbps, maybe not 192. For some content (spoken word etc.), 64Kbps, sometimes less, is perfectly acceptable (to me). There was a time I could hear the difference between some encoders, but that was decades ago and anything in active use is pretty damn good (and my ears are not what they used to be) unless you really crank the bitrate down or tweak other options daftly.
You've established this with double blind testing, correct?
On a tangent, whenever someone mentions LP sounding warmer or whatever I like to point out that I prefer wax cylinders (a.k.a. phonograph cylinders).
But I also have a large multi-terabyte music collection, I follow new music, go to concerts, go to parties, talk about music with my friends in signal group chats.
It's a hobby, and when you get a bit older and start having some savings, if you love music treating yourself with a better system is not that crazy.
Also with HEDD you get a handcrafted device made in Berlin. And if you go with nicer cables, they are very beautifully done and feel great. There is no difference in sound of course. Some people like jewelry, I can get similar enjoyment from beautiful audio equipment and cables.
And so many CDs of course.
If I have an option to get a 16bit version of a recording or a high-res version, I choose the highest quality version every time.
Same with a physical copy. A limited edition, better quality vinyl LP is more attractive if you are going through the trouble of curating a collection.
I’ve been curating a music library of digital files since before the iPod was released and I will always go for the highest quality version out of principle. I can always downsample it to anything that makes sense.
It may be simultaneously true that:
A) Humans cannot tell the difference between 44.1kHz/16-bit audio and any higher resolution, and
B) For a particular song, the best commercially available 44.1kHz/16-bit version may not be the best commercially available version
"The quality of the particular mastering can still make a noticeable difference, regardless of the ability for the digital sampling rates to perfectly represent it perceptually"
Just to be clear that the statement applies to any releases meeting the A) criteria, not just 44.1 kHz @ 16-bit ones.
96kHz was created to better reproduce 20kHz high frequency, so the digital noise shaping filter does not need to be super sharp right at the Nyquist frequency.
Both were introduced for a sound technical reason. Beyond that, most are marketing nonsense to cheat consumers.
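The filter argument can be put in numbers with a short sketch. It assumes the passband ends at 20 kHz and the filter must be fully attenuating by the Nyquist frequency; the function name is hypothetical.

```python
import math

def transition_band(sample_rate_hz, passband_edge_hz=20_000):
    """Room between the passband edge and Nyquist, in Hz and in octaves."""
    nyquist_hz = sample_rate_hz / 2
    width_hz = nyquist_hz - passband_edge_hz
    width_octaves = math.log2(nyquist_hz / passband_edge_hz)
    return width_hz, width_octaves

for rate in (44_100, 48_000, 96_000):
    hz, octaves = transition_band(rate)
    print(f"{rate} Hz: {hz:.0f} Hz wide, {octaves:.2f} octaves")
```

At 44.1 kHz the filter has roughly 2 kHz (about a seventh of an octave) to go from passing to blocking; at 96 kHz it has more than a full octave, which is why the filter can be much gentler.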
I use microphones that can 'hear' up to 100kHz (Sanken CUX100K) and for film sound design playing 192kHz audio at half and quarter speed the results are very significant, and reveal there IS 'content' above human hearing. Irrelevant for general listening but very important for sound design.
Nobody uses 32 bit float for recording (to do so is just to capture at least 10 bits of noise, most of that being brownian); it's strictly a format for mixing and processing. You don't get any more resolution from 32 bit floating point than you do from 24 bit integer formats, but the result of "clipping" is less dramatic, hence the appeal of the format.
While there is some evidence that non-auditory human sensory perception may be sensitive to ultrasonic acoustic waves, it's pretty weak right now, and somewhat in the "woo" zone. It may turn out to be significant, or it may not. I wouldn't base an audio production workflow that requires 4x the cpu power and 4x the disk space on such tentative claims, but you're welcome to.
I am extremely aware that as a data format in DAWs and other recorders, 32 bit floating point is completely common.
Surely you understand a recording made at 48kHz has a max freq response of 24kHz and played at half speed that max freq is 12kHz and at quarter speed only 6kHz. You can very clearly hear the filter cut off due to Nyquist. Record at 192kHz with mics capable of 100kHz capture and when played at quarter speed, the sound is full spectrum because there is no truncated frequency response. And when I load a 192kHz recording to izotope RX I can literally see the harmonics going up to 96kHz. (not with every sound of course)
I repeat, I am not talking about 'normal' listening. I am talking about an industry you have no knowledge or lived experience with, so spare me the incorrect claims about what can & can't be heard.
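The arithmetic behind the slowed-playback claim can be sketched as below. This covers only the tape-style case, where the same samples are simply played out at a lower rate so every frequency drops by the slowdown factor; the function name is made up for illustration.

```python
def slowed_bandwidth_hz(sample_rate_hz, slowdown):
    """Highest frequency left after a tape-style slowdown of a recording."""
    nyquist_hz = sample_rate_hz / 2
    return nyquist_hz / slowdown

for rate in (48_000, 192_000):
    for slowdown in (2, 4):
        top = slowed_bandwidth_hz(rate, slowdown)
        print(f"{rate} Hz recording at 1/{slowdown} speed tops out at {top:.0f} Hz")
```

The 192 kHz figure only holds if the microphone and preamp actually captured content up to 96 kHz in the first place.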
I'm the original/lead developer of Ardour, a cross-platform DAW, and have been working with digital audio for more than 25 years.
There are no 32 bit ADCs - your SD MixPre's are giving you (at best) 22 bits packaged as a 32 bit float value. The preamps make absolutely zero difference to the AD conversion (though they might sound real nice).
> Surely you understand a recording made at 48kHz has a max freq response of 24kHz and played at half speed that max freq is 12kHz
This is a very naive version of what "played at half speed" might actually mean. If properly and correctly resampled, this is not true.
> And when I load a 192kHz recording to izotope RX I can literally see the harmonics going up to 96kHz
Well, I'd certainly hope so! But the question is: what are the energy levels associated with the partials above Nyquist? If you recorded at 384kHz with sensitive enough equipment, you'd see partials above 96kHz - but at extremely low energies because ... well, that's just how physics works.
[EDITED to remove AD/DA confusion]
The half speed you call naive is again just showing your ignorance. Sound editors have been using this technique since the days of recording on a Nagra at 15ips and literally replaying at 7.5ips half speed, and at 3.75ips for quarter speed. There is nothing naive about it, it is a very well known technique. To be able to achieve the same result digitally with full spectrum has impacted every feature film you have experienced in recent years. Again I speak from decades of lived experience.
My use of DAC was a thinko, I've edited at least one post to correct it since in the current context we're always talking about ADC. Apologies for that.
You are arguing about techniques you have no experience with.
Yes they do, almost all high end field recorders used for film work are 32-bits now and have been for much of the last decade, often with some fancy preamp integration so that there is no expertise required for gain staging the recording. (I believe the implementations use a second matched 24bit ADC with 48 dB less gain in front of it).
The result obviously doesn't have a noise floor which is lower (as the noise of a room temperature _resistor_ gets in the way of that even at the 24-bit level) but they have more dynamic range so that your recording isn't ruined by hard clipping some unexpected loud sound.
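A back-of-the-envelope sketch of the resistor-noise limit, using the standard Johnson-Nyquist formula. The 1 kΩ source resistance, 20 kHz bandwidth, 300 K temperature and 2 V RMS full scale are assumptions picked for illustration, not figures from any particular recorder.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K

def thermal_noise_vrms(resistance_ohm, bandwidth_hz, temp_k=300.0):
    """Johnson-Nyquist noise voltage of a resistor: sqrt(4 k T R B)."""
    return math.sqrt(4 * K_B * temp_k * resistance_ohm * bandwidth_hz)

def equivalent_bits(full_scale_vrms, noise_vrms):
    """Bit depth whose ideal quantization SNR (6.02 N + 1.76 dB) matches this SNR."""
    snr_db = 20 * math.log10(full_scale_vrms / noise_vrms)
    return (snr_db - 1.76) / 6.02

noise = thermal_noise_vrms(resistance_ohm=1_000, bandwidth_hz=20_000)
print(f"noise: {noise * 1e6:.3f} uV RMS")
print(f"equivalent bits at 2 V RMS full scale: {equivalent_bits(2.0, noise):.1f}")
```

Under these assumptions the thermal noise alone already sits above the ideal 24-bit quantization floor, so the last couple of bits carry noise rather than signal.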
It's a big improvement for practical usage, and also likely does improve SNR somewhat because you can run higher gains without as much fear that you'll ruin the recording. The reason it would pay off is that the SNR loss you get from splitting the signal is easily smaller than the SNR loss you would get from gain reduction to avoid clipping.
(maybe... capsule self noise is also limiting... at these levels, and usually people aren't using microphones designed for the lowest possible self noise unless they're doing something special)
There are ADCs that will provide 32 bits per sample but that's entirely different.
Current technology limits the bit depth to 18-22 bits and going beyond that you'd be very quickly recording brownian (atomic) noise anyway.
The point about 32 bit float is that it is a useful format for mixing, editing and general processing, so it is widely used in digital audio tools. But it is not a format that ADCs generate "natively" via their electronics - almost all of them generate a 24 bit integer or fixed point value and then just supply that as a 32 bit float value because the software asked for it (the software could have done it all by itself).
[EDITED: DAC->ADC since that is what I meant and what this is all about]
so maybe they do sample at 24 bit at a well chosen gain level and then convert to 32 bit float, with the max 24 bit value being above 1.0 float
or as GP said, use two separate ADCs at two different gains and combine their output
Of course it does! And that's what it does, of course. But that has absolutely nothing to do with the AD process itself, which is chip-limited to 24 bits and likely physics-limited to somewhat less than that.
You can't beat the physical limit of an AD circuit by doubling them up at different gains.
And .. you don't want to. Going beyond 22 bits gets you into brownian noise pretty quickly, which is completely pointless.
The best you can do (or could do) is get a very, very, very good AD that can really do 22 bits (likely not commercially available because of the expense), and then get the samples from it in whatever format works best for your purpose (24 bit integer, some fixed point value, or 32 bit floating point).
but what if you "allow" double that voltage and call it 2.0 float? a strong pressure into the microphone generates a stronger voltage
thermal noise limits you on the quiet signals, but not on the powerful ones
so 22 bit for typical -1.0 -> 1.0 range and you can add a few more bits on top of that for stronger audio pressures (voltages) which you would traditionally clip
https://tascam.jp/int/feature/32-bit_float
You have some low noise amplifier. There is a signal. You split it. The result on each side has >=1 bit worse noise floor, probably somewhat worse as we're not using superconductors :P-- as you expect: there is no free lunch.
Now: take one copy and attenuate it 48dB, further degrading its noise floor. Sample both. The attenuated copy is mostly useless, except when the input goes high enough that it would have hard clipped the other ADC.
So the tradeoff is that you lose a small amount of noise floor constantly-- out at the 20th bit, that you probably didn't care about (microphone self-noise is limiting you out there anyways at normal volume levels), in exchange for never clipping.
To turn this into a better ADC generally, you'd need the splitting stage to not hurt the noise floor, but it does.
The reason it's not the same as just lowering the gain so that you won't ever clip is that to get the same dynamic range you'd have to lower it by 48dB and now your ADC doesn't achieve its potential for typical signals. You could lower the gain by 3dB (or whatever the splitting cost you) and get the same results for the low gain signal and a little more headroom, but you would not get the massive headroom increase of this approach.
For this to work one must also have amplifiers with much wider dynamic range and SNR than ADCs, but we do.
The natural output for this approach is a float-- the most natural would be a weird float where instead of an exponent one bit tells you which ADC is in use and represents a factor of 256 or whatever, but in practice these recorders just output 32-bit floats. I haven't looked but I wouldn't be surprised if there were only two exponent values ever used in their output.
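The dual-range idea described above can be sketched roughly as follows. This is an idealised toy, not how any specific recorder works: both converters are modelled as noiseless 24-bit quantizers, the splitter cost is ignored, and all names (`quantize_24bit`, `dual_range_sample`) are hypothetical. The 48 dB attenuation is the figure mentioned in the thread.

```python
ATTEN_DB = 48.0
ATTEN = 10 ** (ATTEN_DB / 20)   # about 251x in voltage
FULL_SCALE = 2 ** 23 - 1        # largest positive 24-bit code

def quantize_24bit(v):
    """Ideal 24-bit converter: v is in units of full scale, clipped at +/-1.0."""
    code = round(v * FULL_SCALE)
    return max(-FULL_SCALE - 1, min(FULL_SCALE, code))

def dual_range_sample(v):
    """Combine a high-gain and an attenuated conversion into one float sample."""
    high = quantize_24bit(v)          # normal path, best resolution
    low = quantize_24bit(v / ATTEN)   # attenuated path, only useful when loud
    if abs(high) < FULL_SCALE:        # high-gain path did not clip
        return high / FULL_SCALE
    return low * ATTEN / FULL_SCALE   # fall back to the attenuated path

print(dual_range_sample(0.5))    # ordinary signal, taken from the high-gain path
print(dual_range_sample(10.0))   # 20 dB over full scale, recovered from the low path
```

The float output can exceed 1.0, which is why a floating-point file is the natural container for this scheme; samples taken from the attenuated path have coarser steps, matching the point that loud passages get worse SNR.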
So, basically, no better than the best AD converters we already have?
My understanding of the fundamental limit to AD performance is that the brownian noise level is around the 22nd bit level. So even if you come up with techniques to successfully measure down to that level, you're basically picking up .. inevitable, irremovable, irrelevant noise.
Possibly there are gains to be made by not worrying about the noise floor and caring more about the lack of clipping, but I'm not seeing people screaming about that. The "noise" seems to be "N bits of dynamic range", not "slightly less dynamic range but it will never clip!"
A common experience for someone doing field recording of performers (my experience is music) is you twiddle your setup to get the gains reasonably high to get good SNR even for quiet parts. ... and then you record the actual performance, and you find that the tuba player really got into it for the real performance and the new peaks are 10dB over where they were in the practice. And now your recording is screwed up with a bunch of hard clipping you have to deal with. So then experience tells you in the future to take whatever you thought was safe and lower gains another dozen db.
The multi-ranged recorders eliminate that problem and the result is that you don't need to use precautionary gains, and you get a better SNR in your recordings. You probably don't need to adjust gains at all: The gain can be whatever makes the self-noise of the microphone dominate the SNR of the process, ... which would be too high for the loudest samples, but the clipping handling deals with that.
The samples that need to use the extended range have worse SNR (and probably poor linearity due to mismatches between the converters), but human hearing is much less critical to noise with loud signals anyways.
> Nobody uses 32 bit float for recording (to do so is just to capture at least 10 bits of noise, most of that being brownian);
This is not true and not true for a good and important reason! One which has no bearing on the kind of DACs that exist.
Modern field recorders allow gains set at a 'reasonable' level that maximizes SNR for recordings but still won't clip when there are much louder peaks. Not so dissimilar to how a 6-digit multimeter can achieve its advertised performance both on a 0-5v range and a 0-300v range but cannot give more than 6 digits at the higher range.
Obviously, everyone and their mother uses 32 bit float as an internal sample format because of its fitness for purpose (except the folks who think they need 64 or 80 bit floating point, of course). But they are not using "32 bit floating point samples" - the samples come from an (at best) 18-22 bit integer conversion.
It’s like having gigabit internet to my house: I don’t actually need it, but when a website is slow, I know the problem isn’t in my internet connection.
https://www.carwow.co.uk/blog/carwow-quarter-mile-400-metre-...
https://en.wikipedia.org/wiki/List_of_N%C3%BCrburgring_Nords...
I opened a support ticket but they never responded. After that it was difficult to take their lossless claims seriously when the labels were providing such garbage source material. Their whole value prop was totally hollowed out.
I don't know whether the labels still impose such horrible practices, but I largely gave up on streaming services after that experience and now focus on keeping good digital archives of my physical library.
I’ve played with the nice toys, and they are nice, but for 100x the price, they barely deliver 1.5x the experience.
[1]: https://www.cnn.com/2023/03/30/world/plants-make-sounds-scn
https://video.xiph.org/vid2.shtml
or on YT if you can't play it https://www.youtube.com/watch?v=cIQ9IXSUzuM
It kind of changed me a bit when I ran through 20 lossless tracks I had re-encoded to various mp3 bitrates and realized that even on a fancy system, it can be really hard if not impossible to discern even moderate lossy from lossless.
If you are an audiophile geek, really think about if you want to try this, the reality check might crack your foundations.
[1]https://www.foobar2000.org/components/view/foo_abx
We store files in the highest quality because it gives us the option to encode the music without audible loss of quality.
OTOH, we know nothing of your audio equipment nor how it's set up.
EDIT: Did some more ABX testing with a CD-quality track that I'm much more familiar with ('Introduction' from the Mirror's Edge soundtrack, which has been my go-to for comparing audio gear for the last decade). I could sometimes distinguish 128k mp3 this time, though interestingly, I got it consistently wrong rather than right. For some reason the compressed version seems to be my preference. Dropping to 96k mp3, I got it right 100% of the time - though only because there was a very noticeable difference in the stereo positioning of the first sound, rather than a difference in the quality of the sound. I think if it were mono I would still be unable to tell.
In a nutshell: nullc, rahimnathwani, zamadatix and vor_ know their shit, and geraldmcboing and PaulDavis are technically correct but talking past each other. speak_on and TheOtherHobbes are confidently wrong.
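For anyone scoring their own ABX runs, here is a minimal sketch of the usual binomial check, assuming each trial is an independent 50/50 guess under the null hypothesis. The function name is made up; the two-sided value is included because getting it consistently wrong is also evidence of hearing a difference.

```python
from math import comb

def abx_p_values(correct, trials):
    """Return (one-sided, two-sided) p-values for an ABX score under pure guessing."""
    total = 2 ** trials
    at_least = sum(comb(trials, k) for k in range(correct, trials + 1)) / total
    at_most = sum(comb(trials, k) for k in range(0, correct + 1)) / total
    two_sided = min(1.0, 2 * min(at_least, at_most))
    return at_least, two_sided

one_sided, _ = abx_p_values(correct=12, trials=16)
print(f"12/16 correct: p = {one_sided:.4f}")
_, two_sided = abx_p_values(correct=0, trials=10)
print(f"0/10 correct (consistently wrong): two-sided p = {two_sided:.4f}")
```

A score of 12 out of 16 comes out just under the conventional 0.05 threshold, which shows how many trials it takes before a result means much.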
And also: 44.1 kHz captures the entire human audible spectrum with room to spare, and 16-bit already goes beyond anything useful for listening. The higher resolution / sample size format is useful for production or archival purposes only.
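The two headline numbers can be checked with a short sketch. It uses the textbook figure for an ideal quantizer driven by a full-scale sine and ignores dither and noise shaping, so treat the results as theoretical ceilings; the function names are illustrative.

```python
import math

def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2

def ideal_dynamic_range_db(bits):
    """Full-scale sine vs. quantization noise for an ideal N-bit quantizer."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

print(f"44.1 kHz Nyquist: {nyquist_hz(44_100):.0f} Hz")
for bits in (16, 24):
    print(f"{bits}-bit: {ideal_dynamic_range_db(bits):.2f} dB")
```

So 44.1 kHz covers up to 22.05 kHz, and 16 bits gives roughly 98 dB between a full-scale sine and the quantization noise floor.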
The two main reasons why you hear a difference between the two formats: (1) it's likely a different master, (2) tiny gain differences in the signal (salesmen use this trick, but it's also easy to do it by mistake).
It’s like photographers who are confused about the difference between raw and bitmap (jpeg), videographers confused about the difference between linear raw vs log vs gamma encoded, etc.
Just because a data format with higher bit depth/sampling frequency/whatever exists for editing purposes, doesn’t mean it’s “better” or makes sense as a consumption format for a finished work.
Forms of manipulation bring inaudible content into the audible range.
Of course that doesn't mean audiophiles aren't being audiofooled by it, but there is legitimate usage.
However, the article claims that the final distribution doesn’t need to have a bit depth of more than 16. That does not match my experience. I can tell the difference between my renders that are 16 bit vs 24 bit. I cannot tell the difference between 44.1 kHz and higher sample rates, and that’s consistent with the math (Nyquist-Shannon), but bit depth is a different matter. Would be fun to participate in a double-blind test that includes my own tracks and others.
established using double blind testing, I assume?
(2014) https://news.ycombinator.com/item?id=8689231 424 comments
(2015) https://news.ycombinator.com/item?id=10520639 228 comments
(2017) https://news.ycombinator.com/item?id=15127633 428 comments
(2019) https://news.ycombinator.com/item?id=19318898 314 comments
And it's all good! It's perfectly fine to say "I prefer the sound when the whole mix (or just that guitar) ends up being subject to interesting and possibly harmonically relevant distortion at low levels".
Just don't say "The version with the distortion is more accurate than the one without", because that's a lie.
You can then focus on other things.
Example: I Bought the best skis possible. Now I know I need to just focus on my skills and not blame the equipment.
As for how this relates to audio compression, in particular in the context of 2012, you are making a tradeoff of storage size and decompression cost. Maybe that doesn't matter to you, but maybe it either did in 2012 or still does.
The problem is the people spreading myths and disinformation out of ignorance or to promote their enterprise.
The weak links are producers/mastering-engineers, speakers/headphones and the room when using speakers.
I use a DAC by focusrite which can do 24-bit, and if I want to listen to higher fidelity audio on my planar headphones then I should be able to. Why should I limit myself to 16-bit?
If I like an artist that I find on streaming, I buy an LP and get a lossless download for free. I still have a music library and I will never rent my favorite music.
Artists prefer to connect directly with their fans and BC is probably the best platform for people who care to pay and support acts directly. They have high res downloads and I import them.
Also the playback rate and the file rate are different topics. The former can get into scenarios more like the audio processing section of the article e.g. I had this one shitty headset for work which required me to set the volume to 1-2 (out of 100) on the computer and I could actually blind test tell when it was in 16 bit or 24 bit mode because it was cutting and boosting it so much it effectively lost precision in 16 bit mode.
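The headset anecdote can be approximated with a one-line estimate. The sketch assumes the volume cut is applied digitally to integer samples at the output bit depth with no dither, so each halving of gain costs about one bit; the function name is hypothetical.

```python
import math

def bits_left_after_digital_gain(bit_depth, gain):
    """Approximate resolution remaining after a digital attenuation (gain < 1)."""
    return bit_depth + math.log2(gain)

for depth in (16, 24):
    left = bits_left_after_digital_gain(depth, gain=0.02)  # volume at 2 of 100
    print(f"{depth}-bit output at 2% volume keeps about {left:.1f} bits")
```

At 2% volume a 16-bit path is down to roughly 10 bits, which is coarse enough to be audible once the headset boosts the signal back up, while a 24-bit path still has about 18.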
There's multiple YouTube channels that I listen to as podcasts, that are professionally created and the creators presume that exported audio works like studio audio, so what you end up with is really quiet audio that can't be turned up without pre-processing.
If we distributed audio the same way we work with it in a studio, we could forgo a lot of problems.
Also, the human ear does have enough dynamic range to make 24 bits worthwhile, though that much dynamic range is rarely used in recordings, and that high of a bit depth provides no benefits within a small dynamic range. A 192 kHz sample rate, on the other hand, is always useless.
I can always tell if my 44.1 songs are being resampled to 48 because they're being run through the OS mixer
But a quality audio player should account for this and do its own.
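Part of why 44.1 to 48 kHz conversion is a common place for cheap resamplers to cut corners is the awkward ratio between the two rates. A tiny sketch using only the standard library:

```python
from fractions import Fraction

def resample_ratio(src_hz, dst_hz):
    """Exact output/input rate ratio, reduced to lowest terms."""
    return Fraction(dst_hz, src_hz)

ratio = resample_ratio(44_100, 48_000)
print(f"upsample by {ratio.numerator}, then downsample by {ratio.denominator}")
```

A textbook rational resampler would interpolate by the numerator and decimate by the denominator, which needs a long, steep filter to do cleanly; how a given OS mixer actually handles it is implementation-specific.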
It is an incredible resource to see the quality of the resampling algorithms used by the actual production software likely used in any digital audio workflow.
You will see that while the best are indeed almost 100% transparent, many are not.
There is also https://src.hydrogenaudio.org/ (with no IP based restrictions, AFAIK).
your software is among the best, but not pitch black best :)
It would have cost the same for the entire stack to be 16bit/44.1kHz at every step, but with excessive resolution I can control the volume anywhere. The bits right before the analog conversion at the end are essentially the same whether I turn down the volume in the software player, the operating system, or the DAC/amplifier.
When I play from the computer, I'm not sure whether it is using the clock on my Mac, the clock on the optical interface, or the WiiM's clock. However, I do not notice any difference in fidelity when I use the Qobuz software player on my Mac or use Qobuz Connect to allow the player to directly stream from the source, so either it isn't a difference that I can hear, or the WiiM's internal clock is used for both sources.
Also, sadly consumers are getting used to low quality audio nowadays - they often listen to lossily compressed audio on social media (sometimes decompressed and re-compressed several times) which is then re-compressed to send to bluetooth headphones, or played back on awful smartphone speakers. Streaming services also use compressed audio.
I'm not interested in finetuning everything in my life for efficiency.
Now in terms of realistic audio encoding, 16 bit at 44.1 kHz is designed to be a faithful representation as far as human hearing is concerned. Can someone with a trained ear potentially tell the difference between that and 24 bit at 192 kHz? In a studio environment it's possible. Most audiophile claims are dubious and a blind A/B test catches them out on most of it but the Nyquist-Shannon sampling theorem does not directly apply to quantized samples, it's about exact samples and with quantization, sampling rate is intertwined somewhat with the quantization depth.
Would be happy to see an actual, real study to prove that humans can notice, but to my knowledge none exist that confirm they can. Not even any on teenagers or younger (the only group that can even hear close to 20 kHz).
A quick search returned this PDF with a nice diagram of what aliasing looks like: https://download.tek.com/document/76W_30631_0_HR_Letter.pdf
To draw a design parallel: pixel-perfect design isn't something we are born with, noticing tiny details is a developed skill.
And yes, you are on point: oversampling is used extensively, but this just points at the exact issue: Nyquist theorem gave us a math algorithm, we still need to account for the electronic component imperfections. And then we are entering a different space of quality/precision/psychoacoustics/perception/etc. Meaning, not all converters, not all pre-amps, not all mics "sound" the same, even when they use same types of components on paper.
Do you have more convincing sources?
The human threshold-of-hearing curve intersects the threshold-of-pain curve at about 20 kHz.
Above that frequency (or thereabouts) the sound has to be so loud that it will literally instantly damage your hearing before you can hear it.
This has been replicated across many studies for more than 100 years.
Flicker threshold is completely different. You can’t damage your vision by increasing the FPS, and it has always been commercially desirable to use a lower frequency because that is cheaper.
In addition, nobody cares about "measurable" artifacts (or rather, they should not). What matters are "audible" artifacts. We have measuring equipment that is vastly more sensitive than human ears (e.g. your recording equipment that can pick up signals far above 22kHz). What's measurable is not particularly interesting - what's audible is.
Artifacts do not sum linearly, because they do not originate from correlated sources (unless you're doing something rather unusual).
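The difference between correlated and uncorrelated summing can be sketched with the standard rules of thumb: equal-level uncorrelated sources add in power (10 log10 N), while identical, fully correlated sources add in amplitude (20 log10 N). The 300-track figure is just an example count.

```python
import math

def level_increase_db(n_sources, correlated):
    """Level gain from summing n equal-level sources."""
    factor = 20 if correlated else 10
    return factor * math.log10(n_sources)

tracks = 300
print(f"uncorrelated: +{level_increase_db(tracks, correlated=False):.1f} dB")
print(f"correlated:   +{level_increase_db(tracks, correlated=True):.1f} dB")
```

Real mixes sit somewhere between the two cases, but for artifacts from unrelated tracks the lower figure is the relevant one.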
Glad you can hear the difference between two converters, but I trust you've tested it in a double blind setting?
And absolutely - I blind tested converters extensively. Mbox2, Black Lion Audio upgraded converters, UA, Prism.
Yes, the discussion was "never about analog vs AD". But my point is that I see little point wasting time on one set of artifacts (in the digital realm) that are tiny compared to those introduced in the analog realm. If there's a mouse and an elephant about to enter your home, you focus on the elephant, no?
The big difference, of course, is that "everyone" has convinced themselves that most/all of the analog artifacts, as big as they are, are somehow "tasteful" or "artistic", whereas the digital ones are just "math errors". I don't think that is too helpful.
And look, if lots of people could get through double blind tests and still show they can hear aliasing or whatever the digital artifact du jour is, then I'd say "yes, absolutely, we need to be very aware of this and do everything we can to reduce or eliminate it". But as far as I can tell, this just isn't the case.
To your main point: yes, all artifacts are just our learned, cultural, developed preferences. In the exact same way major/minor thirds were considered dissonant just a few hundred years ago - it's all a learned perception, not an absolute judgment.
I would go even further, doesn't matter whether people perceive aliasing as a major issue, it's no different from the U47 "warmth". You can't afford this, probably, as a software developer in a way, but at the most fundamental level any sound's - or artifact's - judgment is based on our current diagram of "sounds nice" vs "sounds bad".
Who has the best ears? What can they detect?
I know from my 20-ish year mixing experience that I can hear the difference when mixing. Is it good evidence? No. So we can agree to disagree then.
https://ardour.org/ is my website.
Firstly, it's an amazing experience to randomly interact with people like you - I love and use your software. Hats off and thanks for what you offered to the industry!
But secondly, your statement makes even less sense to me: obviously artifacts do add up. Yes, not linearly, like any complex audio in general. But the more tracks with artifacts I have, the more artifacts I have overall. It's not like they cancel each other (outside of normal frequency cancellation).
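A rough way to see both sides of the "do artifacts add up" question is a minimal sketch that assumes each track's artifact can be modelled as unit-RMS Gaussian noise (a simplification, not a model of real aliasing). Uncorrelated artifacts grow by about 10*log10(N) dB across N tracks, while identical (fully correlated) ones grow by 20*log10(N) dB. The function names here are made up for illustration.

```python
import math
import random

def rms(xs):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))

def summed_level_db(n_tracks, n_samples=20000, correlated=False, seed=0):
    """Level of the sum of n_tracks noise tracks, in dB relative to one track.

    Assumption: each track's artifact is unit-variance Gaussian noise.
    correlated=True sums the same noise n_tracks times; otherwise every
    track gets its own independent noise.
    """
    rng = random.Random(seed)
    shared = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    total = [0.0] * n_samples
    for _ in range(n_tracks):
        if correlated:
            track = shared
        else:
            track = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        for i, x in enumerate(track):
            total[i] += x
    return 20.0 * math.log10(rms(total) / rms(shared))

if __name__ == "__main__":
    print("16 uncorrelated tracks: %+.1f dB" % summed_level_db(16))
    print("16 identical tracks:    %+.1f dB" % summed_level_db(16, correlated=True))
```

So more tracks do mean more total artifact energy, but for independent sources it grows with the square root of the track count in amplitude, not in proportion to it.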
The energy of the signal components above the Nyquist is generally very low, and very few double blind tests have given any indication that humans can detect the resulting aliasing (even though many people claim to be able to do so, almost always in non-double-blind environments).
Badly written digital synthesis can generate high energy signal components above 22kHz, but that's because they're badly written, not because the theory is wrong.
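For anyone who wants to see where such a component ends up, here is a minimal sketch of the standard folding arithmetic, assuming a pure tone and no anti-alias filter at all. `alias_frequency` is an illustrative name, not something from any library.

```python
def alias_frequency(f_hz, sample_rate_hz):
    """Frequency at which a pure tone at f_hz appears after sampling.

    Assumes an ideal sampler with no anti-alias filter. Tones below
    Nyquist (sample_rate_hz / 2) come back unchanged; tones above it
    fold back into the range 0..Nyquist.
    """
    folded = f_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

if __name__ == "__main__":
    for f in (1000, 25000, 30000):
        print(f, "Hz at 44.1 kHz ->", alias_frequency(f, 44100), "Hz")
```

A 30 kHz component sampled at 44.1 kHz lands at 14.1 kHz, well inside the audible band, which is why its energy is what matters.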
This space is not driven by a single precise formula. 48/96 kHz helps some engineers to produce better sounding mixes. Can everyone hear the extended range of Adam tweeters? Probably not. But some can, and they benefit from that. Even if there is no double-blind study to prove this in absolute terms.
But very little music is like that, and the energy profile above Nyquist will differ dramatically. Consequently, you're not summing a set of identical aliasing results, and in general, the results will still be undetectable to almost everyone.
Jacob Collier routinely works with 300+ tracks in Logic. He doesn't worry about this sort of thing, and neither do the Grammy voters who love what he does.
It is always amazing how much that is claimed about what people can hear fails to show up when tested in this, the only acceptable scientific way.
Perhaps Maserati has done this, and could still tell the difference. In which case, he should carry on! But he should carry on anyway! People should do what brings them joy, and if he likes working at 44.1kHz or whatever, he should absolutely do that.
What people should not do is lecture about stuff that isn't true and/or isn't demonstrable in proper test settings, and most (not all, but most) of the SR stuff fits into one or other or both of those categories.
My favourite: "audiophile-grade" audio players which allocate a single continuous buffer of RAM into which they load/decode the whole .WAV/.FLAC file, because supposedly the CPU "jumping" between "fragmented audio" causes audible "jitter".
Of course, they don't know that what looks like continuous memory to user-code is probably discontinuous in kernel/physical RAM.
I haven't checked in many years. I wonder if they created kernel-level players to account for that, to have "true continuous memory".
[1] https://www.audioasylum.com/messages/pcaudio/119979/
How this would occur without also producing grossly audible pitch distortion never seems to be discussed.
Thanks for the laugh... this is absolutely bonkers. In case anyone is wondering, before sound hits our ears it has to go through a digital to analog conversion, which takes place on hardware independent of the CPU, operating with its own clock and buffers etc.
It gobbled like 90% of the CPU and I had to make sure I gave it a pretty large buffer so it didn't stutter when an app claimed CPU for more than a second, but it worked.
Also my headphones are extremely sensitive. I can touch the ring and sleeve of a jack with a finger, and touch a metal bed frame with a tip and I hear quiet clicks as I move the tip along the metal. Sometimes I do not even need to touch the jack with a finger. It doesn't work with small objects like a knife though.
So I guess the programmer equivalent is distributing .pdb's (or, symbols)
Most modern digital synths have already caught onto this and run internally at much higher sampling rates even if their output gets downsampled, but sometimes you run across a vintage plugin that runs at the host audio rate and working in a higher sampling rate is audible.
Oversampling gives you headroom for aliases for the rest of the synth that is more vulnerable to it.
* Some people are still making this mistake, despite information on the (many) ways to do it the right way being widely and freely available!
There's some ways to do band-limited distortion but...they aren't nearly as widespread, easy, or universal as band-limited oscillators.
Ring modulation is funny though because you'd ideally want the sidebands to modulate down by default rather than filter them out, that's why you're using it.
So if your synthesizers do not use proper band-limited oscillators then 192KHz is _FAR_ too slow. You'd want to be running at hundreds of KHz, perhaps a few MHz.
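A back-of-the-envelope check of that claim, assuming an ideal naive sawtooth whose k-th harmonic has amplitude 1/k relative to the fundamental (so it sits at -20*log10(k) dB). The function names are made up for this sketch, and it ignores the fact that aliases folding above 20 kHz are inaudible anyway.

```python
import math

def first_aliased_harmonic_db(f0_hz, sample_rate_hz):
    """Level (dB re fundamental) of the lowest sawtooth harmonic above Nyquist.

    Assumption: ideal sawtooth, harmonic k has amplitude 1/k.
    """
    nyquist = sample_rate_hz / 2.0
    k = math.floor(nyquist / f0_hz) + 1  # first harmonic strictly above Nyquist
    return -20.0 * math.log10(k)

def sample_rate_for_floor(f0_hz, floor_db):
    """Sample rate whose Nyquist sits at the harmonic that has decayed to floor_db."""
    k = 10.0 ** (-floor_db / 20.0)  # harmonic number whose level equals floor_db
    return 2.0 * f0_hz * k

if __name__ == "__main__":
    for fs in (48000, 192000):
        print(fs, "Hz: %.1f dB" % first_aliased_harmonic_db(440.0, fs))
    print("rate for -60 dB: %.0f Hz" % sample_rate_for_floor(440.0, -60.0))
```

For a 440 Hz naive saw, the strongest aliased harmonic is only about 35 dB down at 48 kHz and about 47 dB down at 192 kHz. Pushing it to -60 dB takes roughly 880 kHz, which is in line with the "hundreds of KHz, perhaps a few MHz" estimate.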
In reality synth software that doesn't sound like crap uses band limited oscillators and should work okay at 48KHz too. That said, even if the oscillators are band limited, it may be the case that the various modulations aren't band limited properly, as getting those wrong won't sound instantly wrong (in particular because you have to modulate to make it wrong, and the underlying change of the modulation may make it harder to tell it's wrong).
Though also in those cases if you're not counting on every step being properly band limited then 192KHz may be an improvement but you're still probably getting some meaningful aliasing. I think given how fast computers have become relative to digital audio there is probably a good case to just make any "modular synth" run at 32-bit 480KHz or even 4.8MHz through every stage that could process the audio.
Maybe 192KHz really is enough to suppress the aliasing artifacts but I think to be convinced of that I'd want to see a system that supported both and validate that the difference between a downsampled 48KHz output from the two modes was below -90dB or something.
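The check described above could look something like this minimal sketch. It assumes the two renders are already downsampled to the same rate, time-aligned, and the same length; `residual_db` is an illustrative name, and the two sine-based "renders" in the demo are stand-ins, not real synth output.

```python
import math

def residual_db(reference, candidate):
    """Energy of (reference - candidate) relative to reference, in dB.

    Returns -inf when the two renders are sample-identical.
    """
    if len(reference) != len(candidate):
        raise ValueError("renders must have the same length")
    ref_energy = sum(x * x for x in reference)
    if ref_energy == 0.0:
        raise ValueError("reference render is silent")
    err_energy = sum((x - y) ** 2 for x, y in zip(reference, candidate))
    if err_energy == 0.0:
        return float("-inf")
    return 10.0 * math.log10(err_energy / ref_energy)

if __name__ == "__main__":
    fs = 48000
    # Stand-in renders: a 1 kHz tone, and the same tone plus a tiny 7 kHz "alias".
    clean = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs)]
    dirty = [x + 1e-5 * math.sin(2 * math.pi * 7000 * n / fs)
             for n, x in enumerate(clean)]
    level = residual_db(clean, dirty)
    print("residual: %.1f dB, passes -90 dB: %s" % (level, level < -90.0))
```

A broadband residual number like this is blunt, since it doesn't say where in the spectrum the difference sits, but it is enough for a pass/fail against a threshold like -90dB.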
Or otherwise you can just declare that the aliasing is part of the sound and then there are no right choices... 24khz sampling, 48k, 192k ... who cares, use what you like best. :)
192 for mixing and mastering can be useful especially if you're doing a lot of effects, especially anything that pitch shifts. But I've seen low quality phone-microphone recordings make it to the master; if you capture lightning in a bottle, it hardly matters what the settings were, what the microphone was, or anything else.
Some previous discussions:
2023 https://news.ycombinator.com/item?id=34698427
2022 https://news.ycombinator.com/item?id=30138561
2019 https://news.ycombinator.com/item?id=19318898
2017 https://news.ycombinator.com/item?id=15127633
2015 https://news.ycombinator.com/item?id=10520639
2014 https://news.ycombinator.com/item?id=8689231
2012 https://news.ycombinator.com/item?id=3668310
To try to imagine something similar: the human eye is unable to see UV light, yet fluorescent paint has a visible quality of its own compared to "normal" pigments.
this has practical applications
It's a bit redundant for a skilled technician, they're already used to setting the gain staging, inbound compression, and feathering the mics to avoid this in 24-bit, but if you're handing a boom mic to a novice and have a scene where e.g. someone's whispering and another person's screaming, it can be nice to not have to worry about it.
Don't forget to buy the new low oxygen platinum plated HDMI cables for the better experience!
/s