I Built a macOS App to Detect Fake Lossless Audio

DEV Community
Ale

A WAV file is not always what it claims to be. A file that was transcoded from a lossy source such as MP3 and then saved in a lossless container is called fake lossless audio. It's common enough in DJ libraries and audiophile collections that I built a macOS app, Spectro, to detect it automatically. This post is about how the detection works.

Why lossy compression is detectable

When someone converts a 128kbps MP3 to WAV, the container changes but the frequency content doesn't. The high frequencies that were removed during MP3 encoding are gone permanently. The resulting WAV file carries a characteristic signature: a hard cutoff in the frequency spectrum, with a flat, dark region above the cutoff frequency where audio energy simply doesn't exist.

Detecting it with FFT

In Swift, Apple's vDSP framework makes FFT efficient and straightforward. Here's a simplified version of the core analysis:

```swift
import Accelerate

func computeSpectrum(samples: [Float], fftSize: Int = 4096) -> [Float] {
    // fftSize must be a power of two for the radix-2 FFT
    let log2n = vDSP_Length(log2(Float(fftSize)))
    guard let fftSetup = vDSP_create_fftsetup(log2n, FFTRadix(kFFTRadix2)) else { return [] }
    defer { vDSP_destroy_fftsetup(fftSetup) }

    // Take one frame, zero-padding if fewer than fftSize samples are available
    var real = [Float](samples.prefix(fftSize))
    real += [Float](repeating: 0, count: fftSize - real.count)
    var imag = [Float](repeating: 0, count: fftSize)
    // Declared outside the closures so the result is not assigned back to
    // `real` while its buffer pointer is still in use
    var magnitudes = [Float](repeating: 0, count: fftSize / 2)

    real.withUnsafeMutableBufferPointer { realPtr in
        imag.withUnsafeMutableBufferPointer { imagPtr in
            var splitComplex = DSPSplitComplex(
                realp: realPtr.baseAddress!,
                imagp: imagPtr.baseAddress!
            )
            // In-place complex FFT of the frame (the imaginary input is zero)
            vDSP_fft_zip(fftSetup, &splitComplex, 1, log2n, FFTDirection(FFT_FORWARD))
            // Squared magnitudes (power) of the bins from 0 Hz up to Nyquist
            vDSP_zvmags(&splitComplex, 1, &magnitudes, 1, vDSP_Length(fftSize / 2))
        }
    }
    return magnitudes
}
```

Running this across multiple frames of the audio (not just one window) and averaging gives a stable picture of the frequency content across the whole track. From this averaged spectrum, finding the cutoff frequency is a matter of locating where the high-frequency energy drops below a threshold.
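The averaging and cutoff search can be sketched roughly as follows. This is a minimal illustration, not Spectro's actual code: the helper names `averageSpectra` and `estimateCutoffKHz` and the -60 dB threshold relative to the spectrum's peak are assumptions made for the example. It expects per-frame power spectra of equal length, such as the output of `computeSpectrum`.

```swift
import Foundation

/// Averages per-frame power spectra bin by bin.
/// Hypothetical helper; frames whose length differs from the first are skipped.
func averageSpectra(_ frames: [[Float]]) -> [Float] {
    guard let binCount = frames.first?.count, binCount > 0 else { return [] }
    var sum = [Float](repeating: 0, count: binCount)
    var used = 0
    for frame in frames where frame.count == binCount {
        for i in 0..<binCount { sum[i] += frame[i] }
        used += 1
    }
    return sum.map { $0 / Float(used) }
}

/// Estimates the cutoff as the frequency of the highest bin whose power is
/// within `thresholdDB` of the spectrum's peak.
/// Hypothetical helper; the -60 dB default is illustrative, not a tuned value.
func estimateCutoffKHz(spectrum: [Float], sampleRate: Float, thresholdDB: Float = -60) -> Float? {
    guard let peak = spectrum.max(), peak > 0, sampleRate > 0 else { return nil }
    // Convert the dB threshold to a power ratio (10 * log10 applies to power)
    let minPower = peak * pow(10, thresholdDB / 10)
    guard let lastBin = spectrum.lastIndex(where: { $0 >= minPower }) else { return nil }
    // The bins cover 0 Hz up to Nyquist, so bin k sits at k * (sampleRate / 2) / binCount
    let binWidthHz = (sampleRate / 2) / Float(spectrum.count)
    return Float(lastBin) * binWidthHz / 1000
}
```

With `frames` holding one spectrum per analysis window, `estimateCutoffKHz(spectrum: averageSpectra(frames), sampleRate: 44_100)` returns the estimated cutoff in kHz, or `nil` for empty or silent input. A fixed threshold relative to the peak is a crude choice, since real music also rolls off gradually toward high frequencies, so the value would likely need tuning against known-good files.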
The verdict engine

```swift
// Assumed result type; the cases mirror the three verdicts described below
enum Verdict { case highQuality, medium, fake }

struct VerdictEngine {
    func evaluate(
        cutoffKHz: Float,
        fileExtension: String,
        declaredKbps: Int? = nil,
        sampleRate: Float? = nil
    ) -> Verdict {
        let ext = fileExtension.lowercased()
        let isLossyContainer = (ext == "mp3" || ext == "m4a")
        let isLosslessContainer = (ext == "wav" || ext == "aiff" || ext == "flac")

        // Low declared bitrate is a strong fake signal
        if let declaredKbps, declaredKbps <= 160 {
            return .fake
        }

        // Sample-rate guard. Assumption: the exact guard is not shown here, so
        // this version scales the 16 and 20 kHz thresholds by the file's
        // Nyquist relative to 22.05 kHz (the Nyquist of 44.1 kHz audio).
        var scale: Float = 1
        if isLosslessContainer, let sampleRate, sampleRate > 0 {
            scale = min(1, (sampleRate / 2_000) / 22.05)
        }

        let verdictFromCutoff: Verdict
        if cutoffKHz < 16 * scale {
            verdictFromCutoff = .fake
        } else if cutoffKHz >= 20 * scale {
            verdictFromCutoff = .highQuality
        } else {
            verdictFromCutoff = .medium
        }

        // Lossy containers (MP3, M4A) are never reported as lossless, regardless
        // of cutoff. M4A can also hold lossless ALAC, which this simplified
        // check does not distinguish.
        if isLossyContainer && verdictFromCutoff == .highQuality {
            return .medium
        }
        return verdictFromCutoff
    }
}
```

The three verdicts:

LOSSLESS (highQuality): frequency content extends naturally to the Nyquist limit. The file is what it claims to be.

FAKE: hard cutoff detected below 16 kHz, or declared bitrate ≤ 160kbps. The file is almost certainly a transcoded lossy source.

MEDIUM: cutoff between 16 and 20 kHz, or a lossy container with high-frequency content. This could be a 320kbps MP3, a heavily mastered track with an intentional high-cut, or a vinyl rip. These are borderline cases that need judgment.

The edge cases that matter

Non-standard sample rates. A FLAC file recorded at 32 kHz has a Nyquist frequency of 16 kHz, not 22 kHz. Without the sample rate guard in the verdict engine, every 32 kHz lossless file would be classified as FAKE, because its content cannot extend past 16 kHz and the measured cutoff therefore looks like that of a transcode. The fix is to compare the measured cutoff against the expected Nyquist for the file's actual sample rate, not a hardcoded 22 kHz assumption.

320kbps MP3 as WAV. A 320kbps MP3 converted to WAV cuts off at approximately 20 kHz, barely below the 22 kHz Nyquist. On a spectrogram this is genuinely hard to distinguish from a real lossless file by eye. The verdict engine flags these as MEDIUM rather than LOSSLESS, which is conservative but accurate: the file is not technically lossless.

Intentional mastering cutoffs.
Some mastering engineers apply a brick-wall EQ high-cut at 20 kHz intentionally. A genuine lossless file can show a hard cutoff at 20 kHz and still be truly lossless. This is why the MEDIUM verdict exists: to avoid false positives on edge cases where the spectral evidence is ambiguous.

Audio optimizers. Tools like Platinum Notes process audio and modify its frequency content, which can cause a legitimately lossless file to be misclassified. This is a known limitation and worth documenting for users.

The app

https://getspectro.app

The fake lossless problem is more common than most people realize: any DJ who has audited their library has found fake lossless files in it. The detection is technically straightforward once you know what to look for. The hard part, as usual, was the edge cases.

If you have questions about the FFT implementation or the verdict logic, I'm happy to discuss them in the comments.