ETRI Knowledge Sharing Platform : Audio Coding for Machines: Saliency-Guided Masking for Concentrated Compression


Conference Paper: Audio Coding for Machines: Saliency-Guided Masking for Concentrated Compression


Abstract: Audio codecs today are built around the human auditory system, but the spectro-temporal cues they discard as inaudible often carry weight for machine tasks. This paper revisits Audio Coding for Machines (ACoM) through the lens of saliency-guided pre-compression masking. Building on prior work, we contribute two additional analyses: a study of which factors contribute to the optimal masking ratio in audio, and a step-wise view of how machine-task performance changes as low-saliency bins are progressively removed. Together, they show that the optimal masking level is strongly sample-dependent and that predicting it is a non-trivial learning problem left to future work. Even so, simply suppressing low-saliency regions before encoding already recovers a substantial share of the compression-induced performance gap, pointing to a promising direction.
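The masking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a per-bin saliency map is already available with the same shape as the spectrogram, and the names `mask_low_saliency`, `stepwise_masks`, `ratio`, and `fill` are hypothetical. How saliency is computed, which codec follows, and how task performance is measured are all outside this sketch.

```python
import numpy as np


def mask_low_saliency(spec, saliency, ratio, fill=0.0):
    """Return a copy of `spec` with the least salient bins set to `fill`.

    spec     : 2-D array (frequency x time), e.g. a magnitude spectrogram.
    saliency : array of the same shape; larger means more important
               for the machine task (assumed to be given).
    ratio    : fraction of bins to suppress, in [0, 1].
    """
    if spec.shape != saliency.shape:
        raise ValueError("spec and saliency must have the same shape")
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")

    masked = spec.copy()
    n_mask = int(round(ratio * spec.size))
    if n_mask == 0:
        return masked

    # Flattened indices of the n_mask bins with the lowest saliency.
    lowest = np.argsort(saliency, axis=None, kind="stable")[:n_mask]
    masked.flat[lowest] = fill
    return masked


def stepwise_masks(spec, saliency, ratios):
    """Masked copies of `spec` for a sequence of increasing ratios.

    Each copy could then be encoded and passed to the downstream
    model to trace task performance against the masking level.
    """
    return [mask_low_saliency(spec, saliency, r) for r in ratios]
```

Calling `stepwise_masks(spec, saliency, [0.0, 0.25, 0.5, 0.75])` yields one masked spectrogram per ratio. Because the abstract reports that the optimal level is strongly sample-dependent, such a sweep would need to be evaluated per sample rather than with a single global ratio.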

KSP Keywords: Audio coding, Human Auditory System, Spectro-Temporal, performance gap, pre-compression

