ETRI Knowledge Sharing Platform



Conference Paper: Representation Learning for Background Music Identification in Television Shows
Cited 0 times in Scopus · Downloaded 3 times
김혜미, 김정현, 박지현, 유원영
International Conference on Information and Communication Technology Convergence (ICTC) 2019, pp.1434-1437
19KS1100, Development of Intelligent Micro-Identification Technology for Music and Video Monitoring, 박지현
Although audio fingerprinting has been widely used in various applications, the performance of audio fingerprinting methods degrades severely when identifying background music mixed with speech in TV shows. To address this, we present an approach that learns embeddings for background music identification using deep convolutional networks. We construct a triplet dataset consisting of original songs, the same songs mixed with voices, and different songs, and train the network with a triplet loss function with an adaptive margin. A nearest neighbor classifier then finds the closest embedding among those of the original songs. Comparing top-1 identification accuracy, we show that the learned embedding of each music segment mixed with speech carries meaningful information for music identification.
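The pipeline the abstract describes — a triplet loss with an adaptive margin for training, then nearest-neighbor lookup against original-song embeddings at identification time — can be sketched as follows. This is a minimal numpy illustration, not the authors' implementation: the `adaptive_margin` schedule is hypothetical (the abstract does not specify how the margin adapts), and the embeddings here stand in for the outputs of the deep convolutional network.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin):
    """Triplet loss: pull the anchor (speech-mixed segment) toward the
    positive (same original song) and push it away from the negative
    (different song) by at least `margin` in squared distance."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

def adaptive_margin(speech_ratio, base=0.2, scale=0.8):
    """Hypothetical margin schedule: segments with more speech mixed in
    are harder, so they get a smaller required margin."""
    return base + scale * (1.0 - speech_ratio)

def identify(query_emb, song_embs, song_ids):
    """Nearest-neighbor classifier: return the ID of the original-song
    embedding closest to the query segment's embedding."""
    dists = np.sum((song_embs - query_emb) ** 2, axis=1)
    return song_ids[int(np.argmin(dists))]
```

For example, a speech-mixed segment embedded near its original song yields zero loss once the margin is satisfied, and `identify` returns that song's ID as the top-1 match.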
KSP Suggested Keywords
Adaptive Margin, Audio fingerprinting, Background music, Deep Convolutional Networks, Meaningful information, Music identification, Nearest Neighbor Classifier, Representation learning, TV shows, nearest neighbor(NN), triplet loss function