ETRI Knowledge Sharing Platform



Conference Paper: Representation Learning for Background Music Identification in Television Shows
Cited 0 times in Scopus · Downloaded 3 times
김혜미, 김정현, 박지현, 유원영
International Conference on Information and Communication Technology Convergence (ICTC) 2019, pp.1434-1437
19KS1100, Development of Intelligent Micro-Identification Technology for Music and Video Monitoring, 박지현
Although audio fingerprinting has been widely used in various applications, the performance of audio fingerprinting methods degrades severely when identifying background music mixed with speech in TV shows. To address this, we present an approach that learns embeddings for background music identification using deep convolutional networks. We construct a triplet dataset consisting of original songs, the same songs mixed with voices, and different songs, and train the network with a triplet loss function with an adaptive margin. A nearest neighbor classifier then finds the closest embedding among those of the original songs. Comparing top-1 identification accuracy, we show that the learned embedding of each music segment mixed with speech carries meaningful information for music identification.
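The pipeline the abstract describes — a triplet loss with an adaptive margin for training, then nearest-neighbor lookup against original-song embeddings at identification time — can be sketched as follows. This is a minimal numpy illustration, not the authors' implementation: the `adaptive_margin` schedule is hypothetical (the abstract does not specify how the margin adapts), and the embeddings here stand in for the outputs of the deep convolutional network.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin):
    """Triplet loss: pull the anchor (speech-mixed segment) toward the
    positive (same original song) and push it away from the negative
    (different song) by at least `margin` in squared distance."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

def adaptive_margin(speech_ratio, base=0.2, scale=0.8):
    """Hypothetical margin schedule: segments with more speech mixed in
    are harder, so they get a smaller required margin."""
    return base + scale * (1.0 - speech_ratio)

def identify(query_emb, song_embs, song_ids):
    """Nearest-neighbor classifier: return the ID of the original-song
    embedding closest to the query segment's embedding."""
    dists = np.sum((song_embs - query_emb) ** 2, axis=1)
    return song_ids[int(np.argmin(dists))]
```

For example, a speech-mixed segment embedded near its original song yields zero loss once the margin is satisfied, and `identify` returns that song's ID as the top-1 match.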
KSP Suggested Keywords
Adaptive Margin, Audio fingerprinting, Background music, Deep Convolutional Networks, Meaningful information, Music identification, Nearest Neighbor Classifier, Representation learning, TV shows, nearest neighbor(NN), triplet loss function