ETRI Knowledge Sharing Platform

Automatic Depression Detection Using Smartphone-Based Text-Dependent Speech Signals: Deep Convolutional Neural Network Approach
Cited 18 times in Scopus · Downloaded 137 times
Authors
Ah Young Kim, Eun Hye Jang, Seung-Hwan Lee, Kwang-Yeon Choi, Jeon Gue Park, Hyun-Chool Shin
Issue Date
2023-01
Citation
Journal of Medical Internet Research, v.25, pp.1-17
ISSN
1438-8871
Publisher
JMIR Publications, Inc.
Language
English
Type
Journal Article
DOI
https://dx.doi.org/10.2196/34474
Abstract
Background: Automatic diagnosis of depression based on speech can complement mental health treatment methods in the future. Previous studies have reported that acoustic properties can be used to identify depression. However, few studies have attempted a large-scale differential diagnosis of patients with depressive disorders using acoustic characteristics of non-English speakers.
Objective: This study proposes a framework for automatic depression detection using large-scale acoustic characteristics based on the Korean language.
Methods: We recruited 153 patients who met the criteria for major depressive disorder and 165 healthy controls without current or past mental illness. Participants' voices were recorded on a smartphone while performing the task of reading predefined text-based sentences. Three approaches were evaluated and compared to detect depression using data sets with text-dependent read speech tasks: conventional machine learning models based on acoustic features, a proposed model that trains and classifies log-Mel spectrograms by applying a deep convolutional neural network (CNN) with a relatively small number of parameters, and models that train and classify log-Mel spectrograms by applying well-known pretrained networks.
Results: The acoustic characteristics of the predefined text-based sentence reading automatically detected depression using the proposed CNN model. The highest accuracy achieved with the proposed CNN on the speech data was 78.14%. Our results show that the deep-learned acoustic characteristics lead to better performance than those obtained using the conventional approach and pretrained models.
Conclusions: Checking the mood of patients with major depressive disorder and detecting the consistency of objective descriptions are very important research topics. This study suggests that the analysis of speech data recorded while reading text-dependent sentences could help predict depression status automatically by capturing the characteristics of depression. Our method is smartphone based, is easily accessible, and can contribute to the automatic identification of depressive states.
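
The pipeline described in the abstract (log-Mel spectrograms fed to a small CNN for a two-class depressed/control decision) can be illustrated with the following minimal sketch. This is not the authors' published code: the 16 kHz sampling rate, 40 Mel bands, layer sizes, and the file name reading_task.wav are illustrative assumptions, and the network weights are untrained.

    # Minimal sketch (illustrative, not the paper's implementation):
    # log-Mel spectrogram front end + a small 2D CNN binary classifier.
    import librosa
    import numpy as np
    import torch
    import torch.nn as nn

    def log_mel_spectrogram(wav_path, sr=16000, n_mels=40):
        """Load a speech recording and return its log-Mel spectrogram (n_mels x frames)."""
        y, sr = librosa.load(wav_path, sr=sr)
        mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=512,
                                             hop_length=160, n_mels=n_mels)
        return librosa.power_to_db(mel, ref=np.max)

    class SmallDepressionCNN(nn.Module):
        """A deliberately small CNN over log-Mel spectrograms; 2 output classes."""
        def __init__(self, n_classes=2):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d((4, 4)),
            )
            self.classifier = nn.Linear(32 * 4 * 4, n_classes)

        def forward(self, x):  # x: (batch, 1, n_mels, frames)
            x = self.features(x)
            return self.classifier(torch.flatten(x, 1))

    # Example usage: score one recording (hypothetical file; untrained weights).
    spec = log_mel_spectrogram("reading_task.wav")
    x = torch.tensor(spec, dtype=torch.float32)[None, None, :, :]  # (1, 1, mels, T)
    logits = SmallDepressionCNN()(x)
    print(logits.shape)  # torch.Size([1, 2])

In practice such a model would be trained on labeled read-speech recordings and compared against the conventional acoustic-feature and pretrained-network baselines described above.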
KSP Keywords
Acoustic characteristics, Acoustic properties, Automatic diagnosis, CNN model, Convolution neural network(CNN), Data sets, Deep convolutional neural networks, Depression detection, Healthy controls, Korean language, Major depressive disorder(MDD)
This work is distributed under the terms of the Creative Commons License (CC BY).