ETRI Knowledge Sharing Platform

CARE-VL: A Domain-Specialized Vision-Language Model for Early ASD Screening
Authors
Cheol-Hwan Yoo, Jang-Hee Yoo, Jaeyoon Jang
Issue Date
2025-09
Citation
International Conference on Medical Image Computing and Computer Assisted Interventions (MICCAI) 2025 (LNCS 15964), pp.57-66
Language
English
Type
Conference Paper
DOI
https://dx.doi.org/10.1007/978-3-032-04971-1_6
Abstract
We propose an autism spectrum disorder (ASD) screening framework that integrates an expert vision-language model (VLM), CARE-VL, with a large language model (LLM)-based aggregation module to assess children’s social interactions and derive subject-level ASD/typical development (TD) classifications. Our framework processes video data collected using social interaction-inducing content, where medical experts annotated predefined query-response (Q-R) intervals based on key social indicators—such as response to name, eye contact, imitation behavior, social smiling, and pointing—by marking correct responses and assigning subject-level ASD/TD classifications. To adapt the general-purpose VLM to the ASD screening domain, we constructed a synthetic instruction-tuning dataset using a label-guided reasoning method on these clinical tags, fine-tuning the model to generate detailed captions and multiple-choice question-answer (MC-QA) pairs, capturing children’s critical social behaviors. CARE-VL processes Q-R intervals to produce clip-level MC-QA results and descriptive captions, which are then aggregated by an LLM to derive final ASD/TD classification and clinical reasoning. Our end-to-end framework combines visual understanding and linguistic reasoning, achieving 84.6% accuracy for clip-level response prediction and 75.8% accuracy for subject-level ASD/TD classification. These results demonstrate the potential of our framework as a practical and interpretable tool for early ASD screening and behavioral assessment. The code is publicly available at https://github.com/etri/AI4ASD.
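The abstract describes a two-stage pipeline: CARE-VL produces clip-level MC-QA answers and captions for each query-response (Q-R) interval, and an LLM then aggregates this evidence into a subject-level ASD/TD decision. The sketch below illustrates how such an aggregation step could be wired together; it is a minimal, hypothetical outline, not the released AI4ASD code, and every name in it (ClipResult, build_aggregation_prompt, aggregate_with_llm) is an illustrative placeholder rather than the authors' actual API.

```python
# Illustrative sketch of the clip-level -> subject-level aggregation step
# described in the abstract. Placeholder names only; see the official code
# at https://github.com/etri/AI4ASD for the real implementation.

from dataclasses import dataclass
from typing import List

# Key social indicators annotated by medical experts (listed in the abstract).
SOCIAL_INDICATORS = [
    "response to name",
    "eye contact",
    "imitation behavior",
    "social smiling",
    "pointing",
]

@dataclass
class ClipResult:
    """Clip-level output of the vision-language model for one Q-R interval."""
    indicator: str          # which social indicator the interval probes
    correct_response: bool  # MC-QA outcome: did the child respond appropriately?
    caption: str            # descriptive caption of the child's behavior

def build_aggregation_prompt(clips: List[ClipResult]) -> str:
    """Assemble clip-level evidence into a single prompt for the aggregating LLM."""
    lines = ["Clip-level observations of the child's social interactions:"]
    for i, c in enumerate(clips, start=1):
        outcome = "correct response" if c.correct_response else "no/atypical response"
        lines.append(f"{i}. [{c.indicator}] {outcome}: {c.caption}")
    lines.append(
        "Based on these observations, classify the subject as ASD or TD "
        "and explain the clinical reasoning."
    )
    return "\n".join(lines)

def aggregate_with_llm(prompt: str) -> str:
    """Placeholder for the LLM call; a real system would query an actual model here."""
    # e.g. return llm_client.generate(prompt)
    return "TD (illustrative placeholder output)"

if __name__ == "__main__":
    # Toy clip-level results standing in for CARE-VL outputs.
    clips = [
        ClipResult("response to name", True, "The child turns toward the caller."),
        ClipResult("eye contact", False, "The child avoids the examiner's gaze."),
        ClipResult("social smiling", True, "The child smiles back at the examiner."),
    ]
    prompt = build_aggregation_prompt(clips)
    print(prompt)
    print("Subject-level decision:", aggregate_with_llm(prompt))
```

In the paper's framework this aggregation is performed by an LLM over all annotated Q-R intervals, yielding both the final ASD/TD label and the accompanying clinical reasoning; the stub above only shows the shape of that interface.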
KSP Keywords
Data collected, End-to-End (E2E), Fine-tuning, Q-R, Reasoning method, Response prediction, Social behavior, Social indicators, Video data, autism spectrum disorder, clinical reasoning