ETRI Knowledge Sharing Platform


Conference Paper: Effects of Hyper-Parameters for Deep Reinforcement Learning in Robotic Motion Mimicry: A Preliminary Study
Cited 3 times in Scopus
Taewoo Kim, Joo-Haeng Lee
International Conference on Ubiquitous Robots (UR) 2019, pp.228-235
19HS6200, Development of Human Care Robot Technology for an Aging Society in Real Environments, Jaeyeon Lee
This paper reports initial results on how various hyper-parameter configurations affect both the performance of the learning process and the quality of generated motions when deep reinforcement learning is applied to the motion-mimicry problem between a teacher robot and a student robot. The hyper-parameters considered in this study include the policy network structure (convolutional vs. fully connected), the activation function (ReLU vs. hyperbolic tangent), and the number of input sequences (one, four, or eight). Under these deep neural network configurations, the PPO reinforcement learning algorithm was applied for training. In a simulator environment, a teacher NAO robot repeatedly demonstrates a target action, and a learner NAO robot tries to reproduce it. The target actions include handshaking and raising both arms. Our experimental results show that fully connected networks outperform their convolutional counterparts in both training statistics and motion quality. For activation functions, however, we found an interesting mismatch between training and evaluation quality: a configuration with higher rewards does not guarantee smaller motion discrepancy, which may suggest a new research direction toward better loss and reward functions for robotic motion mimicry.
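The hyper-parameter axes named in the abstract (policy structure, activation function, input-sequence length) form a small configuration grid that the PPO learner would sweep over. The sketch below enumerates that grid; the variable and key names are illustrative assumptions, not taken from the authors' code.

```python
from itertools import product

# Hyper-parameter axes as stated in the abstract; the names below are
# illustrative, not drawn from the paper's actual implementation.
POLICY_STRUCTURES = ["fully_connected", "convolutional"]
ACTIVATIONS = ["relu", "tanh"]       # ReLU vs. hyperbolic tangent
INPUT_SEQUENCE_LENGTHS = [1, 4, 8]   # number of stacked input frames

def hyperparameter_grid():
    """Enumerate every configuration to be trained with PPO."""
    return [
        {"policy": p, "activation": a, "seq_len": n}
        for p, a, n in product(
            POLICY_STRUCTURES, ACTIVATIONS, INPUT_SEQUENCE_LENGTHS
        )
    ]

grid = hyperparameter_grid()
print(len(grid))  # 2 structures x 2 activations x 3 sequence lengths = 12 runs
```

Each of the 12 configurations would then be evaluated on both training statistics (reward curves) and motion quality (discrepancy from the demonstrated motion), which is exactly where the paper observes the reward/quality mismatch.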
KSP Suggested Keywords
Activation function, Deep neural network(DNN), Deep reinforcement learning, Hyperbolic tangent, Nao robot, Network configuration, Preliminary study, Reinforcement Learning(RL), Reinforcement learning algorithm, Research direction, Training and evaluation