ETRI-Knowledge Sharing Platform



Type: SCI


Journal article: Moving Metadata from Ad Hoc Files to Database Tables for Robust, Highly Available, and Scalable HDFS
Cited 16 times in Scopus; downloaded 16 times
원희선, 차우, 길명선, 문양세, 황규영
Journal of Supercomputing, v.73 no.6, pp.2657-2681
16MH1700, (Integrated) Development of Core Smart Networking Technologies, 양선희
As a representative large-scale data management technology, Apache Hadoop is an open-source framework for processing a variety of data such as SNS, medical, weather, and IoT data. Hadoop largely consists of HDFS, MapReduce, and YARN. Among them, we focus on improving the HDFS metadata management scheme responsible for storing and managing big data. We note that the current HDFS incurs many problems in system utilization due to its file-based metadata management. To solve these problems, we propose a novel metadata management scheme based on RDBMS for improving the functional aspects of HDFS. Through analysis of the latest HDFS, we first present five problems caused by its metadata management and derive three requirements of robustness, availability, and scalability for resolving these problems. We then design an overall architecture of the advanced HDFS, A-HDFS, which satisfies these requirements. In particular, we define functional modules according to HDFS operations and also present the detailed design strategy for adding or modifying the individual components in the corresponding modules. Finally, through implementation of the proposed A-HDFS, we validate its correctness by experimental evaluation and also show that A-HDFS satisfies all the requirements. The proposed A-HDFS significantly enhances the HDFS metadata management scheme and, as a result, ensures that the entire system improves its stability, availability, and scalability. Thus, we can exploit the improved distributed file system based on A-HDFS for various fields and, in addition, we can expect more applications to be actively developed.
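The idea of keeping file-system metadata in database tables rather than ad hoc files can be illustrated with a small sketch. This is not the A-HDFS design: the table names (`inodes`, `blocks`), the column layout, the helper functions, and the use of SQLite are all assumptions made for illustration, since the abstract does not describe the actual schema or the RDBMS used. The sketch only shows how a namespace and its block list might live in relational tables, and how a transaction keeps the two consistent when a write fails partway.

```python
import sqlite3

# Hypothetical schema: one row per file or directory, one row per block.
SCHEMA = """
CREATE TABLE inodes (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES inodes(id),
    name      TEXT NOT NULL,
    is_dir    INTEGER NOT NULL,
    UNIQUE (parent_id, name)
);
CREATE TABLE blocks (
    block_id  INTEGER PRIMARY KEY,
    inode_id  INTEGER NOT NULL REFERENCES inodes(id),
    seq       INTEGER NOT NULL,
    num_bytes INTEGER NOT NULL
);
"""

ROOT_ID = 1


def open_namespace():
    """Create an in-memory metadata store containing only the root directory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO inodes (id, parent_id, name, is_dir) VALUES (?, NULL, '', 1)",
        (ROOT_ID,),
    )
    conn.commit()
    return conn


def resolve(conn, path):
    """Return the inode id for an absolute path, or None if it does not exist."""
    inode = ROOT_ID
    for part in (p for p in path.split("/") if p):
        row = conn.execute(
            "SELECT id FROM inodes WHERE parent_id = ? AND name = ?",
            (inode, part),
        ).fetchone()
        if row is None:
            return None
        inode = row[0]
    return inode


def create(conn, path, is_dir, block_sizes=()):
    """Insert a file or directory and its block list in a single transaction."""
    parent_path, _, name = path.rstrip("/").rpartition("/")
    parent = resolve(conn, parent_path)
    if parent is None:
        raise FileNotFoundError(parent_path)
    with conn:  # commits on success, rolls back if any statement fails
        cur = conn.execute(
            "INSERT INTO inodes (parent_id, name, is_dir) VALUES (?, ?, ?)",
            (parent, name, int(is_dir)),
        )
        inode = cur.lastrowid
        conn.executemany(
            "INSERT INTO blocks (inode_id, seq, num_bytes) VALUES (?, ?, ?)",
            [(inode, seq, size) for seq, size in enumerate(block_sizes)],
        )
    return inode


def file_size(conn, path):
    """Sum the block sizes recorded for the file at the given path."""
    inode = resolve(conn, path)
    row = conn.execute(
        "SELECT COALESCE(SUM(num_bytes), 0) FROM blocks WHERE inode_id = ?",
        (inode,),
    ).fetchone()
    return row[0]
```

In this sketch, a failed block insert rolls back the inode row as well, so the namespace never references a half-written file. A production system would additionally need replication and failover on the database side, which the sketch does not attempt.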
KSP suggested keywords
Ad hoc, Apache Hadoop, Big data, Distributed file system, Functional aspects, Open source, System utilization, Database tables, Design strategy, Detailed design, Experimental evaluation