ETRI Knowledge Sharing Platform

Moving Metadata from Ad Hoc Files to Database Tables for Robust, Highly Available, and Scalable HDFS
Cited 19 times in Scopus
Authors
Heesun Won, Minh Chau Nguyen, Myeong-Seon Gil, Yang-Sae Moon, Kyu-Young Whang
Issue Date
2017-06
Citation
Journal of Supercomputing, v.73, no.6, pp.2657-2681
ISSN
0920-8542
Publisher
Springer
Language
English
Type
Journal Article
DOI
https://dx.doi.org/10.1007/s11227-016-1949-7
Abstract
As a representative large-scale data management technology, Apache Hadoop is an open-source framework for processing a variety of data such as SNS, medical, weather, and IoT data. Hadoop largely consists of HDFS, MapReduce, and YARN. Among them, we focus on improving the HDFS metadata management scheme, which is responsible for storing and managing big data. We note that the current HDFS incurs many problems in system utilization due to its file-based metadata management. To solve these problems, we propose a novel metadata management scheme based on an RDBMS that improves the functional aspects of HDFS. Through analysis of the latest HDFS, we first present five problems caused by its metadata management and derive three requirements, robustness, availability, and scalability, for resolving these problems. We then design the overall architecture of the advanced HDFS, A-HDFS, which satisfies these requirements. In particular, we define functional modules according to HDFS operations and present a detailed design strategy for adding or modifying the individual components in the corresponding modules. Finally, through an implementation of the proposed A-HDFS, we validate its correctness by experimental evaluation and show that A-HDFS satisfies all the requirements. The proposed A-HDFS significantly enhances the HDFS metadata management scheme and, as a result, improves the robustness, availability, and scalability of the entire system. Thus, the improved distributed file system based on A-HDFS can be exploited in various fields, and we can expect more applications to be actively developed on top of it.
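The core idea summarized above, replacing ad hoc metadata files (such as the HDFS image and edit log) with database tables so that the DBMS provides durability and declarative access, can be sketched as follows. This is a minimal illustration using SQLite; the table schema, column names, and sample path are assumptions for the example, not the actual A-HDFS design described in the paper.

```python
import sqlite3

# Minimal sketch: file-system metadata kept in a relational table
# instead of ad hoc image/edit-log files. Schema is illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE inode (
        path        TEXT PRIMARY KEY,
        owner       TEXT NOT NULL,
        size_bytes  INTEGER NOT NULL,
        replication INTEGER NOT NULL DEFAULT 3
    )
""")

# A create-file operation becomes a transactional INSERT, so durability
# comes from the DBMS rather than from replaying an edit log.
with conn:
    conn.execute(
        "INSERT INTO inode VALUES (?, ?, ?, ?)",
        ("/data/weather/2017.csv", "hdfs", 1048576, 3),
    )

# Metadata lookups become declarative queries instead of scans over
# in-memory structures rebuilt from files at startup.
row = conn.execute(
    "SELECT owner, replication FROM inode WHERE path = ?",
    ("/data/weather/2017.csv",),
).fetchone()
print(row)  # -> ('hdfs', 3)
```

Because the metadata lives in ordinary tables, standard DBMS features (transactions, replication, failover) can back the robustness, availability, and scalability requirements the abstract identifies.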
KSP Keywords
Ad hoc, Apache Hadoop, Big-data, Distributed File system, Functional Modules, Functional aspects, System utilization, database tables, design strategy, detailed design, experimental evaluation