ETRI Knowledge Sharing Platform

Moving Metadata from Ad Hoc Files to Database Tables for Robust, Highly Available, and Scalable HDFS
Cited 19 times in Scopus
Authors
Heesun Won, Minh Chau Nguyen, Myeong-Seon Gil, Yang-Sae Moon, Kyu-Young Whang
Issue Date
2017-06
Citation
Journal of Supercomputing, v.73, no.6, pp.2657-2681
ISSN
0920-8542
Publisher
Springer
Language
English
Type
Journal Article
DOI
https://dx.doi.org/10.1007/s11227-016-1949-7
Abstract
As a representative large-scale data management technology, Apache Hadoop is an open-source framework for processing a variety of data such as SNS, medical, weather, and IoT data. Hadoop largely consists of HDFS, MapReduce, and YARN. Among them, we focus on improving the HDFS metadata management scheme, which is responsible for storing and managing big data. We note that the current HDFS incurs many problems in system utilization due to its file-based metadata management. To solve these problems, we propose a novel metadata management scheme based on an RDBMS that improves the functional aspects of HDFS. Through analysis of the latest HDFS, we first present five problems caused by its metadata management and derive three requirements, robustness, availability, and scalability, for resolving these problems. We then design the overall architecture of the advanced HDFS, A-HDFS, which satisfies these requirements. In particular, we define functional modules according to HDFS operations and present a detailed design strategy for adding or modifying the individual components in the corresponding modules. Finally, through an implementation of the proposed A-HDFS, we validate its correctness by experimental evaluation and show that A-HDFS satisfies all the requirements. The proposed A-HDFS significantly enhances the HDFS metadata management scheme and, as a result, improves the robustness, availability, and scalability of the entire system. Thus, the improved distributed file system based on A-HDFS can be exploited in various fields, and we can expect more applications to be actively developed on top of it.
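The core idea summarized above, replacing ad hoc metadata files (such as the HDFS image and edit log) with database tables so that the DBMS provides durability and declarative access, can be sketched as follows. This is a minimal illustration using SQLite; the table schema, column names, and sample path are assumptions for the example, not the actual A-HDFS design described in the paper.

```python
import sqlite3

# Minimal sketch: file-system metadata kept in a relational table
# instead of ad hoc image/edit-log files. Schema is illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE inode (
        path        TEXT PRIMARY KEY,
        owner       TEXT NOT NULL,
        size_bytes  INTEGER NOT NULL,
        replication INTEGER NOT NULL DEFAULT 3
    )
""")

# A create-file operation becomes a transactional INSERT, so durability
# comes from the DBMS rather than from replaying an edit log.
with conn:
    conn.execute(
        "INSERT INTO inode VALUES (?, ?, ?, ?)",
        ("/data/weather/2017.csv", "hdfs", 1048576, 3),
    )

# Metadata lookups become declarative queries instead of scans over
# in-memory structures rebuilt from files at startup.
row = conn.execute(
    "SELECT owner, replication FROM inode WHERE path = ?",
    ("/data/weather/2017.csv",),
).fetchone()
print(row)  # -> ('hdfs', 3)
```

Because the metadata lives in ordinary tables, standard DBMS features (transactions, replication, failover) can back the robustness, availability, and scalability requirements the abstract identifies.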
KSP Keywords
Ad hoc, Apache Hadoop, Big-data, Distributed File system, Functional Modules, Functional aspects, System utilization, database tables, design strategy, detailed design, experimental evaluation