ETRI-Knowledge Sharing Platform

Performance Study of Spark on YARN Cluster using HiBench
Cited 3 times in Scopus. Downloaded 25 times.
Authors
HooYoung Ahn, Hyunjae Kim, Woongshik You
Issue Date
2018-06
Citation
International Conference on Consumer Electronics (ICCE) 2018 : Asia, pp.206-212
Publisher
IEEE
Language
English
Type
Conference Paper
DOI
https://dx.doi.org/10.1109/ICCE-ASIA.2018.8552137
Abstract
Recently, various kinds of Internet-of-Things (IoT) solutions and services have been provided, such as smart industry, smart city, smart factory, and smart agriculture. These solutions and services generate large amounts of data from various devices that are connected through networks and communicate with each other. However, efficiently processing data that is produced rapidly and in massive volumes is a difficult problem. To address this problem at the framework level, many open-source big data processing and analysis frameworks are available. To process large-scale data quickly, these frameworks use a cluster consisting of multiple computing machines. However, properly configuring a framework to run on a large-scale cluster is not simple, and it is difficult to verify its performance in a distributed environment. In this paper, we evaluate the performance of Apache Spark, which is one of the most popular big data processing and analysis frameworks. In particular, we conduct experiments using a representative benchmark tool, HiBench, and large-scale data in a cluster environment. From the experimental results, we conclude that Spark is highly scalable for distributed machine learning as well as big data processing.
KSP Keywords
Apache Spark, Benchmark tool, Big data processing, Distributed environment, Distributed machine learning, Internet of Things (IoT), Large-scale data, Open source, Smart factory, Smart city, Smart industry