ETRI-Knowledge Sharing Platform





Conference Paper: Performance Study of Spark on YARN Cluster using HiBench
Cited 3 times in Scopus. Downloaded 23 times.
안후영, 김현재, 유웅식
International Conference on Consumer Electronics (ICCE) 2018: Asia, pp. 206-212
Recently, various Internet-of-Things (IoT) solutions and services have emerged in areas such as smart industry, smart cities, smart factories, and smart agriculture. These solutions and services generate large amounts of data from devices that are connected through networks and communicate with each other. However, efficiently processing data that is produced rapidly and at massive scale is difficult. Many open-source big data processing and analysis frameworks address this problem at the framework level. To process large-scale data quickly, these frameworks use a cluster of multiple computing machines. However, properly configuring a framework to run on a large-scale cluster is not simple, and verifying its performance in a distributed environment is difficult. In this paper, we evaluate the performance of Apache Spark, one of the most popular big data processing and analysis frameworks. In particular, we conduct experiments in a cluster environment using HiBench, a representative benchmark tool, and large-scale data. From the experimental results, we conclude that Spark is highly scalable for distributed machine learning as well as big data processing.
KSP Suggested Keywords
Apache Spark, Benchmark tool, Big Data Processing, Distributed Environment, Distributed machine learning, Internet of Things (IoT), Large-scale data, Open source, Smart Factory, Smart city, Smart industry