ETRI-Knowledge Sharing Platform

Details

Journal article: Parallel Labeling of Massive XML Data with MapReduce
Cited 15 times in Scopus
Authors
Hyebong Choi (최혜봉), Kyong-Ha Lee (이경하), Yoon-Joon Lee (이윤준)
Publication date
February 2014
Source
Journal of Supercomputing, v.67 no.2, pp.408-437
ISSN
0920-8542
Publisher
Springer
DOI
https://dx.doi.org/10.1007/s11227-013-1008-6
Funded project
13PR1800, Development of Knowledge Base Construction and Real-Time Linked Data Application Technology for Big Data Utilization, Ki-Sung Cho (조기성)
Abstract
The volume of XML data has become enormous and continues to grow rapidly, as much data is encoded in XML by virtue of its simplicity and extensibility. Tree labeling algorithms play a crucial role in XML query processing, but conventional algorithms are all sequential and therefore cannot label a large volume of XML data in a timely manner. To address this issue, we devise parallel tree labeling algorithms for massive XML data. Specifically, we focus on how to efficiently label a single large XML file in parallel. We first propose parallel versions of two prominent tree labeling schemes based on the MapReduce framework. We then present techniques for runtime workload balancing and data repartitioning to solve performance issues caused by data skew and MapReduce's inherent limitations. Through extensive experiments with synthetic and real-world datasets on 15 nodes, we show that our parallel labeling algorithms are up to 17 times faster than conventional algorithms while remaining robust against data skew. © Springer Science+Business Media New York 2013.
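
Illustrative note: the abstract does not name the two labeling schemes it parallelizes. As a rough idea of what a sequential tree labeling pass looks like, the following is a minimal Python sketch of interval (start/end) labeling, a commonly used scheme chosen here only as an assumption. The function names `interval_label` and `is_ancestor` and the sample document are hypothetical and are not taken from the paper.

```python
import xml.etree.ElementTree as ET


def interval_label(root):
    """Assign a (start, end, level) label to every element in one depth-first pass.

    This is an assumed, illustrative scheme, not the paper's algorithm.
    """
    labels = {}
    counter = 0  # single running counter shared by the whole traversal

    def visit(node, level):
        nonlocal counter
        counter += 1
        start = counter          # number taken when the element is entered
        for child in node:
            visit(child, level + 1)
        counter += 1             # number taken when the element is left
        labels[node] = (start, counter, level)

    visit(root, 0)
    return labels


def is_ancestor(a, d):
    """True if label `a` strictly contains label `d`."""
    return a[0] < d[0] and d[1] < a[1]


# Small hypothetical document used only for illustration.
doc = ET.fromstring("<a><b><c/></b><d/></a>")
labels = interval_label(doc)
by_tag = {element.tag: label for element, label in labels.items()}
```

With labels of this kind, an ancestor/descendant test such as `is_ancestor(by_tag["a"], by_tag["c"])` becomes a pair of integer comparisons. The shared counter also shows why such a pass is sequential: each label depends on everything that precedes the element in document order, so a single large file cannot simply be cut into pieces and labeled independently.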