ETRI-Knowledge Sharing Platform


Details

Journal article: Join Processing with Threshold-Based Filtering in MapReduce
Cited 5 times in Scopus. Downloaded 4 times.
Authors
이태휘, 배혜찬, 김형주
Publication date
2014-08
Source
Journal of Supercomputing, v.69 no.2, pp.793-813
ISSN
0920-8542
Publisher
Springer
DOI
https://dx.doi.org/10.1007/s11227-014-1179-9
Funded project
14ZS1400, Development of Core Technologies for a Big Data Platform Providing Dual-Mode Batch and Query Analytics, 박경
Abstract
Data analytics, in particular those involving heterogeneous data, often require join operations on datasets collected from different sources. MapReduce, one of the most popular frameworks for large-scale data processing, is not suited for joining multiple datasets. This is because MapReduce often produces a large number of redundant intermediate results, irrespective of the size of the joined records. Although several existing approaches attempt to reduce the number of such redundant results using Bloom filters, they may be inefficient if large portions of records are joined or the number of distinct keys is large. To alleviate this problem, we propose a join processing method with threshold-based filtering in MapReduce, called TMFR-Join, which is an abbreviation for "Threshold-based Map-Filter-Reduce Join". TMFR-Join applies filters according to their performance, which is estimated in terms of false-positive rates. It also provides a general framework for exploiting various filtering techniques that support certain desired operations. The experimental results indicate that the performance of TMFR-Join is close to that of the better of existing join processing techniques, both with and without filters.
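
The idea of applying a filter only when its estimated false-positive rate is acceptable can be illustrated with a small single-process sketch. This is not the TMFR-Join algorithm from the paper and it does not use MapReduce. It assumes a plain Bloom filter, the standard false-positive estimate (1 - e^(-kn/m))^k, and a hypothetical cutoff parameter `fpr_threshold`. The names `BloomFilter` and `filtered_join` are illustrative only.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter with m bits and k hash positions per key."""

    def __init__(self, num_bits, num_hashes):
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)
        self.count = 0  # number of keys inserted

    def _positions(self, key):
        # Double hashing: derive k bit positions from one SHA-256 digest.
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)
        self.count += 1

    def __contains__(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )

    def estimated_fpr(self):
        # Standard approximation: (1 - e^(-k*n/m))^k
        return (1.0 - math.exp(-self.k * self.count / self.m)) ** self.k


def filtered_join(left, right, num_bits=1024, num_hashes=3, fpr_threshold=0.05):
    """Equi-join two lists of (key, value) pairs.

    A Bloom filter is built on the distinct keys of `left`. It is used to
    prune `right` only if its estimated false-positive rate does not exceed
    `fpr_threshold` (a hypothetical cutoff chosen for this sketch).
    Returns (joined_rows, filter_was_used).
    """
    bloom = BloomFilter(num_bits, num_hashes)
    for key in {k for k, _ in left}:
        bloom.add(key)

    use_filter = bloom.estimated_fpr() <= fpr_threshold
    if use_filter:
        # Drop right-side records whose key is definitely absent from left.
        right = [(k, v) for k, v in right if k in bloom]

    # Plain hash join on whatever survived the optional filtering step.
    index = {}
    for k, v in left:
        index.setdefault(k, []).append(v)
    joined = [(k, lv, rv) for k, rv in right for lv in index.get(k, [])]
    return joined, use_filter
```

With many distinct keys and a small bit array, the estimate rises above the cutoff and the sketch falls back to an unfiltered join. The join result is the same either way, because a Bloom filter never produces false negatives.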
KSP suggested keywords
Bloom Filter, Data Analytics, Existing Approaches, Filtering technique, General Framework, Heterogeneous data, Join processing, Large-scale data processing, Multiple datasets, Processing Method, based filtering