ETRI Knowledge Sharing Platform : Join Processing with Threshold-Based Filtering in MapReduce

Journal Article: Join Processing with Threshold-Based Filtering in MapReduce

Cited 5 times in Scopus

Abstract: Data analytics, in particular those involving heterogeneous data, often require join operations on datasets collected from different sources. MapReduce, one of the most popular frameworks for large-scale data processing, is not suited for joining multiple datasets. This is because MapReduce often produces a large number of redundant intermediate results, irrespective of the size of the joined records. Although several existing approaches attempt to reduce the number of such redundant results using Bloom filters, they may be inefficient if large portions of records are joined or the number of distinct keys is large. To alleviate this problem, we propose a join processing method with threshold-based filtering in MapReduce, called TMFR-Join, which is an abbreviation for "Threshold-based Map-Filter-Reduce Join". TMFR-Join applies filters according to their performance, which is estimated in terms of false-positive rates. It also provides a general framework for exploiting various filtering techniques that support certain desired operations. The experimental results indicate that the performance of TMFR-Join is close to that of the better of existing join processing techniques, both with and without filters.
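
The idea of applying a filter only when its estimated false-positive rate is acceptable can be illustrated with a small, single-process sketch. This is not the TMFR-Join algorithm from the paper, whose details the abstract does not give. It assumes a plain Bloom filter, the standard approximation (1 - e^(-kn/m))^k for its false-positive rate, and a fixed cut-off. The names `BloomFilter`, `filtered_join`, and `fpr_threshold` are hypothetical and chosen only for this illustration.

```python
import hashlib
import math


class BloomFilter:
    """A small Bloom filter over string keys (illustrative only)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity
        self.count = 0                   # number of keys added

    def _positions(self, key):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1
        self.count += 1

    def __contains__(self, key):
        return all(self.bits[pos] for pos in self._positions(key))

    def estimated_fpr(self):
        # Standard approximation: (1 - e^(-k*n/m))^k
        k, n, m = self.num_hashes, self.count, self.num_bits
        return (1.0 - math.exp(-k * n / m)) ** k


def filtered_join(left, right, num_bits=1024, num_hashes=3, fpr_threshold=0.05):
    """Equi-join two lists of (key, value) pairs.

    The right-hand records are pre-filtered with a Bloom filter built on the
    distinct left-hand keys, but only if the filter's estimated false-positive
    rate is at or below `fpr_threshold` (an assumed, hypothetical cut-off).
    """
    bloom = BloomFilter(num_bits, num_hashes)
    for key in {k for k, _ in left}:
        bloom.add(key)

    use_filter = bloom.estimated_fpr() <= fpr_threshold
    if use_filter:
        # "Filter" phase: drop records whose key cannot match.
        candidates = [(k, v) for k, v in right if k in bloom]
    else:
        # Filter judged too inaccurate: pass every record through.
        candidates = list(right)

    # "Reduce" phase: exact join, which also discards false positives.
    left_by_key = {}
    for k, v in left:
        left_by_key.setdefault(k, []).append(v)
    joined = [(k, lv, rv) for k, rv in candidates for lv in left_by_key.get(k, [])]
    return joined, use_filter, len(candidates)


left = [("a", 1), ("b", 2)]
right = [("a", 10), ("c", 30), ("b", 20), ("d", 40)]
joined, used, shuffled = filtered_join(left, right)
tiny_joined, tiny_used, tiny_shuffled = filtered_join(left, right, num_bits=8)
```

In this sketch the join result is the same whether or not the filter is used. The threshold only decides whether the filtering step is worth running, which affects how many candidate records reach the final join. With the 8-bit filter the estimated rate is about 0.15, so the filter is skipped and all four right-hand records pass through.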

KSP Keywords: Data Analytics, Existing Approaches, Filtering technique, General Framework, Heterogeneous Data, Join processing, Multiple datasets, Processing Method, based filtering, bloom filter, large-scale data processing
