ETRI-Knowledge Sharing Platform

Detail

Join Processing with Threshold-Based Filtering in MapReduce
Cited 5 times in Scopus
Authors
Taewhi Lee, Hye-Chan Bae, Hyoung-Joo Kim
Issue Date
2014-08
Citation
Journal of Supercomputing, v.69, no.2, pp.793-813
ISSN
0920-8542
Publisher
Springer
Language
English
Type
Journal Article
DOI
https://dx.doi.org/10.1007/s11227-014-1179-9
Abstract
Data analytics, in particular those involving heterogeneous data, often require join operations on datasets collected from different sources. MapReduce, one of the most popular frameworks for large-scale data processing, is not suited for joining multiple datasets. This is because MapReduce often produces a large number of redundant intermediate results, irrespective of the size of the joined records. Although several existing approaches attempt to reduce the number of such redundant results using Bloom filters, they may be inefficient if large portions of records are joined or the number of distinct keys is large. To alleviate this problem, we propose a join processing method with threshold-based filtering in MapReduce, called TMFR-Join, which is an abbreviation for “Threshold-based Map-Filter-Reduce Join”. TMFR-Join applies filters according to their performance, which is estimated in terms of false-positive rates. It also provides a general framework for exploiting various filtering techniques that support certain desired operations. The experimental results indicate that the performance of TMFR-Join is close to that of the better of existing join processing techniques, both with and without filters.
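The idea in the abstract, applying a filter only when its estimated false-positive rate is low enough, can be illustrated with a small single-process sketch. This is not the paper's TMFR-Join implementation. The names `BloomFilter`, `filtered_join`, and `fpr_threshold`, the default sizes, and the use of the textbook estimate (1 - e^(-kn/m))^k are all assumptions made for illustration, and the map and reduce phases are simulated with ordinary Python lists rather than an actual MapReduce job.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter over string keys (illustrative only)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability
        self.count = 0  # number of add() calls (distinct keys if each is added once)

    def _positions(self, key: str):
        # Derive k bit positions by double hashing two slices of a SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1
        self.count += 1

    def __contains__(self, key: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(key))

    def estimated_fpr(self) -> float:
        # Textbook approximation of the false-positive rate: (1 - e^(-k*n/m))^k.
        k, n, m = self.num_hashes, self.count, self.num_bits
        return (1.0 - math.exp(-k * n / m)) ** k


def filtered_join(left, right, num_bits=1024, num_hashes=3, fpr_threshold=0.1):
    """Equi-join two lists of (key, value) pairs.

    The filter built from `left` keys is applied to `right` only when its
    estimated false-positive rate is at or below `fpr_threshold`
    (a hypothetical decision rule, not the one defined in the paper).
    """
    bloom = BloomFilter(num_bits, num_hashes)
    for key in {k for k, _ in left}:
        bloom.add(key)
    use_filter = bloom.estimated_fpr() <= fpr_threshold

    # Simulated map side: drop right-hand records that cannot match.
    candidates = [(k, v) for k, v in right if not use_filter or k in bloom]

    # Simulated reduce side: hash join on the surviving records.
    index = {}
    for k, v in left:
        index.setdefault(k, []).append(v)
    result = [(k, lv, rv) for k, rv in candidates for lv in index.get(k, [])]
    return result, use_filter, len(candidates)
```

Because a Bloom filter never yields false negatives, the join result is the same whether or not the filter is applied; only the number of candidate records passed to the simulated reduce step changes. With an undersized filter (for example `num_bits=4`), the estimated false-positive rate exceeds the threshold and the sketch falls back to an unfiltered join.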
KSP Keywords
Bloom Filter, Data Analytics, Existing Approaches, Filtering technique, General Framework, Heterogeneous data, Join processing, Large-scale data processing, Multiple datasets, Processing Method, based filtering