ETRI-Knowledge Sharing Platform

Detecting Partial and Near Duplication in the Blogosphere
Cited 0 times in Scopus
Authors
Yeo-Chan YOON, Myung-Gil JANG, Hyun-Ki KIM, So-Young PARK
Issue Date
2012-02
Citation
IEICE Transactions on Information and Systems, v.E95.D, no.2, pp.681-685
ISSN
0916-8532
Publisher
Japan, Institute of Electronics, Information and Communication Engineers (IEICE)
Language
English
Type
Journal Article
DOI
https://dx.doi.org/10.1587/transinf.E95.D.681
Abstract
In this paper, we propose a duplicate document detection model recognizing both partial duplicates and near duplicates. The proposed model can detect partial duplicates as well as exact duplicates by splitting a large document into many small sentence fingerprints. Furthermore, the proposed model can detect even near duplicates, the result of trivial revisions, by filtering the common words and reordering the word sequence. Copyright © 2012 The Institute of Electronics, Information and Communication Engineers.
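The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the stop-word list `COMMON_WORDS`, the choice of alphabetical sorting as the "reordering" step, the use of SHA-1, the punctuation-based sentence splitter, and the `containment` score are all assumptions made for illustration, since the abstract does not specify any of them.

```python
import hashlib
import re

# Hypothetical stop-word list; the abstract does not say which common words are filtered.
COMMON_WORDS = frozenset(
    {"a", "an", "the", "of", "in", "on", "to", "and", "is", "are", "was"}
)


def sentence_fingerprint(sentence):
    """Hash a sentence after dropping common words and sorting the remaining words.

    Sorting is an assumed stand-in for the paper's "reordering the word sequence":
    it makes the fingerprint insensitive to word order.
    """
    words = re.findall(r"\w+", sentence.lower())
    content = sorted(w for w in words if w not in COMMON_WORDS)
    if not content:
        return None  # nothing left to fingerprint
    return hashlib.sha1(" ".join(content).encode("utf-8")).hexdigest()


def document_fingerprints(text):
    """Split a document into sentences and return the set of sentence fingerprints."""
    fingerprints = set()
    for sentence in re.split(r"[.!?]+", text):  # naive sentence splitter (assumption)
        fp = sentence_fingerprint(sentence)
        if fp is not None:
            fingerprints.add(fp)
    return fingerprints


def containment(doc_a, doc_b):
    """Fraction of doc_a's sentence fingerprints that also occur in doc_b."""
    fa = document_fingerprints(doc_a)
    fb = document_fingerprints(doc_b)
    if not fa:
        return 0.0
    return len(fa & fb) / len(fa)


original = "The cat sat on the mat. Dogs bark loudly at night."
revised = "On the mat the cat sat. Birds sing."
print(containment(original, revised))
```

In this sketch, a trivially revised sentence (reordered words, changed articles or prepositions) maps to the same fingerprint as the original, and a document that copies only some sentences yields a partial containment score rather than a miss, which is the behavior the abstract attributes to sentence-level fingerprints.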
KSP Keywords
Detection model, Information and communication, Proposed model, document detection, near-duplicates, word sequence