ETRI-Knowledge Sharing Platform

Details

Journal Article: Detecting Partial and Near Duplication in the Blogosphere
Authors
윤여찬, 장명길, 김현기, 박소영
Publication Date
2012-02
Source
IEICE Transactions on Information and Systems, v.E95.D no.2, pp.681-685
ISSN
0916-8532
Publisher
Japan, Institute of Electronics, Information and Communication Engineers (IEICE)
DOI
https://dx.doi.org/10.1587/transinf.E95.D.681
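Funded Project
11VS1200, Core Technology for Social Web Issue Detection, Monitoring, and Prediction for a Listening Platform That Analyzes Explosively Growing Web Data for Web Intelligence, 김현기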
Abstract
In this paper, we propose a duplicate document detection model recognizing both partial duplicates and near duplicates. The proposed model can detect partial duplicates as well as exact duplicates by splitting a large document into many small sentence fingerprints. Furthermore, the proposed model can detect even near duplicates, the result of trivial revisions, by filtering the common words and reordering the word sequence. Copyright © 2012 The Institute of Electronics, Information and Communication Engineers.
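The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the common-word list, the hash function, the reordering rule, or the similarity score, so everything here is an assumption made for illustration. The names COMMON_WORDS, sentence_fingerprint, document_fingerprints, and containment are hypothetical, alphabetical sorting stands in for "reordering the word sequence", SHA-1 stands in for the fingerprint function, and sentences are split naively on ".", "!", and "?".

```python
import hashlib
import re

# Hypothetical common-word list; the paper's actual filter is not given
# in the abstract.
COMMON_WORDS = frozenset(
    {"a", "an", "the", "of", "in", "on", "to", "and", "is", "are", "was", "it"}
)


def sentence_fingerprint(sentence):
    """Return a hash for one sentence, or None if no content words remain.

    Assumed normalization: lowercase, drop common words, then sort the
    remaining words so that trivial reorderings map to the same hash.
    """
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    kept = sorted(w for w in words if w not in COMMON_WORDS)
    if not kept:
        return None
    return hashlib.sha1(" ".join(kept).encode("utf-8")).hexdigest()


def document_fingerprints(text):
    """Split a document into sentences and collect their fingerprints."""
    sentences = re.split(r"[.!?]+", text)  # naive sentence splitter (assumption)
    fingerprints = set()
    for sentence in sentences:
        fp = sentence_fingerprint(sentence)
        if fp is not None:
            fingerprints.add(fp)
    return fingerprints


def containment(doc_a, doc_b):
    """Fraction of doc_a's sentence fingerprints that also occur in doc_b.

    A high value suggests doc_a is (partially) duplicated inside doc_b.
    This score is an illustrative choice, not taken from the paper.
    """
    fp_a = document_fingerprints(doc_a)
    fp_b = document_fingerprints(doc_b)
    if not fp_a:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a)


original = "The cat sat on the mat. Dogs bark loudly at night."
revised = "On the mat the cat sat! Some unrelated sentence here. Another one follows."
score = containment(original, revised)
```

In this example one of the two sentences in the original survives in the revised text with only its word order changed, so half of the original's sentence fingerprints are found in the revised document. Because fingerprints are computed per sentence rather than per document, a copied passage can be detected even when it is embedded in otherwise unrelated text.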
KSP Suggested Keywords
Detection model, Information and communication, Proposed model, document detection, near-duplicates, word sequence