ETRI Knowledge Sharing Platform : Text Extraction

Other Text Extraction

Citation: Digital Library Technologies: Complex Objects, Annotation, Ontologies, Classification, Extraction, and Security [Book], pp.101-125

Abstract: To support many digital library activities, it is useful to extract data, information, and knowledge from text. Text processing (including tokenization), natural language processing, and machine learning are key technologies involved. When one begins with large and/or composite documents, document segmentation (e.g., identifying sections, or finding a figure and separating its label and illustration) also is a necessary precursor, and can directly address needs for extracting images and captions. In this chapter, we cover formal and practical aspects related to the implementation using machine learning of text extraction services in digital libraries. A case study on reference string parsing illustrates the promise and complexity of a text extraction service. This requires feature extraction, training, and classification of extracted entities.
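
Illustrative sketch (not taken from the chapter): the abstract names three steps for reference string parsing, namely feature extraction, training, and classification. The Python below shows one very small way those steps could fit together. Everything in it is an assumption made for illustration: the function names (tokenize, token_features, train, classify), the label set (author, year, title, punct), the two features used, and the single hand-labeled training reference. The "model" is only a lookup table from a feature tuple to its most frequent label; a real service would more likely use a sequence model.

```python
import re
from collections import Counter, defaultdict


def tokenize(reference):
    """Split a reference string into word/number tokens and punctuation."""
    return re.findall(r"\w+|[^\w\s]", reference)


def token_features(tokens, i):
    """Return a small, hypothetical feature tuple for the token at index i."""
    tok = tokens[i]
    if re.fullmatch(r"(19|20)\d{2}", tok):
        shape = "year4"      # four digits that look like a publication year
    elif tok.isdigit():
        shape = "number"
    elif not tok[0].isalnum():
        shape = "punct"
    elif tok[0].isupper():
        shape = "cap"
    else:
        shape = "lower"
    # Coarse position in the string: 0 = start, 1 = middle, 2 = end.
    position = min(3 * i // len(tokens), 2)
    return (shape, position)


def train(labeled_references):
    """Learn the most frequent label for each feature tuple."""
    counts = defaultdict(Counter)
    for tokens, labels in labeled_references:
        for i, label in enumerate(labels):
            counts[token_features(tokens, i)][label] += 1
    return {feat: c.most_common(1)[0][0] for feat, c in counts.items()}


def classify(model, tokens, default="other"):
    """Label each token; unseen feature tuples fall back to `default`."""
    return [model.get(token_features(tokens, i), default)
            for i in range(len(tokens))]


# One made-up, hand-labeled reference used as training data.
train_tokens = tokenize("Smith J. 2009. Text extraction.")
train_labels = ["author", "author", "punct", "year",
                "punct", "title", "title", "punct"]
model = train([(train_tokens, train_labels)])

# Parse a new, equally made-up reference string.
new_tokens = tokenize("Lee K. 2015. Parsing references.")
for token, label in zip(new_tokens, classify(model, new_tokens)):
    print(token, label)
```

With a single training example the table generalizes only to references that share the same layout, which hints at the complexity the abstract mentions: real reference strings vary widely in field order and punctuation, so richer features and far more labeled data would be needed.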

KSP Keywords: Case studies, Composite documents, Digital library, Document segmentation, Key technology, Natural language processing, Text processing, Feature extraction, Machine learning, Practical aspects, Text extraction
