ETRI-Knowledge Sharing Platform

Text Extraction
Authors
Sung Hee Park, Venkat Srinivasan, Pranav Angara
Issue Date
2014-03
Citation
Digital Library Technologies: Complex Objects, Annotation, Ontologies, Classification, Extraction, and Security [Book], pp.101-125
Language
English
Type
Other
Abstract
To support many digital library activities, it is useful to extract data, information, and knowledge from text. Text processing (including tokenization), natural language processing, and machine learning are key technologies involved. When one begins with large and/or composite documents, document segmentation (e.g., identifying sections, or finding a figure and separating its label and illustration) also is a necessary precursor, and can directly address needs for extracting images and captions. In this chapter, we cover formal and practical aspects related to the implementation using machine learning of text extraction services in digital libraries. A case study on reference string parsing illustrates the promise and complexity of a text extraction service. This requires feature extraction, training, and classification of extracted entities.
KSP Keywords
Case studies, Composite documents, Digital Library, Document segmentation, Feature extraction, Key technology, Natural Language Processing (NLP), Text processing, Machine learning, Practical aspects, Text extraction