ETRI-Knowledge Sharing Platform

Domain Issues in Statistical Named Entity Recognition for Biomedical Literature
Authors
Jae Soo Lim, Hyun Chul Jang, Joon Ho Lim, Soo Jun Park
Issue Date
2006-09
Citation
International Conference Korean Society for Bioinformatics (KSBI) 2006, pp.1-6
Language
English
Type
Conference Paper
Abstract
As the volume of biomedical publications grows, so does the need to extract structured information from the literature. Finding gene and protein names in biomedical text is the first step toward further knowledge acquisition in text mining. Machine learning approaches are now widely used for the named entity recognition (NER) task, but previous work has focused on the features and performance of NER methods. In this paper, we show the effects of an in-domain part-of-speech (POS) tagger and base phrase chunker on biomedical NER. We also examine the results of inter-domain NER within the same biomedical domain. Preprocessors trained on a general news-domain corpus did not perform well on biomedical texts, so we improved their accuracy by training them on biomedical corpora. Our final in-domain POS tagger and base phrase chunker performed well on biomedical texts, and the features generated by these in-domain preprocessors were also effective for our statistical NER method. Lexical features acquired from one domain were very different from those of the other domain, even though both training corpora had been collected from the same biomedical database. This resulted in a significant decrease in NER performance. General knowledge sources outside the training set, such as terminology databases and ontologies, are required to cope with this performance decrease.