ETRI Knowledge Sharing Platform : Domain Issues in Statistical Named Entity Recognition for Biomedical Literature

Conference Paper: Domain Issues in Statistical Named Entity Recognition for Biomedical Literature

Citation: International Conference Korean Society for Bioinformatics (KSBI) 2006, pp.1-6

Abstract: As the volume of biomedical publications grows, so does the need to extract structured information from the literature. Finding gene and protein names in biomedical text is the first step toward further knowledge acquisition in text mining. Machine learning approaches are now widely used for the named entity recognition (NER) task, but previous work has focused on the features and performance of NER methods. In this paper, we show the effects of an in-domain part-of-speech (POS) tagger and base phrase chunker on biomedical NER. We also examine the results of inter-domain NER within the same biomedical domain. The preprocessors, which were trained on a general news-domain corpus, did not perform well on biomedical texts, so we boosted their accuracy by training them on biomedical corpora. Our final in-domain POS tagger and base phrase chunker performed well on biomedical texts, and the features generated by these in-domain preprocessors were also effective for our statistical NER method. Lexical features acquired from one domain were very different from those of the other domain, even though both training corpora had been collected from the same biomedical database. This resulted in a significant decrease in NER performance. General knowledge sources outside the training set, such as terminology databases and ontologies, are required to cope with this performance decrease.

KSP Keywords: Biomedical domain, Biomedical literature, Inter-Domain, Lexical features, Machine Learning Approach, Named entity Recognition, POS tagger, Part of Speech(POS), biomedical publications, knowledge acquisition, knowledge sources
