Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013). 2013, 73-88
Despite the existence of effective methods for named entity recognition in widely used languages such as English, it remains unclear which methods are most suitable for languages that differ substantially from them. In this paper we address named entity recognition for Lithuanian using a supervised machine learning approach, exploring different feature sets that vary in orthographic and grammatical information, context windows, etc. Although performance is significantly higher when language-dependent features based on gazetteer lookup and automatic grammatical tools (part-of-speech tagger, lemmatizer or stemmer) are taken into account, we demonstrate that performance does not degrade when the features based on grammatical tools are replaced with affix information only. The best results (micro-averaged F-score = 0.895) were obtained using all available features, but the score decreased by only 0.002 when features based on grammatical tools were omitted.
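As an illustration of the kind of features described above, the following minimal sketch builds orthographic and affix features for a token and its neighbours within a context window. It is not the feature set used in the paper: the function names `token_features` and `window_features`, the affix length of 3, the window size of 1, and the specific orthographic tests are all assumptions chosen for the example.

```python
from typing import Dict, List


def token_features(token: str, affix_len: int = 3) -> Dict[str, object]:
    """Orthographic and affix features for one token (illustrative set only)."""
    return {
        "lower": token.lower(),
        "is_capitalized": token[:1].isupper(),
        "is_all_caps": token.isupper(),
        "has_digit": any(ch.isdigit() for ch in token),
        # Affix information: a cheap stand-in for a lemmatizer or stemmer.
        "prefix": token[:affix_len].lower(),
        "suffix": token[-affix_len:].lower(),
    }


def window_features(tokens: List[str], i: int, window: int = 1) -> Dict[str, object]:
    """Merge the features of tokens at positions i-window .. i+window."""
    features: Dict[str, object] = {}
    for offset in range(-window, window + 1):
        j = i + offset
        if 0 <= j < len(tokens):
            for name, value in token_features(tokens[j]).items():
                # Keys are prefixed with the signed offset, e.g. "-1:suffix".
                features[f"{offset:+d}:{name}"] = value
        else:
            # Mark positions that fall outside the sentence.
            features[f"{offset:+d}:pad"] = True
    return features


if __name__ == "__main__":
    sentence = ["Vilniaus", "universitetas", "yra", "Lietuvoje"]
    for key, value in sorted(window_features(sentence, 0).items()):
        print(key, value)
```

Feature dictionaries of this shape can be passed to most supervised sequence or token classifiers. Suffix features are a plausible substitute for grammatical tools in a highly inflected language such as Lithuanian, because word endings carry much of the case and number information.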
Jurgita Kapociute-Dzikiene, Anders Nøklestad, Janne Bondi Johannessen, Algis Krupavicius (2013). Exploring Features for Named Entity Recognition in Lithuanian Text Corpus. Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), May 22–24, 2013, Oslo University, Norway. NEALT Proceedings Series 16. http://www.ep.liu.se/ecp_article/index.en.aspx?issue=085;article=011