Examining the impact of stemming on clustering Turkish texts

Date

2012

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

Preprocessing is an important step in information retrieval and text mining. In this study, we examined the impact of stemming on clustering Turkish texts. We used two datasets compiled from the web sites of Turkish news agencies and performed extensive experiments. Our results provide no significant evidence that stemming always improves the quality of clustering for Turkish texts. However, when stemming is used, the dimensionality of the document-term matrix decreases dramatically without adversely affecting clustering performance. We therefore highly recommend applying stemming when clustering Turkish texts. © 2012 IEEE.
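The dimensionality effect described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the three toy documents, the `toy_stem` helper, and its four-entry suffix list are hypothetical and are not the stemmer, datasets, or pipeline used in the paper. The sketch only shows how mapping inflected forms to a shared stem shrinks the number of columns in a document-term matrix while leaving the per-document token counts unchanged.

```python
# Toy illustration only: a crude suffix stripper, NOT a real Turkish stemmer.
# The suffix list is a hypothetical sample (plural and plural+genitive forms).
SUFFIXES = ("lerin", "ların", "ler", "lar")


def toy_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def identity(token):
    """No-op normalizer used for the unstemmed baseline."""
    return token


def vocabulary(docs, normalize=identity):
    """Sorted list of distinct (normalized) terms across all documents."""
    return sorted({normalize(tok) for doc in docs for tok in doc.split()})


def doc_term_matrix(docs, vocab, normalize=identity):
    """Raw term counts: one row per document, one column per vocabulary term."""
    index = {term: j for j, term in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in docs]
    for i, doc in enumerate(docs):
        for tok in doc.split():
            matrix[i][index[normalize(tok)]] += 1
    return matrix


# Hypothetical mini-corpus built from inflections of "kitap" and "haber".
DOCS = [
    "kitap kitaplar kitapların",
    "haber haberler haberlerin",
    "kitaplar haberler",
]

raw_vocab = vocabulary(DOCS)
stem_vocab = vocabulary(DOCS, toy_stem)
raw_matrix = doc_term_matrix(DOCS, raw_vocab)
stem_matrix = doc_term_matrix(DOCS, stem_vocab, toy_stem)

print("columns without stemming:", len(raw_vocab))
print("columns with toy stemming:", len(stem_vocab))
```

On a real corpus the reduction depends on the stemmer and the data; whether clustering quality is preserved is an empirical question, which is what the study evaluates.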

Description

International Symposium on INnovations in Intelligent SysTems and Applications, INISTA 2012 -- 2 July 2012 through 4 July 2012 -- Trabzon -- 92831

Keywords

data mining, document clustering, preprocessing, stemming, text mining

Source

INISTA 2012 - International Symposium on INnovations in Intelligent SysTems and Applications
