Examining the impact of stemming on clustering Turkish texts
Date
2012
Access Rights
info:eu-repo/semantics/closedAccess
Abstract
Preprocessing is an important step in information retrieval and text mining. In this study, we examined the impact of stemming on clustering Turkish texts. We used two datasets compiled from the websites of Turkish news agencies and performed extensive experiments. Our experiments show no significant evidence that stemming always improves clustering quality for Turkish texts. However, when stemming is used, the dimensionality of the document-term matrix decreases dramatically without adversely affecting clustering performance. We therefore highly recommend applying stemming when clustering Turkish texts. © 2012 IEEE.
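The dimensionality effect described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the suffix list, the `toy_stem` and `build_matrix` helpers, and the three sample documents are all hypothetical, and a real Turkish stemmer would rely on morphological analysis rather than naive suffix stripping. The sketch only shows how mapping inflected forms to a shared stem shrinks the number of columns in a document-term matrix.

```python
from collections import Counter

# Hypothetical, deliberately crude suffix list (longest suffixes first).
# A real Turkish stemmer would use morphological analysis instead.
TOY_SUFFIXES = ("lerin", "ların", "ler", "lar", "de", "da")


def toy_stem(token: str) -> str:
    """Strip the first matching suffix, keeping at least 2 characters."""
    for suffix in TOY_SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 2:
            return token[: -len(suffix)]
    return token


def build_matrix(docs, normalize=lambda t: t):
    """Return (sorted vocabulary, term-count rows), one row per document."""
    counts = [Counter(normalize(t) for t in doc.split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    rows = [[c[term] for term in vocab] for c in counts]
    return vocab, rows


# Made-up sample documents, already lowercased and whitespace-tokenized.
docs = [
    "kitap kitaplar kitapların",
    "evler evlerin evde",
    "okul okullar okulda",
]

raw_vocab, raw_rows = build_matrix(docs)
stem_vocab, stem_rows = build_matrix(docs, normalize=toy_stem)

print(len(raw_vocab), "columns without stemming")
print(len(stem_vocab), "columns with toy stemming")
```

On this toy input every inflected form collapses onto its stem, so each document keeps the same total term count while the matrix has far fewer columns. Whether clustering quality changes is a separate empirical question that this sketch does not address.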
Description
International Symposium on INnovations in Intelligent SysTems and Applications, INISTA 2012 -- 2 July 2012 through 4 July 2012 -- Trabzon -- 92831
Keywords
data mining, document clustering, preprocessing, stemming, text mining
Source
INISTA 2012 - International Symposium on INnovations in Intelligent SysTems and Applications