PRETO: A high-performance text mining tool for preprocessing Turkish texts


Date

2012

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

Text documents are usually unstructured and written in natural language, so a preprocessing step is indispensable before conventional data mining techniques can be applied to them. In this paper, we introduce PRETO, a cross-platform, powerful, and scalable tool developed specifically for preprocessing Turkish texts, with a wide range of preprocessing options such as stemming, stopword filtering, statistical term filtering, and n-gram generation. We demonstrate the performance and scalability of PRETO through experiments on large document collections. Copyright ©2012 ACM.
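To make the preprocessing options named in the abstract more concrete, the following is a minimal Python sketch of three of them: stopword filtering, n-gram generation, and a simple statistical term filter based on document frequency. It is not PRETO's implementation or API, which this record does not describe. All function names, the tiny stopword list, and the `min_df` threshold are assumptions made for illustration. Stemming is deliberately left out, since Turkish morphology calls for a dedicated stemmer or analyzer.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"ve", "bir", "bu", "için", "ile"}


def turkish_lower(text):
    # str.lower() is locale-independent, so map the Turkish capitals
    # "İ" -> "i" and "I" -> "ı" explicitly before lowercasing the rest.
    return text.replace("İ", "i").replace("I", "ı").lower()


def tokenize(text):
    # Keep runs of Unicode letters; digits, underscores and punctuation split tokens.
    return re.findall(r"[^\W\d_]+", turkish_lower(text))


def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]


def ngrams(tokens, n):
    # Word-level n-grams joined with a single space.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def filter_by_document_frequency(docs, min_df):
    # Assumed statistical filter: drop terms found in fewer than min_df documents.
    df = Counter(term for doc in docs for term in set(doc))
    return [[t for t in doc if df[t] >= min_df] for doc in docs]


def preprocess(texts, n=1, min_df=1):
    docs = [ngrams(remove_stopwords(tokenize(t)), n) for t in texts]
    return filter_by_document_frequency(docs, min_df)
```

For example, `preprocess(["Bu bir METİN ve IŞIK", "metin için kitap"], n=1, min_df=2)` returns `[["metin"], ["metin"]]`: the stopwords are dropped, and only the term that occurs in both documents passes the frequency filter.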

Description

13th International Conference on Computer Systems and Technologies, CompSysTech 2012 -- 22 June 2012 through 23 June 2012 -- Ruse -- 93756

Keywords

Data Mining, Natural Language Processing, Text Mining, Text Preprocessing

Source

ACM International Conference Proceeding Series

Scopus Q Value

N/A
