Canopy with k-means clustering algorithm for big data analytics

dc.contributor.author: Sagheer, Noor S.
dc.contributor.author: Yousif, Suhad A.
dc.date.accessioned: 2024-07-12T20:47:08Z
dc.date.available: 2024-07-12T20:47:08Z
dc.date.issued: 2021 [en_US]
dc.department: Faculties, Faculty of Humanities and Social Sciences, Department of Mathematics [en_US]
dc.description.abstract: Big Data is now gathered from many sources and in many formats, which makes it difficult to analyze with traditional methods. Apache Hadoop is a robust solution to the problems of storing and processing large datasets: it provides HDFS (Hadoop Distributed File System) for storage and MapReduce for processing. Clustering algorithms are among the essential methods for analyzing big data to discover new patterns. In this paper, we used the canopy clustering algorithm provided by Apache Mahout, a distributed machine learning library, as a preprocessing step for the k-means clustering algorithm. The results showed that using canopy as a preprocessing step sped up the processing of a large-scale healthcare insurance dataset, and that it also reduces the execution time of k-means by providing initial centroids for the given dataset. [en_US]
dc.identifier.citation: Sagheer, N.S. and Yousif, S.A. (2021). Canopy with k-means clustering algorithm for big data analytics. Fourth International Conference of Mathematical Sciences, Maltepe Üniversitesi. pp. 1-4. [en_US]
dc.identifier.endpage: 4 [en_US]
dc.identifier.isbn: 978-0-7354-4078-4
dc.identifier.startpage: 1 [en_US]
dc.identifier.uri: https://aip.scitation.org/doi/10.1063/5.0042398
dc.identifier.uri: https://hdl.handle.net/20.500.12415/1941
dc.language.iso: en [en_US]
dc.publisher: Maltepe Üniversitesi [en_US]
dc.relation.ispartof: Fourth International Conference of Mathematical Sciences [en_US]
dc.relation.isversionof: 10.1063/5.0042398 [en_US]
dc.relation.publicationcategory: International Conference Item - Author from Another Institution [en_US]
dc.rights: CC0 1.0 Universal [*]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.rights.uri: http://creativecommons.org/publicdomain/zero/1.0/ [*]
dc.snmz: KY07361
dc.subject: Big Data [en_US]
dc.subject: k-means [en_US]
dc.subject: canopy [en_US]
dc.subject: Mahout [en_US]
dc.subject: Health Care [en_US]
dc.subject: confusion matrix [en_US]
dc.subject: HDFS [en_US]
dc.title: Canopy with k-means clustering algorithm for big data analytics [en_US]
dc.type: Conference Object
dspace.entity.type: Publication
