Data clustering on the parallel Hadoop MapReduce model

Βερράρος, Δημήτριος


Institution and School/Department of submitter: TEI of Thessaloniki (ΤΕΙ Θεσσαλονίκης)
Issue Date: 3-Nov-2015
Abstract: Machine learning is one of the best ways to process and analyze data. Industry has built tools that exploit machine learning algorithms by executing them in parallel on large computer clusters where the data is stored. One of the most popular is Apache Hadoop, which provides the abstractions needed to perform those operations so that more businesses, organizations and individuals can use them to achieve their goals. In this thesis, we examine one of the most popular machine learning algorithms, K-means, and implement it on the MapReduce framework. We then execute it on a Hadoop cluster to measure the performance gains obtained by parallelizing an algorithm that analyzes data distributed across multiple machines. Finally, we present alternative solutions and recent developments in parallel data processing, concluding with an overview of the directions in which Big Data is moving.
Description: Bachelor's thesis--School of Technological Applications (ΣΤΕΦ)--Department of Informatics, 2014
URI: http://195.251.240.227/jspui/handle/123456789/10972
Appears in Collections: Bachelor's Theses (Πτυχιακές Εργασίες)

Files in This Item:
File                         Description  Size       Format
Verraros_Dimitrios_ppt.pdf                995.51 kB  Adobe PDF
Verraros_Dimitrios.pdf                    1.63 MB    Adobe PDF




Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.