Data clustering on the parallel Hadoop MapReduce model

Institution and School/Department of submitter:	TEI of Thessaloniki
Issue Date:	3-Nov-2015
Abstract:	Machine learning is a powerful approach to processing and analysing data. Industry has built tools that exploit machine learning algorithms by executing them in parallel on large computer clusters where the data is stored. One of the most popular of these tools is Apache Hadoop, which provides the abstractions needed to perform such operations so that more businesses, organizations and individuals can use them to achieve their goals. In this thesis, we examine one of the most popular clustering algorithms, K-means, and implement it on the MapReduce framework. We then execute it on a Hadoop cluster to measure the performance gains obtained by parallelizing the algorithm over data distributed across multiple machines. Finally, we present alternative solutions and recent developments in parallel data processing, concluding with an overview of the directions in which Big Data is moving.
Description:	Bachelor's thesis--School of Technological Applications (ΣΤΕΦ)--Department of Informatics, 2014
URI:	http://195.251.240.227/jspui/handle/123456789/10972
Appears in Collections:	Bachelor's Theses

Files in This Item:

File	Description	Size	Format
Verraros_Dimitrios_ppt.pdf		995.51 kB	Adobe PDF
Verraros_Dimitrios.pdf		1.63 MB	Adobe PDF

Eureka! Institutional Repository