Data clustering on the parallel Hadoop MapReduce model
Verraros, Dimitrios (Βερράρος, Δημήτριος)
Institution and School/Department of submitter: | TEI of Thessaloniki (ΤΕΙ Θεσσαλονίκης) |
Issue Date: | 3-Nov-2015 |
Abstract: | Machine learning is one of the most effective ways to process and analyse data. Industry has built tools that exploit machine learning algorithms by executing them in parallel on large computer clusters, where the data is stored. One of the most popular is Apache Hadoop. It provides the abstractions needed to perform these operations, which makes them accessible to more businesses, organizations and individuals. In this thesis, we examine K-means, one of the most popular machine learning (clustering) algorithms, and implement it on the MapReduce framework. We then run it on a Hadoop cluster to measure the performance gains from parallelizing the algorithm over data distributed across multiple machines. Finally, we present alternative solutions and recent developments in parallel data processing, giving an overview of the directions in which the field of Big Data is moving. |
Description: | Undergraduate thesis--School of Technological Applications (ΣΤΕΦ)--Department of Informatics, 2014 |
URI: | http://195.251.240.227/jspui/handle/123456789/10972 |
Appears in Collections: | Undergraduate Theses (Πτυχιακές Εργασίες) |
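The abstract says K-means is implemented on MapReduce. The sketch below is only an illustration, not the thesis's implementation and not Hadoop's Java API. It shows how one K-means iteration is commonly split into map and reduce steps:

- The map step assigns each point to its nearest centroid.
- A shuffle groups the points by centroid.
- The reduce step averages each group to produce the new centroid.

It is plain Python that simulates those phases in memory. The functions `kmeans_map`, `kmeans_reduce`, `kmeans_iteration` and `kmeans` are hypothetical names chosen for this example.

```python
from collections import defaultdict
from math import dist  # Euclidean distance between two points (Python 3.8+)


def kmeans_map(point, centroids):
    """Hypothetical map step: emit (index of nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, point


def kmeans_reduce(points):
    """Hypothetical reduce step: the new centroid is the coordinate-wise mean."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(data, centroids):
    """One MapReduce-style pass: map every point, group by key, reduce each group."""
    groups = defaultdict(list)  # stands in for Hadoop's shuffle/sort phase
    for point in data:
        key, value = kmeans_map(point, centroids)
        groups[key].append(value)
    # A centroid that received no points keeps its previous position.
    return [kmeans_reduce(groups[i]) if groups[i] else centroids[i]
            for i in range(len(centroids))]


def kmeans(data, centroids, max_iter=20, tol=1e-9):
    """Repeat iterations (one job each, in Hadoop terms) until centroids stop moving."""
    for _ in range(max_iter):
        new = kmeans_iteration(data, centroids)
        if all(dist(a, b) <= tol for a, b in zip(new, centroids)):
            return new
        centroids = new
    return centroids


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.5, 2.0), (0.5, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
    print(kmeans(data, [(0.0, 0.0), (10.0, 10.0)]))
```

On a real Hadoop cluster, each iteration would typically run as a separate job, with the current centroids made available to every mapper. A combiner that emits per-centroid partial sums and counts is a common way to reduce shuffle traffic.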
Files in This Item:
File | Size | Format
---|---|---
Verraros_Dimitrios_ppt.pdf | 995.51 kB | Adobe PDF
Verraros_Dimitrios.pdf | 1.63 MB | Adobe PDF
Please use this identifier to cite or link to this item:
http://195.251.240.227/jspui/handle/123456789/10972
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.