Research on Handling Data Skew in MapReduce
Author(s): WANG Gang, LI Sheng-en
Pages: 201-204
Year: 2016
Issue: 9
Journal: Computer Technology and Development
Keywords: big data; load balancing; sampling
Abstract: With the rapid development of the mobile Internet and the Internet of Things, data volumes have grown explosively, ushering in the era of big data. As a distributed computing framework, MapReduce can process massive data sets and has become a focus of big data research. However, the performance of MapReduce depends on how the data is distributed. The default hash partition function in MapReduce cannot guarantee load balancing when the data is skewed, and job completion time is then prolonged by the node that has the most data to process. To address this problem, this paper uses sampling: a MapReduce job samples the input before the user's job runs. Once the key distribution is known, the data is partitioned in a load-balanced way that also takes data locality into account. A WordCount example is tested on an experimental platform. The results show that sample-based partitioning outperforms hash partitioning, and that sample-based partitioning with data locality performs considerably better than sample-based partitioning without it.
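The contrast between hash partitioning and sample-based partitioning can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify: the greedy "heaviest key to the least-loaded reducer" rule, the CRC32 stand-in for the default hash partitioner, and all function names (`make_skewed_records`, `sample_key_counts`, `build_balanced_assignment`, `reducer_loads`) are assumptions made for illustration. Data locality is not modeled here.

```python
import random
import zlib
from collections import Counter


def make_skewed_records(num_keys=20, top=1000):
    """Hypothetical WordCount-style input: key i appears top // (i + 1) times."""
    records = []
    for i in range(num_keys):
        records.extend([f"w{i}"] * (top // (i + 1)))
    return records


def hash_partition(key, num_reducers):
    """Stand-in for a default hash partitioner.

    CRC32 is used instead of Python's built-in hash() because string
    hashing in Python is randomized per process.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def sample_key_counts(records, rate, seed=0):
    """Estimate the key distribution by keeping each record with probability `rate`."""
    rng = random.Random(seed)
    return Counter(key for key in records if rng.random() < rate)


def build_balanced_assignment(key_counts, num_reducers):
    """Greedy assignment (an assumption, not the paper's method):
    place keys from heaviest to lightest on the currently least-loaded reducer."""
    loads = [0] * num_reducers
    assignment = {}
    ordered = sorted(key_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for key, count in ordered:
        target = loads.index(min(loads))
        assignment[key] = target
        loads[target] += count
    return assignment


def reducer_loads(records, num_reducers, assignment=None):
    """Count records per reducer.

    Keys missing from `assignment` (for example, keys the sample never saw)
    fall back to hash partitioning.
    """
    loads = [0] * num_reducers
    for key in records:
        if assignment is not None and key in assignment:
            loads[assignment[key]] += 1
        else:
            loads[hash_partition(key, num_reducers)] += 1
    return loads
```

A possible usage: build the input with `make_skewed_records()`, estimate the distribution with `sample_key_counts(records, rate=0.1)`, derive an assignment with `build_balanced_assignment(counts, 4)`, and compare `reducer_loads(records, 4)` against `reducer_loads(records, 4, assignment)`. With hash partitioning, whichever reducer receives the heaviest key also receives every other key that hashes to it. The sample-based assignment can isolate heavy keys, but its quality depends on how well the sample reflects the true distribution.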