Instructions
The following file is from the MovieLens dataset and contains user ratings for movies:
http://files.grouplens.org/datasets/movielens/ml-100k/u.data
You can find out more about this dataset here:
https://files.grouplens.org/datasets/movielens/ml-100k-README.txt
u.data is the full u data set, with 100000 ratings by 943 users on 1682 items. Each user has rated at least 20 movies. Users and items are numbered consecutively from 1. The data is randomly ordered. Each line is a tab-separated record of user id | item id | rating | timestamp. The timestamps are Unix seconds since 1/1/1970 UTC. For example, the following line of the file:
95 546 2 879196566
is interpreted as follows: user 95 gave movie 546 a rating of 2 out of 5 (ratings are in the range 1-5) at time 879196566 (Monday, November 10, 1997, 9:16:06 PM GMT).
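As a quick check of this interpretation, here is a minimal Python sketch that splits the example record into its four fields and converts the timestamp to a UTC date. The function name parse_rating is made up for illustration and is not part of the assignment or the dataset.

```python
from datetime import datetime, timezone


def parse_rating(line):
    """Split one u.data record into (user id, item id, rating, timestamp)."""
    # split() with no argument splits on any whitespace, so it handles tabs.
    user_id, item_id, rating, timestamp = line.split()
    return int(user_id), int(item_id), int(rating), int(timestamp)


# The example record quoted above, with its fields separated by tabs.
user_id, item_id, rating, ts = parse_rating("95\t546\t2\t879196566")

# Interpret the timestamp as Unix seconds in UTC.
when = datetime.fromtimestamp(ts, tz=timezone.utc)
print(user_id, item_id, rating, when.isoformat())
```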
Your task is to use MapReduce to find the following information for each user: the average rating and the number of movies rated by that user. Here is an example of the output:
You can choose the output format. However, the required information must be included in the output. You must include the output file in your submission.
Hint: You can modify the WordCount program so that it ignores all tokens in a line except the third one (the rating value is in the third column of the file).
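The idea behind the hint can be sketched in plain Python, with the shuffle step simulated in memory. This is an illustrative sketch, not the required Hadoop program. The names mapper, reducer, and run_job are hypothetical, and the sketch assumes the user id (first column) is used as the key, since the result is per user. In the sample input, only the first record comes from the text above; the other two are made up.

```python
from collections import defaultdict


def mapper(line):
    """Emit (user id, rating) for one record; item id and timestamp are ignored."""
    fields = line.split()
    if len(fields) != 4:
        return  # skip blank or malformed lines
    yield fields[0], int(fields[2])


def reducer(user_id, ratings):
    """Return (user id, average rating, number of ratings) for one user."""
    ratings = list(ratings)
    count = len(ratings)
    return user_id, sum(ratings) / count, count


def run_job(lines):
    """Simulate map, shuffle (group by key), and reduce in memory."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    # Sort numerically by user id so the output order is stable.
    return [reducer(key, groups[key]) for key in sorted(groups, key=int)]


sample = [
    "95\t546\t2\t879196566",  # the example record from the text
    "95\t1\t4\t879196567",    # made-up record for illustration
    "7\t10\t5\t879196568",    # made-up record for illustration
]

for user_id, average, count in run_job(sample):
    print(f"{user_id}\t{average:.2f}\t{count}")
```

One design point to keep in mind: WordCount typically reuses its summing reducer as a combiner, but an average of partial averages is generally not the overall average. If a combiner is used here, it would need to pass along (sum, count) pairs rather than averages.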
The program must also print the names of the group members on the screen.