We will be doing some analytics on real data from a Portuguese banking

Task 1: Analysing Bank Data

We will be doing some analytics on real data from a Portuguese banking institution. The data is stored in a semicolon (“;”) delimited format.
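As a rough illustration of what "semicolon delimited" means in practice, the sketch below splits one record into its fields in plain Python. The sample record is hypothetical: it only uses the five attributes that appear in the example outputs of this task, in an assumed order, whereas the real file has more columns. `parse_line` is an illustrative helper name, not part of the assignment.

```python
def parse_line(line):
    """Split one semicolon-delimited record into a list of trimmed fields."""
    return [field.strip() for field in line.rstrip("\n").split(";")]


# Hypothetical record (assumed column order: education; balance; job; marital; loan).
sample = '"tertiary";2143;"management";"married";"yes"'

fields = parse_line(sample)
print(fields)
```

Note that a plain split keeps the double quotes around the text values, which is consistent with the quoted strings shown in the expected outputs of the subtasks.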

The data is supplied with the assignment at the following locations:

Small version: Task_1/Data/bank-small.csv

Full version: Task_1/Data/bank.csv

The data has the following attributes


Here is a small example of the bank data that we will use to illustrate the subtasks below (we only list a subset of the attributes in this example, see the above table for the description of the attributes):


Please note that each subtask states at its beginning whether you should use [Hive] or [Spark RDD].

a) [Hive] Report the number of clients of each job category. Write the results to “Task_1a-out”. For the above small example data set you would report the following (output order is not important for this question):

"blue-collar" 1

"entrepreneur" 1

"management" 2

"services" 1

"technician" 3
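The grouping logic behind subtask a) can be cross-checked outside Hive with a few lines of plain Python. This is only a sketch of the counting step (a group-by on job followed by a count), not the Hive query the subtask asks for. The rows are the eight people from the small example, reduced to the five attributes shown in this task; the tuple layout is an assumption made for illustration, and the double quotes around the values are omitted for readability.

```python
from collections import Counter

# Small example data as (education, balance, job, marital, loan).
# The tuple layout is assumed for illustration; the real file has more columns.
rows = [
    ("primary", 10, "technician", "married", "no"),
    ("secondary", 829, "services", "divorced", "yes"),
    ("secondary", 29, "technician", "divorced", "yes"),
    ("secondary", 2, "entrepreneur", "single", "no"),
    ("tertiary", 2143, "management", "married", "yes"),
    ("tertiary", 929, "technician", "married", "yes"),
    ("tertiary", 22, "management", "divorced", "no"),
    ("unknown", 1506, "blue-collar", "married", "no"),
]

# Count how many clients fall into each job category.
job_counts = Counter(job for _, _, job, _, _ in rows)

for job, count in sorted(job_counts.items()):
    print(job, count)
```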

b) [Hive] Report the average yearly balance for all people in each education category. Write the results to “Task_1b-out”. For the small example data set you would report the following (output order is not important for this question):

"primary" 10.0

"secondary" 286.6666666666667

"tertiary" 1031.3333333333333

"unknown" 1506.0
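The averages above can be reproduced by hand: for "secondary", (829 + 29 + 2) / 3 = 286.67 approximately. The plain-Python sketch below performs the same sum-and-count per education category. It is a way to check the expected numbers, not the Hive solution itself, and the row layout is again an assumption made for illustration.

```python
from collections import defaultdict

# Small example data as (education, balance, job, marital, loan).
# The tuple layout is assumed for illustration; the real file has more columns.
rows = [
    ("primary", 10, "technician", "married", "no"),
    ("secondary", 829, "services", "divorced", "yes"),
    ("secondary", 29, "technician", "divorced", "yes"),
    ("secondary", 2, "entrepreneur", "single", "no"),
    ("tertiary", 2143, "management", "married", "yes"),
    ("tertiary", 929, "technician", "married", "yes"),
    ("tertiary", 22, "management", "divorced", "no"),
    ("unknown", 1506, "blue-collar", "married", "no"),
]

# education -> [sum of balances, number of people]
totals = defaultdict(lambda: [0, 0])
for education, balance, *_ in rows:
    totals[education][0] += balance
    totals[education][1] += 1

# Divide each sum by its count to get the average balance per category.
avg_balance = {edu: total / count for edu, (total, count) in totals.items()}

for edu in sorted(avg_balance):
    print(edu, avg_balance[edu])
```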

c) [Spark RDD] Group balance into the following three categories:

a. Low: -infinity to 500

b. Medium: 501 to 1500

c. High: 1501 to +infinity

Report the number of people in each of the above categories. Write the results to “Task_1c-out” in text file format. For the small example data set you should get the following results (output order is not important in this question):

(High,2)

(Medium,2)

(Low,4)
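The category boundaries can be expressed as a small function that is applied to every balance before counting. The sketch below does this in plain Python, assuming balances are whole numbers so that no value falls between 500 and 501; `balance_category` is an illustrative name. In Spark the same idea would typically map each record to a (category, 1) pair and then sum the ones per key, but the code here is only a local check of the logic.

```python
from collections import Counter


def balance_category(balance):
    """Map an integer balance to Low (<= 500), Medium (501..1500) or High (>= 1501)."""
    if balance <= 500:
        return "Low"
    if balance <= 1500:
        return "Medium"
    return "High"


# Balances of the eight people in the small example.
balances = [10, 829, 29, 2, 2143, 929, 22, 1506]

# Count how many people fall into each category.
category_counts = Counter(balance_category(b) for b in balances)

for category, count in category_counts.items():
    print((category, count))
```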

d) [Spark RDD] Sort all people in ascending order of education. For people with the same education, sort them in descending order by balance. This means that all people with the same education should appear grouped together in the output. For each person report the following attribute values: education, balance, job, marital, loan. Write the results to “Task_1d-out” in text file format (multiple parts are allowed). For the small example data set you would report the following:

("primary",10,"technician","married","no")

("secondary",829,"services","divorced","yes")

("secondary",29,"technician","divorced","yes")

("secondary",2,"entrepreneur","single","no")

("tertiary",2143,"management","married","yes")

("tertiary",929,"technician","married","yes")

("tertiary",22,"management","divorced","no")

("unknown",1506,"blue-collar","married","no")
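The two-level ordering (education ascending, then balance descending) can be expressed with a single composite sort key: the education string followed by the negated balance. The plain-Python sketch below shows that key on the small example. The input order of `rows` is arbitrary and chosen only for illustration, and the double quotes around the text values are omitted for readability. A Spark RDD solution could apply the same composite key when sorting, but this is a local sketch, not that solution.

```python
# Small example data as (education, balance, job, marital, loan), in arbitrary order.
rows = [
    ("tertiary", 22, "management", "divorced", "no"),
    ("secondary", 29, "technician", "divorced", "yes"),
    ("unknown", 1506, "blue-collar", "married", "no"),
    ("tertiary", 2143, "management", "married", "yes"),
    ("primary", 10, "technician", "married", "no"),
    ("secondary", 2, "entrepreneur", "single", "no"),
    ("tertiary", 929, "technician", "married", "yes"),
    ("secondary", 829, "services", "divorced", "yes"),
]

# Sort by education (ascending); negating the balance makes ties sort by
# balance in descending order.
ordered = sorted(rows, key=lambda r: (r[0], -r[1]))

for row in ordered:
    print(row)
```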

