IntegrateCo

Access version

Background 

You serve as a data analyst for IntegrateCo, a company that installs and services integrated building management systems in the Intermountain West of the United States. Building management systems are computer-based systems installed in buildings to control all mechanical and electric equipment in the building, including lighting, air conditioning and heating, security systems, power systems, sound systems, video cameras, and door-locking systems. IntegrateCo serves both large and small customers. It is privately held and expects to have $15 million in revenue in 2016.  

The CFO, Noah Summers, noticed that payroll for the first six months of 2016 was approximately $266,000 more than in the first six months of 2015. Noah wants to know why there was a difference and has asked you to compare payroll for the first six months of 2016 with the first six months of 2015 and report to him what you find.  

One complication in performing this analysis is that IntegrateCo implemented a new accounting system at the start of 2016. Thus, the payroll data in 2015 and 2016 came from different systems. As a result, you will have to take great care to understand the data output from each system and appropriately combine the data for your analysis. 

Upon receiving this assignment from Noah, you recognize that you will need to exercise an analytics mindset to respond to his request. As a reminder, an analytics mindset is the ability to:

Ask the right questions

Extract, transform and load relevant data

Apply appropriate data analytics techniques

Interpret and share the results with stakeholders

This case is broken into four parts to help you use an analytics mindset and provide the information Noah wants.  

In Part I, you will develop questions to ask Noah before you begin your analysis. You will also evaluate the data and gain an understanding of the two different data files.  

In Part II, you will transform and load the data into Access, which you will use to analyze the data.  

In Part III, you will analyze and interpret the data and then prepare a report for Noah.  

In Part IV, you will respond to follow-up questions asked by Noah after his initial review of your report. 

Part I: 

Required

Noah has noted that the payroll expense in 2016 was more than in 2015 by approximately $266,000 and he has asked you to explain why. Prepare a list of at least five questions you would like answered to address Noah’s request. That is, you should think about possible reasons why payroll was higher in 2016 than in 2015 and the questions Noah is likely to want answered about the differences in payroll between the years.  

The client provided you with three pipe-delimited text files. It is important to understand the characteristics of the data in each of these files. Read the description of the data in each file in the appendix on the following page. Use the framework of the four “V’s” of data (variety, velocity, veracity and volume) and prepare responses to the questions below. Note that this list is not exhaustive, and you may have other questions about your data to fully understand it before you begin your analysis.

Variety – different forms and formats of the data  

Are all of the data set formats the same? Do they need to be the same for your analysis?

Do all fields contain the same labels? Do fields with similar labels contain the same type of data?  

How are the files delimited? Are there any extra delimiters that may cause problems when importing? What strategies can you use to deal with any of these challenges?  

Is the data structured or unstructured? What transformations would any unstructured data need before it could be analyzed?  

Is the data aggregated at the same level?
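As a concrete illustration of the delimiter questions above, here is a minimal sketch of importing a pipe-delimited file. The case itself uses Access, so Python with pandas is only an assumption for illustration, and the sample rows are invented:

```python
import io
import pandas as pd

# Invented sample rows mimicking the pipe-delimited layout the
# appendix describes (the real files are much larger).
sample = io.StringIO(
    "Emp_No|First_Name|Last_Name|Job|Reg_Pay\n"
    "101|Sally|Jones|14S001SG|60.00\n"
    "101|Sally|Jones|15S002|30.00\n"
)

# sep="|" because the files are pipe-delimited despite the .csv
# extension; dtype=str keeps codes (e.g., the two-digit job code)
# from being coerced to numbers on import.
df = pd.read_csv(sample, sep="|", dtype=str)
print(df.shape)  # (2, 5)
```

A stray pipe inside a text field would surface here as extra columns or a tokenizing error, which is one way to detect the "extra delimiter" problem before analysis.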

Velocity – frequency of incoming data that needs processing

Is your analysis performed on “live” data or only on historical data? 

How often will you be updating this analysis? How automated should the analysis be? What tool might make sense to use in automating this process? 

Veracity – trustworthiness of the data

Is the data you have complete? Do the data files you received contain all transactions?  Are all of the data fields complete for each year and do the files contain all of the same data for each year?  

Does the data contained in the data files accurately represent the economic transactions?  

What human judgment went into establishing the data?  

Volume – the amount or scale of data

Should you include data for all years? 

Should you include data from all entities?  

Are all fields relevant to your analysis?  

How many rows will you need to import? What tools can handle this quantity of data?   

Appendix

Data file descriptions

File: Analytics_mindset_case_studies_jobcodes.csv

This file lists different job codes within IntegrateCo. There are 12 different job codes, numbered 1 to 12, that are used to categorize employees’ pay. Any job codes outside of this range are errors in the data. For managerial decision-making, the company assigns employees’ pay to different job codes. That is, if Sally earned $100 working for five hours and she spent three hours on job code 1 (office), one-and-a-half hours on job code 2 (sales) and a half hour on job code 8 (employee training), the company assigns $60 to job code 1, $30 to job code 2 and $10 to job code 8. The company lists job codes using two digits (01 instead of 1) to avoid confusion. As such, the data is, and should be, stored as text, and the two-digit job code should always be used.
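The allocation rule in the Sally example above can be sketched as follows (a hypothetical Python illustration; the figures come from the example, and the two-digit codes are kept as text as the file description requires):

```python
# Sally's $100 of pay is spread across job codes in proportion
# to the hours charged to each code.
total_pay = 100.0
hours_by_code = {"01": 3.0, "02": 1.5, "08": 0.5}  # two-digit codes as text
total_hours = sum(hours_by_code.values())

allocation = {
    code: total_pay * hours / total_hours
    for code, hours in hours_by_code.items()
}
print(allocation)  # {'01': 60.0, '02': 30.0, '08': 10.0}

# Zero-padding keeps single-digit codes in two-digit text form:
assert str(1).zfill(2) == "01"
```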

File: Analytics_mindset_case_studies_2015payroll.csv

This file contains the output of data from the first six months of 2015. Each row represents a unique employee and job code combination. In the Sally example above, there would be three rows, one for each job code in the data. If Sally did the exact same amount and type of work in the next month, three more rows would be added. To figure out an employee’s paycheck, you would sum all of the payments to that employee on a specific date. The rest of the data is defined as follows:

Pay_Date: This is the date that the employee was paid.

Payroll_ID: This is a unique identifier for the date that the payroll was run for all employees. Each unique date should have a unique Payroll_ID number.

Emp_No: This is a unique identifier for each employee. These numbers are not reused so all employees have their own number. 

Home_Div: This is the division number to which each employee belongs.  

Work_Div: This is the number of the division in which the employee’s work was performed.

First_Name: This is the first name of the employee being paid.

Last_Name: This is the last name of the employee being paid.

Job: This is a unique code that combines several pieces of information into one field. The first two numbers represent the year that the project began. So 14 means the project started in 2014. Then, sometimes, there is a department identifier (H or S) that is included. The next three digits represent the job code (e.g., 001, 010, 012). After the job code, sometimes there is a location code (B or SG). Here are several examples:

14005 – The 14 represents the year the project began and 005 represents the job code 005 (Installation). There is no department identifier or location code included.

15S002 – The 15 represents the year the project began, S represents the department and 002 represents the job code 002 (Sales). There is no location code included.

14S001SG – The 14 represents the year the project began, S represents the department, 001 represents the job code 001 (Office) and SG represents the location code.

15007B – The 15 represents the year the project began, 007 represents the job code 007 (Customer Train) and B represents the location code. There is no department identifier included.
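Based on the format described above, the Job field could be split apart with a pattern like the following. This is a Python sketch, not part of the case; `parse_job` is a hypothetical helper, and the pattern assumes every value follows the stated layout (two-digit year, optional H/S department, three-digit job code, optional B/SG location):

```python
import re

# Two-digit year, optional department (H or S), three-digit job
# code, optional location code (B or SG).
JOB_RE = re.compile(r"^(\d{2})([HS])?(\d{3})(B|SG)?$")

def parse_job(job: str) -> dict:
    m = JOB_RE.match(job)
    if m is None:
        raise ValueError(f"Unrecognized Job value: {job!r}")
    year, dept, code, loc = m.groups()
    return {"year": year, "dept": dept, "job_code": code, "location": loc}

print(parse_job("14S001SG"))
# {'year': '14', 'dept': 'S', 'job_code': '001', 'location': 'SG'}
```

Values that do not match the pattern raise an error, which doubles as a veracity check on the field.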

Reg_Hrs: This is the number of regular hours reported by the employee that apply to the job code during the time period. 

OT_Hrs: This is the number of overtime hours reported by the employee that apply to the job code during the time period. Overtime hours are those worked in excess of 40 hours in a week. 

Reg_Pay: This is the amount of pay the employee received for the Reg_Hrs worked during the time period. This amount is for each job code.

OT_Pay: This is the amount of overtime pay the employee received for the OT_Hrs worked during the time period. This amount is for each job code.

Futa: This is the federal unemployment tax paid by the employer during the period. 

Suta: This is the state unemployment tax paid by the employer during the period. 

Fica_Medc: This is the tax withheld under the Federal Insurance Contributions Act (FICA). This money is used to pay for Social Security and Medicare. 

Work_Comp: This is the amount paid from employees’ paychecks to cover workers’ compensation insurance. 
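The paycheck reconstruction described above (summing all payments to an employee on a given pay date) can be sketched as follows. Python with pandas is an assumption for illustration, and the rows below are invented, echoing the Sally example:

```python
import pandas as pd

# Each row is one employee/job-code amount; summing by employee
# and pay date reconstructs the paycheck.
rows = pd.DataFrame({
    "Pay_Date": ["2015-01-15", "2015-01-15", "2015-01-15"],
    "Emp_No": ["101", "101", "101"],
    "Reg_Pay": [60.0, 30.0, 10.0],
    "OT_Pay": [0.0, 0.0, 0.0],
})

paychecks = (
    rows.assign(Gross=rows["Reg_Pay"] + rows["OT_Pay"])
        .groupby(["Emp_No", "Pay_Date"], as_index=False)["Gross"]
        .sum()
)
print(paychecks)
```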

File: Analytics_mindset_case_studies_2016payroll.csv

This file contains the output of data from the first six months of 2016. The data contains no dates; instead, each row represents a unique combination of the employee’s name, project code, department identifier and location code. Thus, the 2016 data differs significantly from the 2015 data. The variables are defined as follows:

Project: This is defined the same as Job in the 2015 data.

Employee_ID: This is a new unique number created in the new database to recognize each unique employee.  

Employee_Number: This is defined the same as Emp_No in the 2015 data.

Employee_Name: This is the combination of the First_Name and Last_Name fields in the 2015 data.  

Hours: This is defined the same as Reg_Hrs in the 2015 data.

OT_Hours: This is defined the same as OT_Hrs in the 2015 data.

Gross_Wages: This is defined as the sum of Reg_Pay and OT_Pay in the 2015 data. 

SocSec_Medicare: This is defined the same as Fica_Medc in the 2015 data.

FUTA: This is defined the same as Futa in the 2015 data.

SUTA: This is defined the same as Suta in the 2015 data.

Work_Comp: This is defined the same as Work_Comp in the 2015 data.
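The field mappings above suggest one way to align the 2016 columns with the 2015 schema before combining the years. This is a sketch in Python with pandas, not the case's prescribed Access workflow; the column names come from the descriptions above, but the sample values are invented:

```python
import pandas as pd

# 2016 column -> 2015 counterpart, per the field definitions above.
rename_2016_to_2015 = {
    "Project": "Job",
    "Employee_Number": "Emp_No",
    "Hours": "Reg_Hrs",
    "OT_Hours": "OT_Hrs",
    "SocSec_Medicare": "Fica_Medc",
    "FUTA": "Futa",
    "SUTA": "Suta",
}

df_2016 = pd.DataFrame({
    "Project": ["16S001"],
    "Employee_Number": ["101"],
    "Hours": [40.0],
    "OT_Hours": [2.0],
    "Gross_Wages": [945.0],  # already Reg_Pay + OT_Pay combined
})
aligned = df_2016.rename(columns=rename_2016_to_2015)
print(aligned.columns.tolist())
```

Note that Gross_Wages has no single 2015 counterpart: to compare like with like, the 2015 side would need a computed Reg_Pay + OT_Pay column.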
