Assessment 3: Statistical Data Analysis

Due date: Week 10
Group/individual: Group
Word count / Time provided: 2000 words
Weighting: 30%
Unit Learning Outcomes: [ULO1], [ULO2], [ULO3], [ULO4]

Assessment Details:
Assessment Task 3 is worth 30% of the overall assessment in the unit. This assignment is group work.

Timeframe and Submission:
The assessment must be uploaded via the assessment submission link on Canvas no later than 11:59 pm on Sunday of Week 10. Unless an extension is approved on medical grounds (supported by a medical certificate), a penalty of 10% of the maximum marks per calendar day will apply to late submissions. Although you will be given guidance on addressing the assignment tasks, you will need to complete the tasks in your own time.

Assessment Presentation
Your answers must be presented in task number order and be clearly labelled with the appropriate task number. Answers to each task must start on a new page.
Your assignment must be presented in Microsoft (MS) Word or PDF format. Copy and paste any relevant Excel outputs into this document immediately before the relevant written answers to each task.
If you are unfamiliar with the MS Word Equation Editor, you may write algebraic, mathematical and statistical symbols and notation in neat handwritten form. Your answers must be clear. You must highlight relevant items on any required Excel outputs and refer to them in your written answers.
When asked to perform a manual calculation (i.e. where the use of MS Excel is not specified), you must show all working, including intermediate steps where relevant. Failure to do so will result in a loss of marks.
An Assessment Declaration is required and must be attached to the front of your assignment.

The dataset included with this assignment is a random sample of 534 persons from the population survey of a US state (say, California) in a certain year (say, 2012).
The population consists of individuals in that US state who were working and drawing wages during the survey year. The dataset can be accessed from the Assessment Information page on the unit website. You need to select a random sample of 60 IDs, each containing observations, where appropriate, of the eight variables V1 to V8. The variables in the dataset are as follows:
V1 = Wage (dollars per hour)
V2 = Occupational category (1=Management, 2=Sales, 3=Clerical, 4=Service, 5=Professional, 6=Other)
V3 = Sector (0=Other, 1=Manufacturing, 2=Construction)
V4 = Indicator variable for union membership (1=Union member, 0=Not union member)
V5 = Number of years of education
V6 = Number of years of work experience
V7 = Age (years)
V8 = Indicator variable for sex (1=Female, 0=Male)

Assessment Tasks
Answers to the Assessment 3 tasks must be based on the sample data file that you created in Part I of the assignment. Most tasks in Assessment 3 require you to obtain an Excel output before performing some analysis. There are five tasks in Assessment 3. You must meet all task requirements to receive full marks.

Task 1 (20 marks)
Find the frequency distribution for the Occupational category (1=Management, 2=Sales, 3=Clerical, 4=Service, 5=Professional, 6=Other).
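As a rough aid for checking the Excel and manual working in this task (and the same relative frequency step in Task 2), the Python sketch below counts each category code, divides by the sample size, and then adds up probabilities to obtain P(X = 2), P(X > 2) and P(X ≥ 3). It is not part of the required Excel workflow: the function names are hypothetical, the ten-code list is made up, and the distribution is simply the example table given in this brief, so substitute your own 60 observations.

```python
from collections import Counter


def relative_frequency_distribution(codes):
    """Map each category code to its relative frequency: count / sample size."""
    n = len(codes)
    counts = Counter(codes)
    return {category: counts[category] / n for category in sorted(counts)}


def prob_equal(dist, k):
    """P(X = k): probability of category k (0.0 if k never occurs)."""
    return dist.get(k, 0.0)


def prob_greater(dist, k):
    """P(X > k): total probability of all categories strictly above k."""
    return sum(p for x, p in dist.items() if x > k)


def prob_at_least(dist, k):
    """P(X >= k): total probability of category k and every category above it."""
    return sum(p for x, p in dist.items() if x >= k)


# Hypothetical occupational codes (1-6) for ten IDs, for illustration only.
example_codes = [1, 2, 2, 3, 3, 3, 4, 5, 6, 2]
print(relative_frequency_distribution(example_codes))

# The example distribution from the brief; replace it with your own values.
example_dist = {1: 0.14, 2: 0.26, 3: 0.30, 4: 0.15, 5: 0.08, 6: 0.07}
print(prob_equal(example_dist, 2))     # P(X = 2)
print(prob_greater(example_dist, 2))   # P(X > 2)
print(prob_at_least(example_dist, 3))  # P(X >= 3)
```

With the brief's example table, P(X > 2) and P(X ≥ 3) both equal 0.30 + 0.15 + 0.08 + 0.07 = 0.60, because the category codes are whole numbers.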
(a) Use Excel to produce a Descriptive Statistics table for your sample “Occupational category” data and paste it into your MS Word assignment document.
(b) Use the relative frequency approach to find the probability distribution for the Occupational category.
(c) Draw the bar chart for the probability distribution of the Occupational category.
(d) Define the probability distribution based on part (b). For example (you have to calculate it according to your data):

    x      1     2     3     4     5     6
    P(x)   0.14  0.26  0.30  0.15  0.08  0.07

(e) Based on the probability distribution, calculate the following:
    i. the probability of exactly two, P(x = 2)
    ii. the probability of more than two, P(x > 2)
    iii. the probability of at least three, P(x ≥ 3)

Task 2 (20 marks)
Find the frequency distribution for the Indicator variable for union membership (1=Union member, 0=Not union member).
(a) Use Excel to produce a Descriptive Statistics table for your sample “union membership” data and paste it into your MS Word assignment document.
(b) Use the relative frequency approach to find the probability distribution for union membership.
(c) Draw the bar chart for the probability distribution of union membership.
(d) Define the probability distribution based on part (b). For example (you have to calculate it according to your data):

    x      0     1
    P(x)   0.54  0.46

(e) Based on the probability distribution, draw the bar chart.
(f) According to a report of the sample data, 46% of the people have union membership (you need to treat the union member proportion as the probability of success). Assume that a sample of 8 people is studied. Find:
    i. the probability of exactly two, P(x = 2)
    ii. the probability of less than two, P(x < 2)
    iii. the probability of at least six, P(x ≥ 6)

Task 3 (20 marks)
(a) Use Excel and your sample data file to produce a suitable output to test, at the 1% level of significance, the hypothesis that the population mean of Wages (dollars per hour) is $27.
(b) Is this a one-tailed or two-tailed test?
    Briefly explain the reasoning behind your answer.
(c) Write, in precise symbolic form, the null and alternative hypotheses.
(d) State whether a Z test or a t test is appropriate, and calculate the value of the test statistic.
(e) Determine the critical value(s) based on the nature of the problem.
(f) State the conclusion based on the sample evidence.
(g) Find the 99% confidence interval for the population mean of Wages (dollars per hour).
(h) Repeat this procedure at the 5% level of significance to test the hypothesis that the population mean of Wages (dollars per hour) is greater than $27.
(i) Make the decision based on the critical value.
(j) Find the 95% confidence interval for the population mean of Wages (dollars per hour).

Task 4 (20 marks)
(a) Use Excel and your sample data file to produce a descriptive summary output for the Indicator variable for sex (1=Female, 0=Male), based on your sample data from Task 1 (remember to include the confidence bound “e” at the 5% level of significance).
(b) Define the mean proportion.
(c) At the 5% level of significance, test the hypothesis that the population proportion of females is 0.45, using the Indicator variable for sex (1=Female, 0=Male) in your sample data from Task 1.
(d) Write, in precise symbolic form, the null and alternative hypotheses.
(e) Is this a one-tailed or two-tailed test? Briefly explain the reasoning behind your answer.
(f) State the conclusion based on the sample evidence.
(g) Find the 95% confidence interval for the population proportion of females.

Task 5 (20 marks)
Find the relationship between Wages (dollars per hour) as the response variable and Number of years of work experience as the explanatory variable. Use Excel to obtain the linear regression output. The belief is that as work experience increases, wages (dollars per hour) would increase.
(You have to calculate according to your data.)
(a) State the slope coefficient of the least squares regression equation.
(b) State the intercept coefficient of the least squares regression equation.
(c) Determine the least squares regression equation representing the approximate linear relationship between Wages (dollars per hour) as the response variable and Number of years of work experience as the explanatory variable.
(d) Estimate the wage when work experience is 25 years.
(e) Construct the 95% confidence interval for the slope parameter of the least squares regression equation.

Marking Information:
The case study assessment will be marked out of 100 and will be weighted 30% of the total unit marks.

Marking Criteria
Rating bands (as a percentage of the criterion mark): Not satisfactory (0-49%), Satisfactory (50-64%), Good (65-74%), Very Good (75-84%), Excellent (85-100%).

Theoretical understanding of statistical data analysis (20 marks)
    Not satisfactory: The tasks are not interpreted, and questions are not correctly answered.
    Satisfactory: Some questions are correctly answered, but most questions are only partially correct.
    Good: The majority of the questions are correctly answered, but the significance of the result is not explained.
    Very Good: The majority of the questions are correctly answered, and the significance of the result is explained.
    Excellent: All questions are correctly answered, and the significance of the result is well explained to show its practical relevance.

Problem set-up in Excel (30 marks)
    Not satisfactory: Fails to set up the problem correctly in Excel.
    Satisfactory: Statistical data analysis is correctly set up, but all other relevant information pertaining to the analysis is missing.
    Good: Statistical data analysis is correctly set up in Excel with most of the correct and relevant information pertaining to the variables, but fails to present relevant calculations.
    Very Good: Statistical data analysis is correctly set up in Excel with the majority of correct and relevant information pertaining to the variables, but fails to present relevant calculations.
    Excellent: Statistical data analysis is correctly set up in Excel with all correct and relevant information pertaining to decision variables and constraints. All relevant formulas are shown with correct syntax.

Simulation and result (40 marks)
    Not satisfactory: No simulation is performed.
    Satisfactory: Some parts of the simulation are correct.
    Good: Most parts of the simulation are correct.
    Very Good: Correct simulation, but minor errors in the result.
    Excellent: Excellent simulation with correct results.

Results interpretation (10 marks)
    Not satisfactory: No simulation is performed and therefore no interpretation is provided.
    Satisfactory: Average interpretation of results; no use of relevant statistical terminology; fails to show the implication of the results for any application.
    Good: Good interpretation of results using relevant data analysis, but fails to show the implication of the results for any application.
    Very Good: Interpretation of results is well presented using relevant statistical terminology, but fails to show the implication of the results for any application.
    Excellent: Excellent interpretation of results using relevant statistical terminology; shows the implication of the results for an application.
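The formulas behind the later tasks can be cross-checked against your Excel output with a short Python sketch such as the one below; it is an optional checking aid under stated assumptions, not a replacement for the required Excel work. It assumes the standard textbook formulas: the binomial probability P(X = k) = C(n, k) p^k (1 - p)^(n - k) for the union-membership question in Task 2; a one-sample t statistic and t-based confidence interval for mean wages in Task 3 (assuming the population standard deviation is unknown); a one-proportion z statistic and large-sample interval for the female proportion in Task 4; and least squares estimates with the standard error of the slope in Task 5, where the slope interval is b1 ± t × SE(b1) on n - 2 degrees of freedom. Critical t values are passed in by hand (for example from a t table or Excel's T.INV.2T), since the Python standard library has no t distribution. All function names and the sample numbers are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def one_sample_t(sample, mu0):
    """t = (x_bar - mu0) / (s / sqrt(n)); compare with t on n - 1 df."""
    return (mean(sample) - mu0) / (stdev(sample) / math.sqrt(len(sample)))


def mean_confidence_interval(sample, t_crit):
    """x_bar -/+ t_crit * s / sqrt(n); t_crit must be looked up for n - 1 df."""
    e = t_crit * stdev(sample) / math.sqrt(len(sample))
    return mean(sample) - e, mean(sample) + e


def one_proportion_z(p_hat, p0, n):
    """z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)."""
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)


def proportion_confidence_interval(p_hat, n, confidence=0.95):
    """Large-sample interval p_hat -/+ z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    e = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - e, p_hat + e


def least_squares(x, y):
    """Return (intercept b0, slope b1, standard error of b1) for y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2) / sxx)
    return b0, b1, se_b1


# Hypothetical numbers for illustration only; use your own 60-person sample.
wages = [22.5, 30.0, 18.75, 27.0, 35.2, 24.1]
experience = [5, 12, 2, 10, 20, 7]

print(binomial_pmf(2, 8, 0.46))                            # P(X = 2)
print(sum(binomial_pmf(k, 8, 0.46) for k in range(0, 2)))  # P(X < 2)
print(sum(binomial_pmf(k, 8, 0.46) for k in range(6, 9)))  # P(X >= 6)
print(one_sample_t(wages, 27))
b0, b1, se_b1 = least_squares(experience, wages)
print(b0, b1, b0 + b1 * 25)  # intercept, slope, predicted wage at 25 years
```

The interval from proportion_confidence_interval uses the large-sample normal formula, so it may differ slightly from the confidence bound reported in Excel's descriptive summary, which may be computed differently.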

