Data Analysis Report

STAT 578 - Advanced Bayesian Modeling - Spring 2021

Data Analysis Report

DUE: May 9, 2021

You will submit a PDF file containing your data analysis report, which must follow the format described below.

Important: You may not collaborate or discuss your analysis with anyone else. Plagiarism from any source is an academic integrity infraction.

Scenario: Rivera & Rosenbaum (2020) [1] discuss data evidence for racial bias in police stops in two US cities. One assessment method they use is the outcome test, in which the proportion of police searches (after a stop) that find contraband is compared by race of the subject searched. If subjects of a US minority race are less likely to be found with contraband after a search, relative to white subjects, racial bias may have influenced the police decision to stop or to conduct the search.

Data file policesearchSanFrancisco.csv contains data aggregated from original data files used by Rivera & Rosenbaum (2020). It records frequencies and outcomes of police searches conducted in San Francisco from January 1, 2015, through June 30, 2016. Each row represents a combination of the searched subject’s race and the reporting police district. The columns are as follows:

  SubjectRace      racial grouping of searched subject (API = Asian/Pacific Islander)
  District         designation for police district reporting the search
  ContrabandFound  number of searches (out of the total) in which contraband was found
  TotalSearches    total number of searches conducted (subsequent to police stops)

Use JAGS and R software, and use only the data in policesearchSanFrancisco.csv. JAGS code should be included in the appropriate sections, but all R code and any direct R text output listings you choose to include should be in the Appendix only.

Your report must be neatly typed and can be at most 8 pages, excluding the Appendix. It must follow this outline:

1. Introduction
   Provide brief background information about police stops and searches in the US, and the issue of racial bias in US policing.
   (Use footnotes to acknowledge all sources you consult, including web sites.) Do not plagiarize!

2. Data
   Briefly describe the variables in policesearchSanFrancisco.csv. For each subject racial group (including Other), produce a boxplot of the district-level raw proportions of searches that find contraband. (Omit any proportions that cannot be calculated.) Display all five boxplots side-by-side on the same graph (same axis), so they can be compared. [2]
   Answer the following questions:
   • In the data, which race/district combinations have no searches?
   • For which racial group are the proportions of searches that find contraband generally the largest?
   • For which racial group are the proportions of searches that find contraband generally the smallest?

   [1] Rivera, R., & Rosenbaum, J. (2020, August). Racial disparities in police stops in US cities. Significance, 17(4), 04-05.
   [2] Consider using the R function boxplot. Consult its R help file for assistance.

3. First Model
   You will use the JAGS model in the file named firstmodel.bug. The data-related nodes are as follows:
   • found: a node array containing the numbers of searches that resulted in finding contraband (for each race/district combination)
   • searches: a node array containing the total numbers of searches conducted (for each race/district combination)
   • race: a node array, each element containing an integer index from 1 to 5 indicating the race group of the subject (for each race/district combination)
   • district: a node array, each element containing an integer index from 1 to 12 indicating the reporting district (for each race/district combination)
   Carefully set up the R data structure that you will pass to JAGS. [3][4] Then run your analysis (being careful to follow the usual procedures) and report as follows:
   (a) Describe the model in firstmodel.bug. Make sure that the following questions are answered by your description:
       • What type of (generalized) linear model is this?
         What does the response variable represent?
       • What are the parameters and hyperparameters?
       • Are the racial group parameters treated as if they are fixed effects or random effects? What about the district parameters?
   (b) List the JAGS code in firstmodel.bug.
   (c) Summarize the details of your computation, including number of chains, length of burn-in, number of iterations used per chain, any thinning (if used), and effective sample sizes of the top-level parameters. You should use plots to check convergence, but do not include them in your report.
       Note: Use overdispersed starting values, but make them less extreme if you encounter convergence problems.
   (d) Graph an approximate posterior density (not a histogram) for sigmadistrict. Does your graph suggest that there are actual differences among the districts (in terms of the probability of a search finding contraband)?

   [3] You can convert SubjectRace and District to factor variables in R, if they are not so already. Then you can convert each factor variable to an integer index by applying the unclass function. When interpreting results, it is up to you to figure out which factor level corresponds to which integer index.
   [4] You may omit any row of the data set for which the total number of searches is zero, since such rows will not contribute to the likelihood function. While you can run the JAGS model even without omitting those rows, omitting them may make the rest of your analysis easier to perform.

   (e) Let β_B be the coefficient associated with the subject race being Black and β_W the coefficient associated with the subject race being White. Briefly explain why β_B < β_W would indicate that contraband is less likely to be found in a search of a Black subject than of a White subject (within a given district). Then approximate the posterior probability that β_B < β_W. What do you conclude?
   (f) Check the model for overdispersion: Approximate the posterior predictive p-value based on using the chi-square discrepancy.
       What do you conclude?
   (g) Approximate the value of (Plummer’s) DIC and its associated effective number of parameters. Compare the effective number of parameters with the actual (total) number of parameters (including hyperparameters).

4. Second Model
   Starting with the JAGS model in firstmodel.bug, create an extended JAGS model that can account for overdispersion:
   • Add a random effect term, indexed by i, to the linear portion of the model (where i ranges over the observations, i.e., the rows of the data set).
   • Under the prior, let these random effects be (conditionally) independent and have the same normal distribution: one that has mean 0 and variance σ^2.
   • Let the hyperprior for σ (not σ^2) be uniform from 0 to 10.
   • Do not change anything related to the other aspects of the model.
   Run your analysis (being careful to follow the usual procedures) and report as follows:
   (a) List all of the JAGS code for your extended model.
   (b) Summarize the details of your computation, including number of chains, length of burn-in, number of iterations used per chain, any thinning (if used), and effective sample sizes of the top-level parameters. You should use plots to check convergence, but do not include them in your report.
       Note: Use overdispersed starting values, but make them less extreme if you encounter convergence problems.
   (c) As you did for the previous model, consider the proposition that contraband is less likely to be found in a search of a Black subject than of a White subject (within a given district). Approximate the posterior probability of this for your new model. Does your conclusion change?
   (d) Approximate the value of (Plummer’s) DIC and its associated effective number of parameters. Is your second model better than the first, according to DIC?

5. Conclusions
   Briefly summarize your results in a non-technical manner.

6. Appendix
   Provide the R code you used to conduct your analysis.
   Include comments that label the purpose of each block of code.

NOTES:
• Comma-separated variable (.csv) files can be read into R with read.csv.
• Effective sample sizes of at least 2000 are recommended for accuracy.
• If your computer runs out of memory, consider using thinning (e.g., the thin argument of coda.samples).

Point Allocations

Specifications
  2  neatly typed
  2  no more than 8 pages (excluding Appendix)
Introduction
  4  background given
  1  sources acknowledged
Data
  1  description of variables
  2  boxplots
  3  questions
First Model
  6  (a)
  1  (b)
  4  (c)
  2  (d)
  3  (e)
  3  (f)
  3  (g)
Second Model
  4  (a)
  4  (b)
  2  (c)
  3  (d)
Conclusions
  3  brief, clearly stated, appropriate summary of results
Appendix
  2  all R code present
  2  comments for different blocks of code
Total: 57
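
The data-preparation hints in footnotes 3 and 4 can be sketched in R roughly as follows. This is only an illustration on a small made-up data frame, not the assignment data: the district labels "A" and "B", all of the counts, and the names searches_df, race_factor, district_factor, race_index, and district_index are assumptions introduced here. The sketch drops rows with zero searches and then turns each factor into an integer index with unclass.

```r
# Toy stand-in for the real CSV; every value below is invented for illustration.
searches_df <- data.frame(
  SubjectRace     = c("White", "Black", "API", "White", "Black", "API"),
  District        = c("A", "A", "A", "B", "B", "B"),  # hypothetical district labels
  ContrabandFound = c(3, 2, 0, 5, 1, 0),
  TotalSearches   = c(10, 12, 0, 20, 8, 4),
  stringsAsFactors = FALSE
)

# Footnote 4: rows with zero searches do not contribute to the likelihood,
# so they can be dropped before building the index vectors.
searches_df <- searches_df[searches_df$TotalSearches > 0, ]

# Footnote 3: convert each column to a factor, then to an integer index.
race_factor     <- factor(searches_df$SubjectRace)
district_factor <- factor(searches_df$District)
race_index      <- as.integer(unclass(race_factor))
district_index  <- as.integer(unclass(district_factor))

# levels() gives the labels in index order (index 1 is the first level, etc.).
race_key     <- levels(race_factor)
district_key <- levels(district_factor)
```

Here levels(race_factor) lists the labels in index order, which is one way to work out which integer stands for which group when interpreting results.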
