Your collaborator is interested in mapping genetic loci that can affect height

Your collaborator is interested in mapping genetic loci that can affect height in humans. They know that such loci are scattered throughout the genome but not where they are located, so they have performed a GWAS experiment and would like you to perform the analysis. They have collected data for a number of individuals sampled from a population and have provided scaled height phenotypes and SNP genotypes in two files (“midterm phenotypes.txt” and “midterm genotypes.txt”). Each SNP has two alleles (i.e., two letters per SNP), so there are three possible genotype states per SNP: two homozygotes and one heterozygote. In the “genotypes” file, each column represents a specific SNP (column 1 = genotype 1, column 2 = genotype 2, and so on), and each consecutive pair of rows holds all of the genotype states of one individual across the entire set of SNPs (rows 1 and 2 = all of individual 1’s genotypes, rows 3 and 4 = all of individual 2’s genotypes). The genotypes are listed in order along the genome, so the first is ‘genotype 1’ and the last is ‘genotype N’.
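The file layout described above can be illustrated with a minimal R sketch that turns a two-rows-per-individual allele matrix into additive and dominance codings. The toy matrix is made up, the function name `code_genotypes` is hypothetical, and the coding itself (Xa = -1, 0, 1 and Xd = -1, 1, -1, counting copies of the less frequent allele) is an assumption about what the "codings provided in class" are.

```r
# Toy allele matrix: 3 individuals x 2 SNPs, two rows per individual (made-up data).
alleles <- rbind(
  c("A", "G"),  # individual 1, first allele at SNP 1 and SNP 2
  c("A", "T"),  # individual 1, second allele
  c("A", "G"),  # individual 2, first allele
  c("C", "G"),  # individual 2, second allele
  c("C", "T"),  # individual 3, first allele
  c("C", "T")   # individual 3, second allele
)

# Hypothetical helper: convert allele pairs to Xa / Xd codings (assumed convention).
code_genotypes <- function(alleles) {
  n <- nrow(alleles) / 2
  first  <- alleles[seq(1, 2 * n, by = 2), , drop = FALSE]
  second <- alleles[seq(2, 2 * n, by = 2), , drop = FALSE]
  xa <- matrix(0, nrow = n, ncol = ncol(alleles))
  for (j in seq_len(ncol(alleles))) {
    # Count copies of the less frequent allele (ties go to the first in sort order).
    counts <- table(alleles[, j])
    minor <- names(counts)[which.min(counts)]
    xa[, j] <- (first[, j] == minor) + (second[, j] == minor) - 1
  }
  # Homozygotes (xa = -1 or 1) get -1, heterozygotes (xa = 0) get 1.
  xd <- 1 - 2 * abs(xa)
  list(xa = xa, xd = xd)
}

coded <- code_genotypes(alleles)
print(coded$xa)
print(coded$xd)
```

Which allele is counted only flips the sign of the additive estimate; it does not change the p-value of the joint test on both parameters.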

1. (a) Import the scaled height data from the file “midterm phenotypes.txt” and report the sample size n. (b) Produce a histogram of the height phenotype data (label your plot and your axes using informative names!). (c) Import the genotype data from the file “midterm genotypes.txt” and report the number of genotypes N.
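A minimal sketch of the import-and-plot step in R follows. Because the real data files are not available here, it writes simulated values to a temporary file and reads them back; the one-value-per-row, no-header layout is an assumption about the phenotype file.

```r
set.seed(1)

# Stand-in for the phenotype file: simulated scaled heights in a temporary file.
pheno_path <- tempfile(fileext = ".txt")
write.table(rnorm(200), pheno_path, row.names = FALSE, col.names = FALSE)

# Import: one phenotype value per row, no header (assumed layout).
phenotypes <- read.table(pheno_path, header = FALSE)[, 1]
n <- length(phenotypes)
cat("Sample size n =", n, "\n")

# Histogram with an informative title and axis labels.
hist(phenotypes, breaks = 30,
     main = "Distribution of scaled height",
     xlab = "Scaled height",
     ylab = "Number of individuals")
```

For a genotype matrix read the same way, the number of genotypes N would be its column count, `ncol()`, given the one-column-per-SNP layout described above.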

2. Using the phenotype and genotype data: (a) For EACH of the N genotypes, calculate MLE($\hat{\beta}$) for the three $\beta$ parameters when applying a genetic linear regression model (with NO covariates!!). NOTE (!!): in your linear regressions, DO use the $X_a$ and $X_d$ codings provided in class and DO calculate MLE($\hat{\beta}$) using the formula provided in class (i.e., your R code must include the formula for the MLE). (b) Plot a histogram for the N estimates of each parameter (i.e., your answer will be three histograms, one each for the estimates of the $\hat{\beta}_\mu$’s, $\hat{\beta}_a$’s, and $\hat{\beta}_d$’s). (c) Why does it make sense that most of the $\hat{\beta}_a$ and $\hat{\beta}_d$ values are relatively close to zero?
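As a hedged illustration of part (a) for a single genotype, the sketch below assumes the "formula provided in class" is the usual least-squares solution, $\hat{\beta} = (X^T X)^{-1} X^T y$, with a design matrix holding a column of ones, Xa, and Xd. The data are simulated and the function name `mle_beta` is hypothetical.

```r
set.seed(2)
n  <- 100
xa <- sample(c(-1, 0, 1), n, replace = TRUE)  # simulated additive coding
xd <- 1 - 2 * abs(xa)                         # matching dominance coding
y  <- 0.5 + 0.8 * xa + rnorm(n)               # simulated phenotype

# Hypothetical helper: MLE of (beta_mu, beta_a, beta_d) via the matrix formula.
mle_beta <- function(y, xa, xd) {
  X <- cbind(1, xa, xd)                       # design matrix: intercept, Xa, Xd
  beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y
  setNames(as.vector(beta_hat), c("beta_mu", "beta_a", "beta_d"))
}

beta_hat <- mle_beta(y, xa, xd)
print(beta_hat)
```

Applying the same function to every column of the coded genotype matrices would give the N estimates per parameter needed for the three histograms. Note that the matrix inverse only exists when all three genotype states occur in the sample.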

3. Using the phenotype and genotype data, for each genotype, calculate p-values for the null hypothesis $H_0: \beta_a = 0 \cap \beta_d = 0$ versus the alternative hypothesis $H_A: \beta_a \neq 0 \cup \beta_d \neq 0$ when applying a genetic linear regression model (again NO covariates!). NOTE (!!): in your linear regressions, DO use the $X_a$ and $X_d$ codings provided in class and DO NOT use the function lm() (or any other R function!) to calculate your p-values. Instead, use the formula for MLE($\hat{\beta}$) provided in class (i.e., use your code and/or results from question [2]!), calculate the predicted value of the phenotype $\hat{y}_i$ for each individual i under the null and alternative, use these to calculate SSM and SSE, and use the formulas for MSM and MSE to calculate the F-statistic. You may use the function pf() to calculate the p-value for each F-statistic you calculate.
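The steps listed in this question can be sketched in R for one genotype as follows. This is an illustration on simulated data, not a reference solution: it assumes the null prediction is the phenotype mean, and that the degrees of freedom are 2 for the model and n - 3 for the error, as in the standard F-test for a regression with two slopes. The function name `f_test_pvalue` is hypothetical.

```r
# Hypothetical helper: F-test p-value for H0: beta_a = 0 and beta_d = 0.
f_test_pvalue <- function(y, xa, xd) {
  n <- length(y)
  X <- cbind(1, xa, xd)
  beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y   # MLE under the alternative

  y_hat_alt  <- as.vector(X %*% beta_hat)        # predictions under HA
  y_hat_null <- rep(mean(y), n)                  # predictions under H0 (mean only)

  ssm <- sum((y_hat_alt - y_hat_null)^2)         # sum of squares, model
  sse <- sum((y - y_hat_alt)^2)                  # sum of squares, error
  msm <- ssm / 2                                 # df(M) = 3 - 1
  mse <- sse / (n - 3)                           # df(E) = n - 3
  f_stat <- msm / mse

  pf(f_stat, df1 = 2, df2 = n - 3, lower.tail = FALSE)
}

set.seed(3)
n  <- 150
xa <- sample(c(-1, 0, 1), n, replace = TRUE)     # simulated additive coding
xd <- 1 - 2 * abs(xa)
y  <- 0.4 * xa + rnorm(n)                        # simulated phenotype
p_value <- f_test_pvalue(y, xa, xd)
print(p_value)
```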

4. Produce a Manhattan plot from the output of question [3] (label your plot and your axes using informative names!).
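A Manhattan plot is a scatter plot of -log10(p) against genotype position. The sketch below uses made-up p-values with an artificial peak, so it shows only the plotting step; the dashed Bonferroni line is an optional addition, not something the question asks for.

```r
set.seed(4)
N <- 500
p_values <- runif(N)                                 # simulated null p-values
p_values[240:250] <- 10^(-runif(11, min = 6, max = 10))  # artificial association peak

plot(seq_len(N), -log10(p_values), pch = 20,
     main = "Manhattan plot: height GWAS (simulated p-values)",
     xlab = "Genotype position (index along the genome)",
     ylab = expression(-log[10](p)))

# Optional reference line at a Bonferroni threshold of 0.05 / N.
abline(h = -log10(0.05 / N), lty = 2, col = "red")
```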

5. (a) Plot a histogram of the p-values you calculated in question [3] (i.e., not the -log p-values (!!) just plot a histogram of the p-values). (b) What is a possible interpretation of why this histogram deviates from a uniform distribution?

6. (a) Assuming a Type I error for an individual test of α = 0.05, calculate and provide the Bonferroni corrected Type I error for the entire GWAS analysis of the N genotypes. (b) Define Type I error and explain why your Bonferroni correction results in a lower overall Type I error compared to a case where you just used α = 0.05 to assess significance (use no more than two sentences in your answer). (c) Define Type II error and explain why your Bonferroni correction increases the Type II error compared to a case where you just used α = 0.05 (use no more than two sentences in your answer). (d) Define power and explain why your Bonferroni correction decreases power compared to a case where you just used α = 0.05.
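The Bonferroni correction in part (a) divides the per-test α by the number of tests. A minimal sketch, with a hypothetical N = 500 and made-up p-values (the real N comes from the genotype file):

```r
set.seed(5)
alpha <- 0.05
N <- 500                                   # hypothetical number of genotypes tested
alpha_bonf <- alpha / N                    # Bonferroni-corrected per-test threshold
cat("Bonferroni threshold:", alpha_bonf, "\n")

# Made-up p-values with two planted signals, to show how the threshold is applied.
p_values <- runif(N)
p_values[c(120, 121)] <- c(2e-7, 5e-6)
significant <- which(p_values < alpha_bonf)  # indices of genotypes passing the threshold
print(significant)
```

Because genotypes are ordered along the genome, the indices returned by `which()` double as positions, which is what makes neighbouring hits interpretable.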

7. (a) Provide a list of ALL genotype markers that have p-values that are considered significant by your Bonferroni corrected Type I error calculated in question [6] (remember: genotypes in the genotype file are in order from 1 to N!). (b) For the TWO most significant genotypes you identified, explain whether you believe these two genotypes are indicating the location of the same causal genotype and why or why not.

8. Your collaborator needs some help interpreting the results of your analysis. Answer the following questions: (a) What is the definition of a p-value? (b) What is the definition of a causal polymorphism? (c) Even if you have measured a causal polymorphism in your GWAS, why might it NOT be possible to precisely identify which polymorphism is causal in a GWAS analysis (use no more than two sentences in your answer)? (d) What is an example of an ideal experiment (which need not be realistic to perform!) that would unequivocally demonstrate that a specific genotype is causal for a given phenotype?

9. Say you conduct another GWAS analysis of 1000 genotypes and find that at a Type I error rate of α = 0.05 you reject the null hypothesis 200 times. What is the False Discovery Rate (FDR) at this level of alpha?
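One common back-of-the-envelope way to work this out, sketched below, is to divide the number of false positives expected if every null hypothesis were true by the number of rejections observed. This is an assumption about the intended method; since not all nulls need be true, it behaves as an upper-bound style estimate rather than an exact FDR.

```r
n_tests    <- 1000   # genotypes tested
alpha      <- 0.05   # per-test Type I error rate
n_rejected <- 200    # observed rejections of the null

expected_false_pos <- n_tests * alpha            # expected false positives if all nulls hold
fdr_estimate <- expected_false_pos / n_rejected  # estimated share of rejections that are false
cat("Expected false positives:", expected_false_pos, "\n")
cat("Estimated FDR:", fdr_estimate, "\n")
```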

10. What are the two conditions necessary for population structure to produce false positives in a GWAS analysis if you DO NOT include a covariate in your analysis (i.e., you apply the genetic linear regression to your total GWAS data WITHOUT any covariates to account for population structure)?

