Data analysis project: final report
Assignment Description and Details
Please answer, briefly, each of the following questions in your write-up. Answers to questions 1–3 were part of your proposal.
Description of the problem: Explain the background. Why does this project interest you?
Definition of the study unit and the target population
Definition of variables
The dependent variable for prediction
The list of independent variables (at least two)
Explanation of the sampling method
Simple Linear Regression analysis: For each independent variable, develop a simple regression to predict the dependent variable. The statistical analysis should include the following:
Scatterplot with the regression line [you already made the scatterplot]
Regression line equation
Discussion of slope of regression line and its meaning
Value of R and interpretation of its meaning
Value of R² and interpretation of its meaning
Significance of the coefficients, with discussion of the significance level
The residual vs. predicted graph for the best predictor regression. Explain the implications.
Which is the best predictor according to statistical analysis? State your reasoning.
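The quantities listed for the simple regressions can be sketched in a few lines of Python, which may help when checking the Excel output. This is only an illustration: the data values and the names x, y, simple_regression, and fit are made up for the example and are not part of the assignment. The p-value for the slope is not computed here; the t statistic can be compared against a t distribution with n - 2 degrees of freedom (for example with Excel's T.DIST.2T function).

```python
import math


def simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x, plus summary values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # correlation coefficient R

    predicted = [intercept + slope * xi for xi in x]
    residuals = [yi - pi for yi, pi in zip(y, predicted)]
    sse = sum(e ** 2 for e in residuals)
    r_squared = 1 - sse / syy  # share of the variation in y explained by x

    # Standard error of the slope and its t statistic (n - 2 degrees of freedom).
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    t_slope = slope / se_slope

    return {
        "slope": slope,
        "intercept": intercept,
        "r": r,
        "r_squared": r_squared,
        "t_slope": t_slope,
        "predicted": predicted,
        "residuals": residuals,
    }


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

fit = simple_regression(x, y)
print(f"y-hat = {fit['intercept']:.3f} + {fit['slope']:.3f} * x")
print(f"R = {fit['r']:.4f}, R^2 = {fit['r_squared']:.4f}, t(slope) = {fit['t_slope']:.2f}")

# Pairs to plot for the residual vs. predicted graph.
for p, e in zip(fit["predicted"], fit["residuals"]):
    print(f"predicted = {p:.2f}, residual = {e:+.2f}")
```

Running the same function once per independent variable and comparing R² and the t statistics gives one way to argue which predictor is best. In the residual vs. predicted pairs, a curve or a funnel shape would suggest that the linearity or constant-variance assumption is in doubt.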
Multiple Regression: Use all of your predictor variables and run a multiple regression. The statistical analysis should include the following:
Regression line equation
Value of adjusted R² and interpretation of its meaning. Is it appreciably higher than what you got in the simple regressions?
Significance of the coefficients, with discussion of the significance level. Which variables (if any) appear to be useless for predicting the response variable?
Significance test of the F-statistic, with interpretation.
Make sure to check the residual plot to verify the model assumptions for the best fit model.
If you detected nonlinear relationships in part #4, you may want to try models that include second-order terms of the quantitative predictor variables to see if they improve the model fit. If any coefficient is not significant, remove that predictor variable and re-run the multiple regression analysis to see if the fit improves. Use the adjusted R² to determine the best-fit model.
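As a rough cross-check on the multiple regression output, the coefficients, adjusted R², and F statistic can be computed from the normal equations. The sketch below uses only the Python standard library; the data and the names solve, multiple_regression, x1, x2, full, and quad are hypothetical and chosen for the example. The p-value for the F statistic is not computed; it can be looked up with k and n - k - 1 degrees of freedom (for example with Excel's F.DIST.RT function).

```python
def solve(a, b):
    """Solve the linear system a * coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    coef = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (m[i][n] - tail) / m[i][i]
    return coef


def multiple_regression(columns, y):
    """Fit y on the predictor columns; an intercept is added automatically."""
    n, k = len(y), len(columns)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]

    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k + 1)] for p in range(k + 1)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k + 1)]
    coef = solve(xtx, xty)

    predicted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(y) / n
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    sst = sum((yi - mean_y) ** 2 for yi in y)

    r2 = 1 - sse / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # penalizes extra predictors
    f_stat = ((sst - sse) / k) / (sse / (n - k - 1))  # overall F statistic
    return {"coef": coef, "r2": r2, "adj_r2": adj_r2, "f_stat": f_stat}


# Hypothetical data, for illustration only.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [9.1, 7.9, 18.9, 18.1, 29.1, 27.9]

full = multiple_regression([x1, x2], y)
print("coefficients (intercept, x1, x2):", [round(c, 4) for c in full["coef"]])
print(f"adjusted R^2 = {full['adj_r2']:.4f}, F = {full['f_stat']:.1f}")

# Same model with a second-order term for x1 added as an extra column.
quad = multiple_regression([x1, x2, [v ** 2 for v in x1]], y)
print(f"adjusted R^2 with x1^2 added = {quad['adj_r2']:.4f}")
```

Plain R² never decreases when a predictor is added, which is why the comparison between candidate models uses adjusted R²: the second-order term is worth keeping only if the adjusted value goes up.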
Conclusions
What are possible applications of the regression that you developed? What have you learned about issues related to your problem? What should be studied further, if anything?
Appendix: Attach your data and Excel analysis file
Use the attached proposal to assist. This is due August 6, 2018, at 7:00 pm Eastern Standard Time.