Special Topics - Project 3 Weeks 2 & 3
For the analyze portion of this project we ran numerous iterations of the Ordinary Least Squares (OLS) regression, starting with 29 independent variables and removing one with each iteration. To determine which variable to remove, the first three of the six checks were performed in combination. Each variable was tested for Probability, Variance Inflation Factor (VIF), Coefficient, and Importance. If the answer to all of these questions was "No," the OLS was repeated with that variable removed. If even one question returned a "Yes," the variable remained. This process was repeated until either all the variables that failed the tests were removed or the Adjusted R-Squared value was as high as it could get. The Adjusted R-Squared value was expected to fall between 0.0 and 1.0, but during this process mine seldom came close to 1.0.
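For readers who want to see what two of these numbers measure, here is a minimal sketch in Python. It is not the ArcGIS tool's implementation; the function names are hypothetical, and NumPy is assumed to be available. The VIF for a variable is computed as 1 / (1 - R²), where R² comes from regressing that variable on all the other independent variables, and the Adjusted R-Squared applies the standard penalty for the number of variables in the model.

```python
import numpy as np

def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

def variance_inflation_factors(X):
    """Return one VIF per column of X (rows = observations, columns = variables).

    Hypothetical helper for illustration: each column is regressed on the
    remaining columns plus an intercept, and VIF = 1 / (1 - R^2).
    """
    X = np.asarray(X, dtype=float)
    n_obs, n_vars = X.shape
    vifs = []
    for j in range(n_vars):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n_obs), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        ss_res = float(residuals @ residuals)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        r_squared = 1.0 - ss_res / ss_tot
        # A perfectly redundant variable has R^2 = 1, so its VIF is unbounded.
        vifs.append(float("inf") if r_squared >= 1.0 else 1.0 / (1.0 - r_squared))
    return vifs
```

A VIF near 1 means the variable shares little information with the others, while a large VIF flags redundancy. Which cutoff to use is a judgment call; the ArcGIS documentation, as I understand it, suggests roughly 7.5.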
Once all the failing variables were removed, the next step was to check the Jarque-Bera statistic for model bias. If its p-value is less than 0.05 and has an asterisk next to it, the model is biased. To find the skewed variables, a Scatterplot Matrix graph was created. This graph also included a histogram of each variable, which provided a second way to view the data. For each variable that was skewed, the OLS was run again with that variable removed and the Jarque-Bera score checked for improvement.
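The Jarque-Bera statistic itself is simple enough to sketch. The version below is a textbook form applied to a list of model residuals, not the exact routine the OLS tool runs, and the function name is hypothetical. It combines the skewness S and kurtosis K of the residuals as JB = n/6 * (S² + (K - 3)² / 4) and compares the result with a chi-square distribution with 2 degrees of freedom, whose upper-tail probability is exactly exp(-JB / 2).

```python
import math

def jarque_bera(residuals):
    """Return (JB statistic, approximate p-value) for a list of residuals.

    Illustrative sketch: uses population moments and the large-sample
    chi-square approximation, so small samples give only a rough p-value.
    """
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)  # chi-square upper tail with 2 df
    return jb, p_value

# Strongly skewed residuals: one very large under-prediction.
jb, p = jarque_bera([-1] * 19 + [19])
print(p < 0.05)  # a p-value below 0.05 is the "biased" case described above
```

A p-value below 0.05 means the residuals are unlikely to be normally distributed, which is the condition the asterisk marks in the OLS report.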
Check 5 was the first time we viewed the map for results. Part of the OLS process is the creation of a layer of Standardized Residual values. Standardizing rescales the residual values, making them comparable between different models. The residual is the difference between the density of meth labs the model predicted for a census tract and the density that actually exists there.
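As a rough illustration of what standardizing does, the sketch below computes residuals as observed minus predicted and divides each by the standard deviation of all the residuals. This is an assumption about the general idea rather than the tool's exact formula, and the function name and the sample numbers are made up for the example.

```python
import statistics

def standardized_residuals(observed, predicted):
    """Scale residuals (observed - predicted) to standard-deviation units.

    Hypothetical helper: the result has a standard deviation of 1, so values
    from models with different units or scales can be compared directly.
    """
    residuals = [o - p for o, p in zip(observed, predicted)]
    spread = statistics.stdev(residuals)
    return [r / spread for r in residuals]

# Made-up densities for four census tracts.
observed = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 6.0, 6.0, 10.0]
z = standardized_residuals(observed, predicted)
# Positive values mark tracts where the model under-predicted,
# negative values mark tracts where it over-predicted.
```

On the map, tracts with standardized residuals far from zero in either direction are the ones the model fits worst.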
Check 6 was to see how well the model predicted the dependent variable. The result seemed acceptable to me.