Using The Past To Predict The Future

Introduction

3. Homoscedasticity in the Residuals

# Plot residuals against predicted prices to see whether the model's error varies across price ranges
plt.scatter(df_predictions["price_Predicted"], df_Regression_No_Outliers_With_ScaledData["price_Residuals"])
plt.xlabel("Predicted Price")
plt.xticks(ticks=(300000, 500000, 700000, 900000, 1200000), labels=('$300k', '$500k', '$700k', '$900k', '$1.2M'))
plt.ylabel("Residual")
plt.yticks(ticks=(-300000, -100000, 0, 300000, 500000), labels=('-$300k', '-$100k', '$0', '$300k', '$500k'))
# Horizontal reference line at zero residual
plt.axhline(y=0, color="r")
plt.show()
# Breusch-Pagan test: the null hypothesis is homoscedasticity, the alternative is heteroscedasticity.
# A p-value below .05 rejects the null; het_breuschpagan suggests heteroscedasticity in the data.
from statsmodels.compat import lzip
import statsmodels.stats.api as sms
name = ['LM Statistic', 'LM-Test p-value', 'F-Statistic', 'F-Test p-value']
# predictors_int is the predictor matrix including the intercept column
test = sms.het_breuschpagan(model.resid, predictors_int)
lzip(name, test)

4. Normality in Distribution of Residuals

# Import appropriate libraries
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import scipy.stats as stats
# Plot a histogram and a Q-Q plot of the residuals side by side
fig, axes = plt.subplots(1, 2)
sns.histplot(model.resid, ax=axes[0])
# fit=True standardizes the residuals, so the 45-degree line is the reference for normality
sm.graphics.qqplot(data=model.resid, dist=stats.norm, line='45', fit=True, ax=axes[1])
axes[0].set_title('Histogram of Residuals')
axes[0].set_xlabel("Residuals")
axes[1].set_title('Q-Q Plot of Residuals')
axes[1].set_ylabel("Residual Z Scores")
plt.show()
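
As a rough numerical companion to the Q-Q plot, the sketch below computes the plotted points directly with `scipy.stats.probplot` and reports the correlation between the ordered residuals and the theoretical normal quantiles. The synthetic residuals and the helper name `qq_summary` are assumptions for illustration; with the housing model, `model.resid` would take their place. A correlation close to 1 means the points hug a straight line, which is what a Q-Q plot of normal residuals looks like.

```python
import numpy as np
from scipy import stats


def qq_summary(residuals):
    """Return the Q-Q plot coordinates and the least-squares line through them.

    Hypothetical helper built on scipy.stats.probplot.
    """
    (theoretical_q, ordered_resid), (slope, intercept, r) = stats.probplot(
        residuals, dist="norm"
    )
    return {
        "theoretical": theoretical_q,  # normal quantiles (x-axis of the plot)
        "ordered": ordered_resid,      # sorted residuals (y-axis of the plot)
        "slope": slope,
        "intercept": intercept,
        "r": r,                        # correlation of the Q-Q points
    }


# Synthetic residuals (assumptions for this sketch): one normal, one right-skewed.
rng = np.random.default_rng(42)
normal_resid = rng.normal(loc=0.0, scale=50_000, size=1_000)
skewed_resid = rng.exponential(scale=50_000, size=1_000) - 50_000

normal_qq = qq_summary(normal_resid)
skewed_qq = qq_summary(skewed_resid)
print("Q-Q correlation, normal residuals:", normal_qq["r"])
print("Q-Q correlation, skewed residuals:", skewed_qq["r"])
```

The skewed sample bends away from the line in its upper tail, so its correlation comes out lower than the normal sample's.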

Conclusion

Russell Pihlstrom

Innovation Leader and Insight Enthusiast!