
Part 3 of 3, Creating a Regression Model in Python

Introduction

Using the past to predict the future! Say hello to part 3 of 3 in this series on regression modeling with Python! In blog 1, I covered the important processing steps prior to creating a linear regression model. In blog 2, I showed you how to create the actual regression model and demonstrated how to reuse the model with new data. In this blog I will cover how to check the assumptions that are verified after a linear model is created: homoscedasticity and normality of the residuals.

For a more thorough overview of the project related to this series of blogs see: https://github.com/rgpihlstrom/Phase2Project
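
To make the two checks concrete, here is a minimal sketch using only NumPy. Everything in it is an assumption for illustration: the synthetic data, the helper names `variance_ratio` and `jarque_bera_stat`, and the choice of checks. The first helper is a simplified comparison in the spirit of the Goldfeld-Quandt test, and the second computes the Jarque-Bera statistic by hand. It is not necessarily the approach used in the project linked above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption): y = 3 + 2x plus constant-variance normal noise
x = rng.uniform(0, 10, size=500)
y = 3.0 + 2.0 * x + rng.normal(0, 1.0, size=500)

# Fit a simple linear model; np.polyfit returns the highest-degree term first
slope, intercept = np.polyfit(x, y, deg=1)
fitted = intercept + slope * x
residuals = y - fitted


def variance_ratio(fitted, residuals):
    """Residual variance in the upper half of fitted values divided by the
    variance in the lower half. A ratio near 1 is consistent with
    homoscedasticity; a ratio far from 1 suggests the spread changes."""
    order = np.argsort(fitted)
    half = len(order) // 2
    low = residuals[order[:half]]
    high = residuals[order[-half:]]
    return high.var(ddof=1) / low.var(ddof=1)


def jarque_bera_stat(residuals):
    """Jarque-Bera statistic from sample skewness and kurtosis. Values near 0
    are consistent with normally distributed residuals."""
    n = len(residuals)
    centered = residuals - residuals.mean()
    m2 = np.mean(centered ** 2)
    skew = np.mean(centered ** 3) / m2 ** 1.5
    kurt = np.mean(centered ** 4) / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


ratio = variance_ratio(fitted, residuals)
jb = jarque_bera_stat(residuals)
print(f"variance ratio (upper/lower half): {ratio:.2f}")
print(f"Jarque-Bera statistic: {jb:.2f}")
```

In practice these numbers are usually read alongside a residuals-versus-fitted scatter plot and a Q-Q plot, since a single statistic can hide the shape of a problem.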



Part 2 of 3, Creating a Regression Model in Python

Introduction

Using the past to predict the future! Say hello to part 2 of 3 in this series on regression modeling with Python! In blog 1, I covered the important processing steps prior to creating a linear regression model. In this blog I build on that foundation by creating the actual regression model and demonstrating how to reuse the model with new data. In blog 3 I will cover how to check the assumptions that are verified after the linear model is created.

For a more thorough overview of the project related to this series of blogs see: https://github.com/rgpihlstrom/Phase2Project
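
As a rough illustration of "create the model, then reuse it with new data", here is a small sketch built on NumPy's least-squares solver. The data and the helper names `fit_linear` and `predict` are hypothetical, and NumPy is swapped in only to keep the example self-contained; the project linked above may use a different library.

```python
import numpy as np

# Hypothetical training data: two features and a target that is exactly linear
X_train = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y_train = 10.0 + 2.0 * X_train[:, 0] - 1.0 * X_train[:, 1]


def fit_linear(X, y):
    """Ordinary least squares; returns [intercept, coef_1, ..., coef_k]."""
    design = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs


def predict(coefs, X_new):
    """Apply previously fitted coefficients to rows the model has not seen."""
    X_new = np.atleast_2d(X_new)
    return coefs[0] + X_new @ coefs[1:]


coefs = fit_linear(X_train, y_train)

# Reuse the fitted model on new observations
X_new = np.array([[6.0, 2.0], [0.0, 0.0]])
predictions = predict(coefs, X_new)
print(coefs)
print(predictions)
```

The point of separating `fit_linear` from `predict` is that the coefficients are estimated once and can then be applied to any new rows with the same columns in the same order.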



Part 1 of 3, Creating a Regression Model in Python

Introduction

Using the past to predict the future! Say hello to Regression Modeling! In this three-part series I will show you how to create, use, and check the validity of a regression model with Python. To cover the topic effectively, I have broken it into the following parts.

Blog 1 (this blog)

  1. Overview of Data / Understand the Business Objectives
  2. Processing your Data
    a. Load / Study / Cleanse Data
    b. Review Data Types, Convert Categorical to Dummies (Removed here for sake of brevity)
    c. Check For & Remove Extreme Outliers
    d. Ensure Linearity and Check / Remove Multicollinearity
    e. Review / Transform Distribution of Target Variable
    f. Transform / Scale Feature Data (if Required)
    g. …
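
A few of the processing steps listed above (c through f) can be sketched with pandas as follows. This is only an illustration under stated assumptions: the column names, the sample values, the IQR fence factor of 3.0, and the 0.8 correlation cutoff are all hypothetical choices, and the linearity check in step d is not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical housing-style data; the last row is a deliberate outlier
df = pd.DataFrame({
    "sqft":     [800, 950, 1100, 1250, 1400, 1550, 1700, 1850, 2000, 20000],
    "sqft_m2":  [74, 88, 102, 116, 130, 144, 158, 172, 186, 1858],
    "bedrooms": [2, 3, 1, 3, 2, 4, 2, 3, 4, 3],
    "price":    [150e3, 175e3, 160e3, 210e3, 230e3, 260e3, 255e3, 300e3, 340e3, 900e3],
})

# c. Remove extreme outliers with an IQR fence (the factor 3.0 is an assumption)
q1, q3 = df["sqft"].quantile([0.25, 0.75])
iqr = q3 - q1
clean = df[df["sqft"].between(q1 - 3.0 * iqr, q3 + 3.0 * iqr)].copy()

# d. Flag highly correlated feature pairs and drop the second feature of each
features = clean.drop(columns="price")
corr = features.corr().abs()
cols = list(corr.columns)
high_pairs = [
    (a, b)
    for i, a in enumerate(cols)
    for b in cols[i + 1:]
    if corr.loc[a, b] > 0.8  # cutoff is an assumption
]
features = features.drop(columns=sorted({b for _, b in high_pairs}))

# e. Log-transform the target to reduce right skew
target = np.log(clean["price"])

# f. Standardize the remaining features (mean 0, standard deviation 1)
scaled = (features - features.mean()) / features.std()

print(high_pairs)
print(scaled.round(2))
```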


Getting Presentation Ready Formats with Aggregate Functions

Image for post

Introduction

In a previous post I showed how to use .groupby() with .agg() to summarize large amounts of data (see here). I also provided a solution for formatting the output of .agg(), which can be tricky for the Python beginner given the additional “layer” that aggregate functions create when displaying the output. In this post I will provide a solution to a related problem: formatting the output of an aggregate function when you want to apply different formats to outputs that reside in the same column or row. For the old Excel pro this seems like nothing to write about; however, because Python lacks the point-and-click functionality offered in Excel, accessing individual elements within a dataframe can be challenging. I will demonstrate this dilemma by introducing you to the function that was giving me formatting fits: df.describe(). Before I start, I will remind you of the target audience for this blog. This blog, along with my previous posts, is targeted toward the beginner. Furthermore, the solution proposed below seems like genius to me, but may seem like painting by numbers to more advanced data alchemists. As a side note, during the process of developing my workaround I felt like doing what the guy above is doing 😊. …
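
To show the kind of problem described here, below is one possible sketch of formatting the rows of df.describe() differently: the count row as a whole number and every other row with two decimals and thousands separators. The sample data and the helper `row_format` are hypothetical, and this is not necessarily the workaround the full post arrives at.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "price": [100.0, 250.5, 399.99, 120.0],
    "sqft": [800, 1200, 1500, 950],
})

summary = df.describe()  # rows: count, mean, std, min, 25%, 50%, 75%, max


def row_format(row_label):
    """Pick a format string per row: whole numbers for count, two decimals otherwise."""
    return "{:,.0f}" if row_label == "count" else "{:,.2f}"


# Build a new dataframe of strings, formatting each cell by the row it sits in
formatted = pd.DataFrame(
    {
        col: [row_format(row).format(summary.loc[row, col]) for row in summary.index]
        for col in summary.columns
    },
    index=summary.index,
)

print(formatted)
```

Note that the result holds strings, so it is meant for display only; keep `summary` around if you still need the numbers.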


A Gentle Introduction to Dataframes — Part 2 of 3

…Learning My First Trick

Image for post
Transforming Data To Gold

Introduction

In my previous post I introduced you to some of the basics of viewing, cleaning, and transforming your data using Dataframes (see post). In this post I go a step further by showing you how to summarize your data using .groupby(). Like my previous posts, this post is for the beginner, perhaps an old Excel pro looking to make the jump from Excel to Python and needing a gentle introduction to Dataframes. Making the transition from Excel to Python, or incorporating Python into your analytic repertoire, can be daunting. Mastering Python Dataframes is the right first step in this journey. In addition to showing you how to summarize your data using Dataframes, I will also show you a few ways to optimize the viewing of your Dataframes within your Jupyter notebook. During my first few weeks of using Jupyter notebooks, I became frustrated by the default display behavior of Jupyter; the specifics are below. That frustration led me to research how I could modify the default behavior to better suit my needs. I share my findings with you along with demonstrating how to summarize your data. Without further ado… Data …
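
For a quick taste of both ideas, here is a small sketch that summarizes data with .groupby() and temporarily adjusts how pandas displays the result. The sample data, the output column names, and the particular display options are assumptions for illustration; the full post may change different settings.

```python
import pandas as pd

# Hypothetical sales data
df = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "product": ["A", "B", "A", "A", "B"],
    "sales": [100.0, 150.0, 200.0, 50.0, 300.0],
})

# Summarize by region, much like a pivot table in Excel
summary = df.groupby("region").agg(
    total_sales=("sales", "sum"),
    average_sales=("sales", "mean"),
    orders=("sales", "count"),
)

# Temporarily show every column and format floats with two decimals;
# the defaults come back once the "with" block ends
with pd.option_context(
    "display.max_columns", None,
    "display.float_format", "{:,.2f}".format,
):
    print(summary)
```

Using `pd.option_context` keeps the change local to one block. `pd.set_option` with the same option names would apply it for the rest of the notebook session instead.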


A Gentle Introduction to Dataframes — Part 1 of 3

…Learning My First Trick

Introduction

Image for post
Turning Data To Gold

In my previous post I covered my background and goals for Data Science Bootcamp… Becoming a Data Alchemist (see post). Now I am going to introduce you to Python and one of its core tools for working with data, the Dataframe. Having been schooled in C.P.G. Marketing in Corporate America, I have gained a certain degree of bias for Excel and performing analysis the “Excel way”. However, in bootcamp, it’s the “Python way” or the highway! As I am learning this new environment, I find myself constantly comparing the keystrokes required to perform certain tasks in Excel vs. the same or similar tasks in Python. As I continue to gain proficiency, I am beginning to see the power of Python. Certainly, the learning curve for Python is steeper than that of Excel, but as I am discovering, the effort put forth in learning the “Python way” is paying off in creative flexibility. Before I get “too fancy” with my tutorials I am going to start with the basics. …


Image for post
Using Data To Drive Innovation

Introduction

Recently I enrolled in Flatiron's Data Science Bootcamp. My learning goal for the program is simple: Data Alchemy, the magical process of transforming data into insights.

My background is in Corporate America, C.P.G. Marketing, with a heavy focus on leading teams in New Product Innovation. I have my M.B.A. in Marketing and have had several analyst-related responsibilities.

Why Data Science?

Innovation! Specifically, using data to drive innovation! Data and Innovation are two words that are rarely combined, but to me this combination is the next frontier in innovation. How do companies and/or inventors take the overwhelming amounts of available data and build a structure that fuels the consistent transformation of that data into insights that lead to Disruptive Innovation? This is my goal! …

About

Russell Pihlstrom

Innovation Leader and Insight Enthusiast!
