Linear Regression in Machine Learning: Part 01
Welcome, everyone. Today we start a new part of our course: Machine Learning. In this section we look at our first regression algorithm.
This regression series is divided into three parts:
PART_01 Check out the data and see all the plots
PART_02 Training and Testing a Linear Regression Model
PART_03 Exercise and Solution
Links to every part are provided at the end of each post.
What is Regression?
Regression searches for relationships among variables. For example, you can observe several employees of a company and try to understand how their salaries depend on features such as experience, education, role, and city. This is a regression problem in which the data for each employee represents one observation. The presumption is that experience, education, role, and city are the independent features, while salary depends on them.
What is Linear Regression in Machine Learning?
Linear regression is probably one of the most important and widely used regression techniques. It is among the simplest regression methods. One of its main advantages is the ease of interpreting results.
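To make the idea concrete, here is a minimal sketch of a one-feature linear regression on made-up salary data. The numbers and variable names are invented for illustration and are not part of the course data set; `np.polyfit` with degree 1 is just one convenient way to fit a straight line by least squares.

```python
import numpy as np

# Hypothetical data: years of experience and salary (invented for illustration).
experience = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
salary = 30000.0 + 5000.0 * experience  # an exactly linear relationship

# Fit salary = slope * experience + intercept by ordinary least squares.
slope, intercept = np.polyfit(experience, salary, deg=1)

# Predict the salary for someone with 6 years of experience.
predicted = slope * 6.0 + intercept
print(f"slope={slope:.1f}, intercept={intercept:.1f}, prediction={predicted:.1f}")
```

The fitted slope is what makes the result easy to interpret: it is the estimated change in salary for each additional year of experience.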
LET’S START WITH PROJECT WORK:
Your neighbor is a real estate agent and wants some help predicting housing prices for regions in the USA. It would be great if you could somehow create a model for her that allows her to put in a few features of a house and returns an estimate of what the house would sell for.
She has asked you if you could help her out with your new data science skills. You say yes, and decide that linear regression might be a good path to solve this problem!
Your neighbor then gives you some information about a bunch of houses in regions of the United States. It is all in the data set: USA_Housing.csv.
The data contains the following columns:
- ‘Avg. Area Incomes’
- ‘Avg. Area House Ages’
- ‘Avg. Area Number of Room’
- ‘Avg. Area Number of Bedroom’
- ‘Area Populations’
- ‘Prices’
- ‘Addresses’
Check out the data
We have been able to get some data from your neighbor on housing prices as a CSV file. Let's get our environment ready with the libraries we will need and then import the data.
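import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Jupyter/IPython magic: render plots inside the notebook
%matplotlib inline
# Load the housing data (the CSV is assumed to sit in the working directory)
USAhousing = pd.read_csv('USA_Housing.csv')
# Preview the first five rows
USAhousing.head()
# Summary statistics for the numeric columns
USAhousing.describe()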
OUTPUT:
       Avg. Area Incomes  Avg. Area House Ages  Avg. Area Number of Room  Avg. Area Number of Bedroom  Area Populations        Prices
count        5000.000000           5000.000000               5000.000000                  5000.000000       5000.000000  5.000000e+03
mean        68583.108984              5.977222                  6.987792                     3.981330      36163.516039  1.232073e+06
std         10657.991214              0.991456                  1.005833                     1.234137       9925.650114  3.531176e+05
min         17796.631190              2.644304                  3.236194                     2.000000        172.610686  1.593866e+04
25%         61480.562388              5.322283                  6.299250                     3.140000      29403.928702  9.975771e+05
50%         68804.286404              5.970429                  7.002902                     4.050000      36199.406689  1.232669e+06
75%         75783.338666              6.650808                  7.665871                     4.490000      42861.290769  1.471210e+06
max        107701.748378              9.519088                 10.759588                     6.500000      69621.713378  2.469066e+06
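Notice that the summary has six columns although the data set has seven: by default `describe()` summarizes only the numeric columns, so the text column 'Addresses' is left out. A small sketch with a made-up two-row frame (the values are invented; only the column names come from the column list earlier in this post) shows the same behavior:

```python
import pandas as pd

# Tiny stand-in frame; values are invented, not taken from USA_Housing.csv.
sample = pd.DataFrame({
    "Avg. Area Incomes": [60000.0, 70000.0],
    "Prices": [1000000.0, 1200000.0],
    "Addresses": ["1 Example St", "2 Example Ave"],
})

# By default describe() reports only the numeric columns.
summary = sample.describe()
print(summary.columns.tolist())

# info() lists every column with its dtype and non-null count,
# which makes the text column easy to spot.
sample.info()
```

On the real data, calling `USAhousing.info()` in the same way is a quick check for column types and missing values before plotting.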
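Let's create some simple plots to check out the data.
sns.pairplot(USAhousing)
OUTPUT: a grid of scatter plots for every pair of numeric columns, with a histogram of each column on the diagonal.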
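A pair plot is read by eye, and a correlation table can serve as a rough numeric companion to it. The sketch below is an assumption about a useful next step rather than something taken from this walkthrough, and it uses a small invented frame so that it runs without the CSV file. On the real data you would apply the same two calls to `USAhousing`.

```python
import pandas as pd

# Invented stand-in data; replace `sample` with USAhousing on the real file.
sample = pd.DataFrame({
    "Avg. Area Incomes": [50000.0, 60000.0, 70000.0, 80000.0],
    "Area Populations": [40000.0, 30000.0, 35000.0, 20000.0],
    "Prices": [900000.0, 1100000.0, 1300000.0, 1500000.0],
    "Addresses": ["A", "B", "C", "D"],
})

# Keep numeric columns only, then compute pairwise Pearson correlations.
numeric = sample.select_dtypes(include="number")
corr = numeric.corr()

# Correlation of each feature with the target, strongest first.
print(corr["Prices"].drop("Prices").sort_values(ascending=False))
```

If seaborn is available, `sns.heatmap(corr, annot=True)` draws the same table as a colored grid.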