Prediction Using Supervised Machine Learning Algorithms step by step



Introduction:

Hey guys, in this article I am sharing the code from one of my internship machine learning projects: Prediction Using Supervised Machine Learning Algorithms. So let's see:


Project task: Predict a student's percentage score based on the number of hours studied.

Step 1: First, install all the required libraries

pip install pandas numpy matplotlib seaborn scikit-learn

Step 2: Import the required libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

Step 3: Read (import) the dataset

(Note: In this article we use the dataset provided by The Sparks Foundation.)

Sparkurl= "https://raw.githubusercontent.com/AdiPersonalWorks/Random/master/student_scores%20-%20student_scores.csv"
data= pd.read_csv(Sparkurl)
data.head(10)
   Hours  Scores
0    2.5      21
1    5.1      47
2    3.2      27
3    8.5      75
4    3.5      30
5    1.5      20
6    9.2      88
7    5.5      60
8    8.3      81
9    2.7      25
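If the raw GitHub URL is unreachable (for example, when working offline), one option is to rebuild a small version of the table by hand. The sketch below uses only the ten rows printed by data.head(10) above, so it is a subset of the full 25-row dataset (an assumption made to keep it self-contained), and any numbers computed from it will differ somewhat from the results later in this article.

```python
import pandas as pd

# Only the ten rows shown by data.head(10); the full CSV has 25 rows.
data = pd.DataFrame({
    "Hours": [2.5, 5.1, 3.2, 8.5, 3.5, 1.5, 9.2, 5.5, 8.3, 2.7],
    "Scores": [21, 47, 27, 75, 30, 20, 88, 60, 81, 25],
})

print(data.shape)  # (10, 2)
print(data.dtypes)
```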

Step 4: Visualize and analyze the data

data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25 entries, 0 to 24
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Hours   25 non-null     float64
 1   Scores  25 non-null     int64  
dtypes: float64(1), int64(1)
memory usage: 528.0 bytes
data.describe()

           Hours     Scores
count  25.000000  25.000000
mean    5.012000  51.480000
std     2.525094  25.286887
min     1.100000  17.000000
25%     2.700000  30.000000
50%     4.800000  47.000000
75%     7.400000  75.000000
max     9.200000  95.000000
sns.heatmap(data.corr(), linewidths=1)  # seaborn's parameter for the cell border width is linewidths
Figure: Correlation heatmap of the dataset
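The heatmap colors each cell by the Pearson correlation coefficient that data.corr() returns. A minimal sketch of that number, computed once with pandas and once from the definition r = cov(x, y) / (std(x) · std(y)), again assuming the ten head(10) rows rather than the full dataset:

```python
import numpy as np
import pandas as pd

# Ten-row subset from data.head(10) (assumption: the full CSV is not downloaded here).
data = pd.DataFrame({
    "Hours": [2.5, 5.1, 3.2, 8.5, 3.5, 1.5, 9.2, 5.5, 8.3, 2.7],
    "Scores": [21, 47, 27, 75, 30, 20, 88, 60, 81, 25],
})

# pandas uses the Pearson method by default.
r_pandas = data.corr().loc["Hours", "Scores"]

# Same coefficient from its definition: sum of centred cross-products divided by
# the square root of the product of the centred sums of squares (the 1/n factors cancel).
h = data["Hours"].to_numpy()
s = data["Scores"].to_numpy()
hc, sc = h - h.mean(), s - s.mean()
r_manual = (hc * sc).sum() / np.sqrt((hc ** 2).sum() * (sc ** 2).sum())

print(round(r_pandas, 4), round(r_manual, 4))
```

A value close to 1 means more study hours go together with higher scores, which is what makes a straight-line model a reasonable choice here.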

After the heatmap, the next step is to plot Scores against Hours to see how the two are related.

data.plot(x='Hours',y='Scores',style='o')
plt.title('Hours vs Percentage')
plt.xlabel('Hours')
plt.ylabel('Percentage')
plt.show()
Figure: Scatter plot of Hours vs Percentage

Step 5: Prepare the data

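x = data.iloc[:, :-1].values  # features: every column except the last (Hours), as a 2-D array
y = data.iloc[:, 1].values    # target: the Scores column, as a 1-D array
# Hold out 20% of the rows as a test set; random_state=0 makes the split reproducible
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)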
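scikit-learn expects the feature matrix to be two-dimensional (rows × features) and the target to be one-dimensional, which is why a slice, iloc[:, :-1], is used for x and a single column, iloc[:, 1], for y. The sketch below prints the shapes on the ten-row head(10) subset (an assumption, to keep it self-contained); with test_size=0.2, two of the ten rows go to the test set.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Ten-row subset from data.head(10), not the full 25-row dataset.
data = pd.DataFrame({
    "Hours": [2.5, 5.1, 3.2, 8.5, 3.5, 1.5, 9.2, 5.5, 8.3, 2.7],
    "Scores": [21, 47, 27, 75, 30, 20, 88, 60, 81, 25],
})

x = data.iloc[:, :-1].values  # 2-D: every column except the last
y = data.iloc[:, 1].values    # 1-D: the Scores column

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

print(x.shape, y.shape)             # (10, 1) (10,)
print(x_train.shape, x_test.shape)  # (8, 1) (2, 1)
```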

Step 6: Train the algorithm

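regressor = LinearRegression()
regressor.fit(x_train, y_train)  # learns the slope (coef_) and intercept (intercept_)

# Plot the fitted line y = coef_ * x + intercept_ over all data points
line = regressor.coef_ * x + regressor.intercept_
plt.scatter(x, y)
plt.plot(x, line, color='red')
plt.show()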
Figure: Fitted regression line over the data points
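LinearRegression fits ordinary least squares. For a single feature, the fitted values have a closed form, slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope · x̄, and these are what regressor.coef_ and regressor.intercept_ hold. A sketch that checks this on the ten-row head(10) subset (assumption: it fits on all ten rows, without a train/test split, so the numbers differ from the article's model):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Ten-row subset from data.head(10).
hours = np.array([2.5, 5.1, 3.2, 8.5, 3.5, 1.5, 9.2, 5.5, 8.3, 2.7])
scores = np.array([21, 47, 27, 75, 30, 20, 88, 60, 81, 25])

# Closed-form ordinary least squares for one feature.
hc = hours - hours.mean()
slope = (hc * (scores - scores.mean())).sum() / (hc ** 2).sum()
intercept = scores.mean() - slope * hours.mean()

# scikit-learn needs a 2-D feature matrix, hence reshape(-1, 1).
model = LinearRegression().fit(hours.reshape(-1, 1), scores)

print(slope, model.coef_[0])
print(intercept, model.intercept_)
```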

Step 7: Make predictions on the test set

print(x_test)
y_pred= regressor.predict(x_test)

[[1.5]
 [3.2]
 [7.4]
 [2.5]
 [5.9]]
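Once the model is fitted, predict simply evaluates the line intercept_ + coef_ · x for every row of its 2-D input. A sketch confirming that, fitted on the ten head(10) rows (assumption) and evaluated at the same five x_test hour values printed above:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Ten-row subset from data.head(10).
hours = np.array([2.5, 5.1, 3.2, 8.5, 3.5, 1.5, 9.2, 5.5, 8.3, 2.7])
scores = np.array([21, 47, 27, 75, 30, 20, 88, 60, 81, 25])
model = LinearRegression().fit(hours.reshape(-1, 1), scores)

# The five x_test values printed in Step 7, as a 2-D column.
x_new = np.array([[1.5], [3.2], [7.4], [2.5], [5.9]])

y_pred = model.predict(x_new)
# Same values by hand: matrix-vector product plus intercept.
y_manual = x_new @ model.coef_ + model.intercept_

print(np.allclose(y_pred, y_manual))  # True
```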

Step 8: Compare actual vs. predicted scores

data1= pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
data1

   Actual  Predicted
0      20  16.884145
1      27  33.732261
2      69  75.357018
3      30  26.794801
4      62  60.491033
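To read the comparison table more easily, one option is to add an error column. The sketch below copies the Actual and Predicted values printed above into a new DataFrame (so it runs without the model) and adds the signed and absolute differences; the column names Error and AbsError are only illustrative.

```python
import pandas as pd

# Values copied from the Actual vs Predicted table above.
data1 = pd.DataFrame({
    "Actual": [20, 27, 69, 30, 62],
    "Predicted": [16.884145, 33.732261, 75.357018, 26.794801, 60.491033],
})

# Positive error: the model under-predicted; negative: it over-predicted.
data1["Error"] = data1["Actual"] - data1["Predicted"]
data1["AbsError"] = data1["Error"].abs()

print(data1.round(3))
```

The mean of AbsError is the Mean Absolute Error reported in Step 10.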

Step 9: Test the model on your own input

hours = 9.25
test = np.array([hours])
test = test.reshape(-1, 1)  # predict() expects a 2-D array: (n_samples, n_features)
ownpred = regressor.predict(test)
print("Number of hours = {}".format(hours))
print("Predicted score = {}".format(ownpred[0]))

Number of hours = 9.25
Predicted score = 93.69173248737539
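Because predict expects a 2-D array of shape (n_samples, n_features), the single value 9.25 is wrapped in an array and passed through reshape(-1, 1). The same call handles several study-hour values at once. A sketch, fitted on the ten head(10) rows (assumption, so its numbers will not match the 93.69 above); the values in the hypothetical hours_to_try array are arbitrary examples.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Ten-row subset from data.head(10).
hours = np.array([2.5, 5.1, 3.2, 8.5, 3.5, 1.5, 9.2, 5.5, 8.3, 2.7])
scores = np.array([21, 47, 27, 75, 30, 20, 88, 60, 81, 25])
regressor = LinearRegression().fit(hours.reshape(-1, 1), scores)

hours_to_try = np.array([4.0, 6.5, 9.25])  # hypothetical inputs
test = hours_to_try.reshape(-1, 1)          # shape (3,) -> (3, 1)
predictions = regressor.predict(test)

for h, p in zip(hours_to_try, predictions):
    print(f"Hours = {h}: predicted score = {p:.2f}")
```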

Step 10: Evaluate the model (the final step)

from sklearn import metrics
print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))

Mean Absolute Error: 4.183859899002982
Mean Squared Error: 21.598769307217456
Root Mean Squared Error: 4.647447612100373
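The three metrics are simple averages over the test rows: MAE = mean(|y − ŷ|), MSE = mean((y − ŷ)²) and RMSE = √MSE. A sketch that recomputes them with NumPy from the Actual and Predicted values printed in Step 8 and checks them against sklearn.metrics; because those predictions are rounded to six decimals, the results agree with the printed errors only to roughly that precision.

```python
import numpy as np
from sklearn import metrics

# Values copied from the Actual vs Predicted table in Step 8.
y_test = np.array([20, 27, 69, 30, 62])
y_pred = np.array([16.884145, 33.732261, 75.357018, 26.794801, 60.491033])

residuals = y_test - y_pred
mae = np.mean(np.abs(residuals))
mse = np.mean(residuals ** 2)
rmse = np.sqrt(mse)

# The hand-written formulas match scikit-learn's implementations.
assert np.isclose(mae, metrics.mean_absolute_error(y_test, y_pred))
assert np.isclose(mse, metrics.mean_squared_error(y_test, y_pred))

print("MAE:", round(mae, 4))  # 4.1839
print("MSE:", round(mse, 4))
print("RMSE:", round(rmse, 4))
```

RMSE is in the same units as the scores (percentage points), which makes it easier to interpret than MSE.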

And finally, I have completed my first task project: predicting a student's percentage based on the number of study hours.

Summary:

In this article we saw how to predict a student's percentage based on the number of study hours. If you have any questions about this section, feel free to ask me.

BEST OF LUCK!!!

I am Mr. Sachin Pagar, the founder of Pythonslearning, a passionate educational blogger and author who loves to share informative content on educational resources.
