Task 1: Load the Iris data from the sklearn datasets module.
Info!
The features of the Iris dataset are the sepal and petal dimensions (length and width, in cm) of flowers belonging to three iris species. The target names are the species to which each flower belongs; they are encoded as 0, 1 and 2.
In this exercise you will use logistic regression to predict the species of a flower given these dimensions as features. Optionally, view the data by printing iris_X and iris_y.
from sklearn import datasets
iris = datasets.load_iris()
iris_X = iris.data
iris_y = iris.target
print(iris.feature_names)
print(iris.target_names)
'''
Output:-
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
['setosa' 'versicolor' 'virginica']
'''
print(iris_X[:5])
'''
Output:-
array([[5.1, 3.5, 1.4, 0.2],
[4.9, 3. , 1.4, 0.2],
[4.7, 3.2, 1.3, 0.2],
[4.6, 3.1, 1.5, 0.2],
[5. , 3.6, 1.4, 0.2]])
'''
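A quick sanity check of the target encoding described above: the sketch below counts how many samples carry each integer label and maps the codes back to species names through `iris.target_names`. It uses only `load_iris` and NumPy, and the counts in the comments assume the standard 150-sample Iris dataset bundled with scikit-learn.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
iris_X = iris.data
iris_y = iris.target

# Number of samples per integer class label (0, 1, 2)
counts = np.bincount(iris_y)

# Map each integer code back to its species name
for code, name in enumerate(iris.target_names):
    print(code, name, counts[code])   # 0 setosa 50 / 1 versicolor 50 / 2 virginica 50

print(iris_X.shape)                   # (150, 4): 150 flowers, 4 measurements each
```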
Task 2: Split the Dataset.
- 1. Import the train_test_split function from sklearn.model_selection.
- 2. Split the data into train and test sets with test_size = 0.33 and random_state = 101.
###Start code
from sklearn.model_selection import train_test_split
# Hold out 33% of the samples for testing; random_state fixes the shuffle so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(iris_X, iris_y, test_size=0.33, random_state=101)
###End code
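To see what `test_size = 0.33` means in practice, the sketch below repeats the split and prints the resulting shapes. With 150 samples, scikit-learn rounds the test size up, giving 50 test rows and 100 training rows. It also checks that reusing the same `random_state` reproduces exactly the same split.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
iris_X, iris_y = iris.data, iris.target

# Same arguments as the exercise: 33% of the rows are held out for testing
X_train, X_test, y_train, y_test = train_test_split(
    iris_X, iris_y, test_size=0.33, random_state=101)
print(X_train.shape, X_test.shape)   # (100, 4) (50, 4)

# Re-running with the same random_state yields exactly the same split
X_train2, X_test2, _, _ = train_test_split(
    iris_X, iris_y, test_size=0.33, random_state=101)
print(np.array_equal(X_test, X_test2))   # True

# Class balance in the test set (not forced to be equal, since stratify is not used)
print(np.bincount(y_test, minlength=3))
```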
Task 3: Fit the Regression Model.
- 1. Import LogisticRegression from sklearn.linear_model.
- 2. Initialise a logistic regression model and assign it to the variable ‘model’.
- 3. Fit the model with the training data (X_train and y_train).
###Start code
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
# Fit the model; fit() returns the fitted estimator itself, so final_model and model are the same object
final_model = model.fit(X_train, y_train)
###End code
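For a multiclass target like Iris, `LogisticRegression` with the default `lbfgs` solver in recent scikit-learn versions fits a multinomial (softmax) model: it learns one weight vector and intercept per class, and `predict_proba` applies the softmax function to the per-class scores returned by `decision_function`. The sketch below checks that relationship numerically. It sets `max_iter=1000`, an assumption not in the exercise, only to avoid a convergence warning on the unscaled features; the `softmax` helper is written here for illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.33, random_state=101)

# max_iter raised from the default of 100 to avoid a ConvergenceWarning
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# One row of coefficients and one intercept per class
print(model.coef_.shape, model.intercept_.shape)   # (3, 4) (3,)

def softmax(z):
    # Subtract the row maximum for numerical stability before exponentiating
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

scores = model.decision_function(X_test)   # shape (n_samples, 3)
proba = model.predict_proba(X_test)        # shape (n_samples, 3)
print(np.allclose(proba, softmax(scores)))
```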
Task 4: Use the model to predict the output for the test data, i.e. X_test.
###Start code
y_pred = final_model.predict(X_test)
###End code (approx. 1 line)
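`predict` returns one class label per row of `X_test`. The sketch below compares those labels with `y_test` by hand to get the accuracy, and checks that the result agrees with `sklearn.metrics.accuracy_score` and with the model's own `score` method. As before, `max_iter=1000` is an assumption used only to silence convergence warnings.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.33, random_state=101)
final_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

y_pred = final_model.predict(X_test)

# Fraction of test rows whose predicted label equals the true label
manual_acc = np.mean(y_pred == y_test)
print(manual_acc, accuracy_score(y_test, y_pred), final_model.score(X_test, y_test))
```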
Task 5: Print the Classification Report.
- Import classification_report from sklearn.metrics.
- Pass y_test and y_pred to classification_report().
- Print the output of classification_report.
###Start code here
from sklearn.metrics import classification_report
ans = classification_report(y_test, y_pred)
print(ans)
###End code
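By default the report labels its rows with the numeric class codes. `classification_report` also accepts a `target_names` argument, so the species names from `iris.target_names` can be shown instead, and with `output_dict=True` it returns a nested dictionary, which is handy for reading a single metric in code. A sketch, again assuming `max_iter=1000`:

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.33, random_state=101)
y_pred = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict(X_test)

names = list(iris.target_names)   # ['setosa', 'versicolor', 'virginica']

# Human-readable report with species names instead of 0, 1, 2
print(classification_report(y_test, y_pred, target_names=names))

# Same metrics as a nested dict, e.g. the recall for one species
report = classification_report(y_test, y_pred, target_names=names, output_dict=True)
print(report['virginica']['recall'])
```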
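# Precision = TP / (TP + FP): the fraction of predictions for a class that are correct
# Recall = TP / (TP + FN): the fraction of a class's actual samples that are found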
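The precision and recall formulas can be checked against scikit-learn's own numbers. Treating each class in turn as the positive class, TP is the diagonal entry of the confusion matrix, FP is the rest of that class's column (other classes predicted as it), and FN is the rest of its row. The sketch computes per-class precision and recall this way, adds the F1 score that `classification_report` also prints (the harmonic mean 2PR / (P + R)), and compares against `precision_score` and `recall_score` with `average=None`. It assumes every class is predicted at least once, otherwise a precision denominator would be zero.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.33, random_state=101)
y_pred = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict(X_test)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
tp = np.diag(cm)
fp = cm.sum(axis=0) - tp   # predicted as class k but actually another class
fn = cm.sum(axis=1) - tp   # actually class k but predicted as another class

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(precision, precision_score(y_test, y_pred, average=None))
print(recall, recall_score(y_test, y_pred, average=None))
print(f1)
```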
Task 6: Predict the labels for a new dataset.
- 1. Predict the labels for the data stored in data/test_iris.csv.
- 2. Store the predictions as a list in the variable list_ans. Note: the list elements must be integers.
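These two steps boil down to: read the CSV into a feature matrix with the same four columns, in the same order, that the model was trained on; call `predict`; and convert the NumPy result to a list of Python ints. Because `data/test_iris.csv` is not available here, the sketch assumes the file has one header row followed by four numeric columns, and stands in for it with an in-memory CSV holding the first five test rows printed in this task's output. The column names in the header are hypothetical. `.tolist()` turns NumPy integer labels into plain Python `int` values.

```python
import io

import pandas as pd
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.33, random_state=101)
final_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Hypothetical stand-in for data/test_iris.csv: header row + four feature columns
csv_text = """sepal_length,sepal_width,petal_length,petal_width
5.5,4.2,1.4,0.2
5.4,3.9,1.3,0.4
5.0,3.5,1.6,0.6
7.2,3.0,5.8,1.6
7.0,3.2,4.7,1.4
"""
df_test = pd.read_csv(io.StringIO(csv_text)).to_numpy()

# .tolist() converts NumPy integer labels to plain Python ints
list_ans = final_model.predict(df_test).tolist()
print(list_ans)
print([iris.target_names[label] for label in list_ans])
```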
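import pandas as pd
# Read the unseen samples into a NumPy array (read_csv treats the first line as the header)
df_test = pd.read_csv('data/test_iris.csv').to_numpy()
final_predictions = final_model.predict(df_test)
# Store the predictions as a list of plain Python ints, as the task requires
list_ans = [int(label) for label in final_predictions]
print(df_test[:5])
print(final_predictions)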
'''
Output:-
array([[5.5, 4.2, 1.4, 0.2],
[5.4, 3.9, 1.3, 0.4],
[5. , 3.5, 1.6, 0.6],
[7.2, 3. , 5.8, 1.6],
[7. , 3.2, 4.7, 1.4]])
array([0, 0, 0, 2, 1, 2, 1, 1, 2, 0, 2, 0, 0, 2, 2, 1, 1, 1, 0, 2, 1, 0,
1, 1, 1, 1, 1, 2, 0, 0, 2, 1, 2, 1, 2, 1, 1, 1, 1, 2, 0, 0, 0, 2,
1, 0, 2, 1, 0, 1])
'''
Data Visualization (Optional): Plot the ROC curve and compute the area under the curve (AUC).
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve
from sklearn.metrics import auc
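# roc_curve only handles a binary target, so treat one species as the positive class
# and the other two as negative (one-vs-rest); class 2 ('virginica') is chosen here for illustration
positive_class = 2
y_true_bin = (y_test == positive_class).astype(int)
# Score for the positive class: its predicted probability (columns follow final_model.classes_)
y_clf = final_model.predict_proba(X_test)[:, positive_class]
# 1. Compute fpr, tpr, and thresholds
fpr, tpr, thresholds = roc_curve(y_true_bin, y_clf)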
# 2. Compute ROC AUC using the output of roc_curve (fpr and tpr)
roc_auc = auc(fpr, tpr)
# 3. Plot ROC curve
plt.figure()
plt.plot(fpr, tpr, label='ROC curve (area = %0.3f)' % roc_auc)
plt.plot([0, 1], [0, 1], 'k--') # Plotting the random classifier line
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC)')
plt.legend(loc="lower right")
plt.show()
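Since `roc_curve` handles one binary problem at a time, a single plot only covers one species against the rest. To summarise all three classes in one number, `roc_auc_score` accepts the full `predict_proba` matrix with `multi_class='ovr'` (one-vs-rest) and, by default, macro-averages the per-class AUCs. The sketch below computes that score and checks it against an explicit loop over `roc_curve` and `auc`; `max_iter=1000` is again an assumption to avoid convergence warnings.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.33, random_state=101)
final_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = final_model.predict_proba(X_test)   # one probability column per class

# Macro-averaged one-vs-rest AUC over the three species
ovr_auc = roc_auc_score(y_test, proba, multi_class='ovr')

# The same value computed class by class with roc_curve + auc
per_class = []
for k, name in enumerate(iris.target_names):
    fpr, tpr, _ = roc_curve((y_test == k).astype(int), proba[:, k])
    per_class.append(auc(fpr, tpr))
    print(name, round(per_class[-1], 3))

print(ovr_auc, np.mean(per_class))
```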