# Logistic Regression


``` python
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1, random_state=42)
```
Above is the custom dataset made using `make_classification` from `sklearn.datasets`.
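
A quick sanity check on the generated arrays (a minimal sketch; `np.bincount` counts samples per class):

``` python
import numpy as np

print(X.shape, y.shape)   # (100, 2) (100,) -- 100 samples, 2 features
print(np.bincount(y))     # samples per class, roughly balanced
```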

``` python
import matplotlib.pyplot as plt

# Visualize the two generated features
plt.scatter(X[:, 0], X[:, 1])
plt.show()
```

Logistic Regression is a statistical method used for binary classification problems. It models the probability that a given input belongs to a particular category.

Logistic Function (Sigmoid Function): The core of logistic regression is the logistic function, an S-shaped curve that maps any real-valued number to a value between 0 and 1. The function is defined as:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

where $x$ is the input to the function.

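A minimal NumPy sketch of the sigmoid (the sample inputs below are just for illustration):

``` python
import numpy as np

def sigmoid(x):
    """Map any real-valued input into (0, 1)."""
    return 1 / (1 + np.exp(-x))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # ~[0.0067, 0.5, 0.9933]
```
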
Logistic Regression is generally used for linearly separable data.

Logistic Regression cost function:

$$J(\beta) = - \frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log(h_\beta(x_i)) + (1 - y_i) \log(1 - h_\beta(x_i)) \right]$$

where $h_\beta(x) = \sigma(\beta^\top x)$ is the predicted probability and $m$ is the number of training examples.
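
As an illustration, here is a small NumPy sketch of this cost (the helper name `log_loss_cost` is ours, and the probabilities are clipped to avoid $\log(0)$):

``` python
import numpy as np

def log_loss_cost(y_true, y_prob, eps=1e-12):
    """Binary cross-entropy: the bracketed term averaged over m samples."""
    y_prob = np.clip(y_prob, eps, 1 - eps)  # guard against log(0)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

# Confident, correct predictions give a low cost
print(log_loss_cost(np.array([1, 0]), np.array([0.9, 0.1])))  # ~0.105
```
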
### Applications

- **Medical Diagnosis**: Predicting whether a patient has a certain
  disease (e.g., diabetes, cancer) based on diagnostic features.
- **Spam Detection**: Classifying emails as spam or not spam.
- **Customer Churn**: Predicting whether a customer will leave a
  service.
- **Credit Scoring**: Assessing whether a loan applicant is likely to
  default on a loan.

``` python
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

`X` and `y` are split into training and testing sets (80% / 20%) using `train_test_split`.

``` python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Fit the model on the training split and evaluate on the held-out split
model = LogisticRegression()
model.fit(x_train, y_train)
y_pred = model.predict(x_test)

accuracy_score(y_test, y_pred)
```
Output:

    1.0

The model classifies every test sample correctly, so the accuracy score is 1.0. Perfect accuracy is plausible here because the synthetic dataset is small and its two classes are well separated.

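Since logistic regression models class probabilities, it is worth inspecting them directly; `predict_proba` exposes them, with columns ordered by `model.classes_`:

``` python
print(model.classes_)                   # [0 1]
print(model.predict_proba(x_test[:5]))  # P(class 0), P(class 1) per row
```
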
``` python
import numpy as np

# Build a grid covering the feature space
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),
                     np.arange(y_min, y_max, 0.01))

# Predict the class for each grid point
Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

# Plot decision boundary and data points
plt.figure(figsize=(8, 6))
plt.contourf(xx, yy, Z, alpha=0.8, cmap='viridis')
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis', marker='o', edgecolors='k')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Logistic Regression Decision Boundary')
plt.show()
```
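
Because the model is linear in the two features, the shaded boundary above is simply the line where $\beta_0 + \beta_1 x_1 + \beta_2 x_2 = 0$. As a sketch, it can be recovered directly from the fitted coefficients (assuming the coefficient on feature 2 is nonzero):

``` python
# Solve beta0 + beta1*x1 + beta2*x2 = 0 for x2 in terms of x1
b0 = model.intercept_[0]
b1, b2 = model.coef_[0]

x1_line = np.linspace(x_min, x_max, 100)
x2_line = -(b0 + b1 * x1_line) / b2  # requires b2 != 0

plt.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis', edgecolors='k')
plt.plot(x1_line, x2_line, 'r--', label='decision boundary')
plt.legend()
plt.show()
```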