Commit 7f7d775

Merge branch 'CodeHarborHub:main' into lc-sol-3105
2 parents d95610b + 8c295cd commit 7f7d775

7 files changed

+634
-5
lines changed

docs/Machine Learning/Scikit-Learn.md

Lines changed: 229 additions & 0 deletions
@@ -0,0 +1,229 @@
# Scikit-Learn

> Unlock the Power of Machine Learning with Scikit-learn: Simplifying Complexity, Empowering Discovery

**Supervised Learning**

- Linear Models
- Support Vector Machines
- Data Preprocessing

1. Linear Models

The following are a set of
methods intended for regression in which the target value is expected to
be a linear combination of the features. In mathematical notation, if
$\hat{y}$ is the predicted value and $x = (x_1, \ldots, x_p)$ is the
feature vector, then:

$$
\hat{y}(w, x) = w_0 + w_1 x_1 + \ldots + w_p x_p
$$

Across the module, we designate the vector
$w = (w_1, \ldots, w_p)$ as `coef_` and $w_0$ as `intercept_`.
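
As a minimal sketch of how the formula maps onto the fitted attributes, the prediction can be rebuilt by hand from `intercept_` and `coef_`. The toy data below is an assumption made for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data (an assumption for illustration): y = 1 + 2*x1 + 3*x2 exactly.
X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 3.0], [3.0, 1.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

reg = LinearRegression().fit(X, y)

# y_hat = w_0 + w_1 * x_1 + ... + w_p * x_p, written with the fitted attributes:
# intercept_ plays the role of w_0 and coef_ holds (w_1, ..., w_p).
manual_pred = reg.intercept_ + X @ reg.coef_

print("coef_:", reg.coef_)
print("intercept_:", reg.intercept_)
print("Matches predict():", np.allclose(manual_pred, reg.predict(X)))
```

The manual expression and `predict` should agree up to floating-point rounding.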

- *Linear Regression*
Linear Regression fits a linear model with coefficients
$w = (w_1, \ldots, w_p)$ to minimize the residual sum of squares between the observed
targets in the dataset and the targets predicted by the linear
approximation. Mathematically, it solves a problem of the form:

$\min_{w} || X w - y||_2^2$
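
The same least-squares problem can be sketched directly with NumPy. This is only an illustration under assumed toy data, not a description of how scikit-learn solves it internally: a column of ones is appended to `X` so that the intercept is estimated together with the other weights.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noisy targets for four samples with two features each.
X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 3.0], [3.0, 1.0]])
y = np.array([4.1, 2.9, 14.2, 9.8])

# Append a column of ones so the intercept w_0 is part of the weight vector.
X_aug = np.hstack([np.ones((X.shape[0], 1)), X])

# Minimize ||X_aug w - y||_2^2 with a generic least-squares solver.
w, residuals, rank, _ = np.linalg.lstsq(X_aug, y, rcond=None)

reg = LinearRegression().fit(X, y)
print("lstsq intercept and weights:", w)
print("LinearRegression:", reg.intercept_, reg.coef_)
```

For full-rank data such as this, both approaches should return the same intercept and coefficients up to rounding.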

```python
from sklearn import linear_model

# Create an ordinary least squares linear regression model
reg = linear_model.LinearRegression()

# Fit on three samples with two features each
reg.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])

coefficients = reg.coef_
intercept = reg.intercept_

print("Coefficients:", coefficients)
print("Intercept:", intercept)
```

Output:

    Coefficients: [0.5 0.5]
    Intercept: 1.1102230246251565e-16

The intercept is zero up to floating-point rounding, so its exact digits
may differ between platforms.

![LinearRegression](https://scikit-learn.org/stable/_images/sphx_glr_plot_ols_001.png)

The figure shows how Linear Regression fits a line to the data.

2. Support Vector Machines

Support vector machines (SVMs) are a set of supervised learning methods
used for classification, regression and outlier detection.

*The advantages of support vector machines are:*

- Effective in high dimensional spaces.
- Still effective in cases where the number of dimensions is greater than
  the number of samples.
- Uses a subset of training points in the decision function (called
  support vectors), so it is also memory efficient.
- Versatile: different kernel functions can be specified for the decision
  function. Common kernels are provided, but it is also possible to
  specify custom kernels.
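
To make the "subset of training points" idea concrete, the sketch below fits an `SVC` on the Iris dataset and inspects its fitted attributes. The linear kernel and `C=1.0` are arbitrary choices for illustration.

```python
from sklearn import datasets, svm

X, y = datasets.load_iris(return_X_y=True)

# Kernel and C are illustrative assumptions, not recommendations.
clf = svm.SVC(kernel="linear", C=1.0).fit(X, y)

n_samples = X.shape[0]
n_support = clf.support_vectors_.shape[0]  # training rows kept as support vectors

print(f"{n_support} of {n_samples} training points are support vectors")
print("Support vectors per class:", clf.n_support_)
```

Only the rows stored in `support_vectors_` enter the decision function, which is why the fitted model can be smaller than the training set.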

*The disadvantages of support vector machines include:*

- If the number of features is much greater than the number of samples,
  avoiding over-fitting when choosing kernel functions and the
  regularization term is crucial.
- SVMs do not directly provide probability estimates; these are calculated
  using an expensive five-fold cross-validation.

The support vector machines in scikit-learn support both dense
(`numpy.ndarray` and convertible to that by `numpy.asarray`) and sparse (any
`scipy.sparse`) sample vectors as input. However, to use an SVM to make
predictions for sparse data, it must have been fit on such data. For
optimal performance, use a C-ordered `numpy.ndarray` (dense) or
`scipy.sparse.csr_matrix` (sparse) with `dtype=float64`.
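
A small sketch of the sparse workflow described above, using hypothetical toy data: the model is fit on a CSR matrix with `float64` values and then asked to predict on sparse input as well.

```python
import numpy as np
from scipy import sparse
from sklearn import svm

# Hypothetical toy data: two well-separated classes.
X_dense = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 3.0], [4.0, 3.0]])
y = np.array([0, 0, 1, 1])

# CSR matrix with float64 values, matching the recommendation above.
X_sparse = sparse.csr_matrix(X_dense, dtype=np.float64)

clf = svm.SVC(kernel="linear").fit(X_sparse, y)

# Predict on sparse input, since the model was fit on sparse data.
X_new = sparse.csr_matrix(np.array([[0.5, 0.5], [3.5, 3.0]]))
pred = clf.predict(X_new)
print(pred)
```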

**Linear Kernel:**

Function: $K(x, y) = x^T y$

Parameters: No additional parameters.

**Polynomial Kernel:**

Function: $K(x, y) = (\gamma x^T y + r)^d$

Parameters:

γ (gamma): Scaling coefficient applied to the inner product $x^T y$
(`gamma` in `SVC`).

r: Coefficient for the constant term (`coef0` in `SVC`).

d: Degree of the polynomial (`degree` in `SVC`).

**Radial Basis Function (RBF) Kernel:**

Function: $K(x, y) = \exp(-\gamma ||x - y||^2)$

Parameters: γ (gamma): Controls the influence of each training
example. Higher values make that influence more local and result in a
more complex decision boundary.

**Sigmoid Kernel:**

Function: $K(x, y) = \tanh(\gamma x^T y + r)$

Parameters:

γ (gamma): Coefficient for the sigmoid term.

r: Coefficient for the constant term.
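
The four kernel formulas above can be checked numerically. The sketch below evaluates each one by hand for a single pair of vectors and compares the results with the helpers in `sklearn.metrics.pairwise`. The vectors and the values of `gamma`, `r` and `d` are hypothetical.

```python
import numpy as np
from sklearn.metrics.pairwise import (
    linear_kernel,
    polynomial_kernel,
    rbf_kernel,
    sigmoid_kernel,
)

# Hypothetical inputs and kernel parameters.
x = np.array([1.0, 2.0])
y = np.array([0.5, -1.0])
gamma, r, d = 0.5, 1.0, 3

dot = x @ y  # x^T y = 0.5 - 2.0 = -1.5

# Hand-written versions of the formulas above.
k_linear = dot
k_poly = (gamma * dot + r) ** d
k_rbf = np.exp(-gamma * np.sum((x - y) ** 2))
k_sigmoid = np.tanh(gamma * dot + r)

# scikit-learn's pairwise helpers expect 2-D arrays (one row per sample).
X2, Y2 = x.reshape(1, -1), y.reshape(1, -1)
sk_linear = linear_kernel(X2, Y2)[0, 0]
sk_poly = polynomial_kernel(X2, Y2, degree=d, gamma=gamma, coef0=r)[0, 0]
sk_rbf = rbf_kernel(X2, Y2, gamma=gamma)[0, 0]
sk_sigmoid = sigmoid_kernel(X2, Y2, gamma=gamma, coef0=r)[0, 0]

for name, mine, theirs in [
    ("linear", k_linear, sk_linear),
    ("polynomial", k_poly, sk_poly),
    ("rbf", k_rbf, sk_rbf),
    ("sigmoid", k_sigmoid, sk_sigmoid),
]:
    print(f"{name}: manual={mine:.6f} sklearn={theirs:.6f}")
```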

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets

# Load example dataset (Iris dataset)
iris = datasets.load_iris()
X = iris.data[:, :2]  # We only take the first two features
y = iris.target

# Define the SVM model with RBF kernel
C = 1.0  # Regularization parameter
gamma = 0.7  # Kernel coefficient
svm_model = svm.SVC(kernel='rbf', C=C, gamma=gamma)

# Train the SVM model
svm_model.fit(X, y)

# Build a grid covering the feature space, with a margin of 1 on each side
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                     np.arange(y_min, y_max, 0.02))

# Predict the class of every grid point to obtain the decision regions
Z = svm_model.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

# Plot the decision boundary
plt.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=0.8)

# Plot the training points
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.title('SVM with RBF Kernel')
plt.show()
```

![SVM](https://github.com/AmrutaJayanti/codeharborhub/assets/142327526/24bc053e-54b6-4702-a442-d7f6e4b34332)

3. Data Preprocessing

Data preprocessing is a crucial step in the machine learning pipeline
that involves transforming raw data into a format suitable for training
a model. Here are some fundamental data preprocessing techniques, most
of them available in scikit-learn:

**Handling Missing Values:**

Imputation: Replace missing values with a calculated value (e.g., mean,
median, mode) using `SimpleImputer`.

Removal: Remove rows or columns with missing values using the pandas
`dropna` method.
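
A minimal sketch of both options on a hypothetical array with one missing entry. To keep the example free of extra dependencies, row removal is shown with a NumPy mask, which plays the role that `dropna` has for a pandas DataFrame.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data with one missing value in the first column.
X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, 6.0]])

# Imputation: replace NaN with the column mean, (1 + 7) / 2 = 4.
imputer = SimpleImputer(strategy="mean")
X_imputed = imputer.fit_transform(X)

# Removal: keep only the rows without any missing value.
X_dropped = X[~np.isnan(X).any(axis=1)]

print(X_imputed)
print(X_dropped)
```

Other `strategy` values such as `"median"` and `"most_frequent"` cover the median and mode cases mentioned above.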

**Feature Scaling:**

Standardization: Scale features to have a mean of 0 and a standard
deviation of 1 using `StandardScaler`.

Normalization: Scale features to a range between 0 and 1 using
`MinMaxScaler`.

**Encoding Categorical Variables:**

One-Hot Encoding: Convert categorical variables into binary vectors
using `OneHotEncoder`.

Label Encoding: Encode categorical variables as integers using
`LabelEncoder`. Note that `LabelEncoder` is intended for target labels;
for input features, `OrdinalEncoder` provides the same kind of integer
encoding.
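
The scaling and encoding steps above can be sketched as follows. All arrays are hypothetical toy data, and the one-hot result is converted with `.toarray()` only to make it easy to print.

```python
import numpy as np
from sklearn.preprocessing import (
    LabelEncoder,
    MinMaxScaler,
    OneHotEncoder,
    StandardScaler,
)

# Hypothetical numeric features on very different scales.
X_num = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])

X_std = StandardScaler().fit_transform(X_num)    # each column: mean 0, std 1
X_minmax = MinMaxScaler().fit_transform(X_num)   # each column: range [0, 1]

# Hypothetical categorical feature, one column.
colors = np.array([["red"], ["green"], ["blue"], ["green"]])
encoder = OneHotEncoder()
one_hot = encoder.fit_transform(colors).toarray()  # one binary column per category

# Hypothetical target labels encoded as integers (classes are sorted).
labels = ["cat", "dog", "cat", "bird"]
label_ids = LabelEncoder().fit_transform(labels)

print(X_std)
print(X_minmax)
print(encoder.categories_[0], one_hot, sep="\n")
print(label_ids)
```

In a real pipeline the scalers and encoders would be fit on the training split only and then applied to the test split with `transform`.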

**Feature Transformation:**

Polynomial Features: Generate polynomial features up to a specified
degree using `PolynomialFeatures`.

Log Transformation: Transform features using the natural logarithm to
handle skewed distributions.
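
Both transformations can be sketched on hypothetical inputs. Wrapping `np.log` in a `FunctionTransformer` is one possible way to apply the logarithm inside scikit-learn, shown here as an assumption rather than the only option. It requires strictly positive values.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer, PolynomialFeatures

# Degree-2 polynomial features of a single sample (a, b) = (2, 3).
X = np.array([[2.0, 3.0]])
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)  # columns: 1, a, b, a^2, a*b, b^2

# Natural-log transform of a hypothetical right-skewed, positive feature.
skewed = np.array([[1.0], [10.0], [100.0], [1000.0]])
log_tf = FunctionTransformer(np.log)
X_log = log_tf.fit_transform(skewed)

print(X_poly)
print(X_log.ravel())
```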

**Handling Outliers:**

Detection: Identify outliers using statistical methods or domain
knowledge.

Transformation: Apply transformations (e.g., winsorization)
or remove outliers based on a threshold.
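
One common statistical rule, used here as an assumed example, flags values that lie more than 1.5 times the interquartile range (IQR) outside the quartiles. The sketch below detects such values, caps them at the IQR fences (a winsorization-like transformation, not percentile-based winsorization proper), and alternatively removes them.

```python
import numpy as np

# Hypothetical measurements with one obvious outlier.
values = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 95.0])

# Detection with the 1.5 * IQR rule.
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_outlier = (values < lower) | (values > upper)

# Transformation: cap extreme values at the fences.
capped = np.clip(values, lower, upper)

# Removal: drop the flagged values instead.
filtered = values[~is_outlier]

print("Outliers:", values[is_outlier])
print("Capped:", capped)
print("Filtered:", filtered)
```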

**Handling Imbalanced Data:**

Resampling: Over-sample the minority class or under-sample the majority
class to balance the dataset using techniques like `RandomOverSampler` or
`RandomUnderSampler` from the imbalanced-learn library.

Synthetic Sampling: Generate synthetic samples for the minority class
using algorithms like Synthetic Minority Over-sampling Technique
(SMOTE).

**Feature Selection:**

Univariate Feature Selection: Select features based on statistical tests
like ANOVA using `SelectKBest` or `SelectPercentile`.

Recursive Feature Elimination: Select features recursively by
considering smaller and smaller sets of features using `RFECV`.
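
The two feature selection approaches can be sketched on the Iris dataset. The choice of `k=2`, the `LogisticRegression` estimator and the five folds are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFECV, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Univariate selection: keep the 2 features with the highest ANOVA F-scores.
selector = SelectKBest(score_func=f_classif, k=2)
X_best = selector.fit_transform(X, y)
print("Kept by SelectKBest:", selector.get_support())

# Recursive elimination with cross-validation to choose how many features to keep.
rfecv = RFECV(estimator=LogisticRegression(max_iter=1000), step=1, cv=5)
rfecv.fit(X, y)
print("Number of features chosen by RFECV:", rfecv.n_features_)
print("Kept by RFECV:", rfecv.support_)
```

`RFECV` needs an estimator that exposes feature weights such as `coef_` or `feature_importances_`, which is why a linear model is used here.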

**Splitting Data:**

Train-Test Split: Split the dataset into training and testing sets using
`train_test_split`.

Cross-Validation: Split the dataset into multiple folds for
cross-validation using `KFold` or `StratifiedKFold`.
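
A short sketch of both splitting strategies on the Iris dataset. The 80/20 ratio, the five folds and `random_state=0` are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 20% of the samples, keeping class proportions with stratify=y.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
print("Train/test sizes:", X_train.shape[0], X_test.shape[0])

# Five stratified folds: every sample appears in exactly one test fold.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_sizes = [len(test_idx) for _, test_idx in skf.split(X, y)]
print("Test fold sizes:", fold_sizes)
```

`KFold` has the same interface but ignores the class labels when forming folds, so `StratifiedKFold` is usually the safer default for classification.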

docusaurus.config.js

Lines changed: 21 additions & 3 deletions
@@ -82,6 +82,24 @@ const config = {
8282
backgroundColor: "var(--ifm-color-primary)",
8383
},
8484

85+
metadata: [
86+
{
87+
name: "keywords",
88+
content:
89+
"CodeHarborHub, CodeHarbor, CodeHarborHub, CodeHarborHub Blog, CodeHarborHub Community, CodeHarborHub Courses, CodeHarborHub DSA, CodeHarborHub Web Dev, CodeHarborHub Tutorials, CodeHarborHub Showcase, CodeHarborHub Donate, CodeHarborHub Blog, CodeHarborHub Team, CodeHarborHub About, CodeHarborHub Contact, CodeHarborHub Careers, CodeHarborHub Terms, CodeHarborHub Privacy, CodeHarborHub Cookie, CodeHarborHub Code of Conduct, CodeHarborHub Quiz, CodeHarborHub Broadcast, CodeHarborHub Tags, CodeHarborHub Courses Tags, CodeHarborHub DSA Tags, CodeHarborHub Web Dev Tags, CodeHarborHub Product, CodeHarborHub LinkedIn, CodeHarborHub YouTube, CodeHarborHub Discord, CodeHarborHub Twitter, CodeHarborHub GitHub, CodeHarborHub Products, CodeHarborHub Web Dev, CodeHarborHub DSA, CodeHarborHub Courses, CodeHarborHub Tutorials, CodeHarborHub Showcase, CodeHarborHub Donate, CodeHarborHub Blog, CodeHarborHub Team, CodeHarborHub About, CodeHarborHub Contact, CodeHarborHub Careers, CodeHarborHub Terms, CodeHarborHub Privacy, CodeHarborHub Cookie, CodeHarborHub Code of Conduct, CodeHarborHub Quiz, CodeHarborHub Broadcast, CodeHarborHub Tags, CodeHarborHub Courses Tags, CodeHarborHub DSA Tags, CodeHarborHub Web Dev Tags, CodeHarborHub Product, CodeHarborHub LinkedIn, CodeHarborHub YouTube, CodeHarborHub Discord, CodeHarborHub Twitter, CodeHarborHub GitHub, CodeHarborHub Products, CodeHarborHub Web Dev, CodeHarborHub DSA, CodeHarborHub Courses, CodeHarborHub Tutorials, CodeHarborHub Showcase, CodeHarborHub Donate, CodeHarborHub Blog, CodeHarborHub Team, CodeHarborHub About, CodeHarborHub Contact, CodeHarborHub Careers, CodeHarborHub Terms, CodeHarborHub Privacy, CodeHarborHub Cookie, CodeHarborHub Code of Conduct, CodeHarborHub Quiz, CodeHarborHub Broadcast, CodeHarborHub Tags, CodeHarborHub, leetcode, codeforces, hackerrank, geeksforgeeks, interviewbit, educative, udemy, coursera, udacity, khanacademy, codecademy, w3schools, tutorialspoint, javatpoint, geeksforgeeks, stackoverflow, 
github, gitlab, bitbucket, codepen, jsfiddle, repl.it, codesandbox, stackblitz, gfg, GeeksForGeeks, tech",
90+
},
91+
{ name: "twitter:card", content: "summary_large_image" },
92+
{ name: "twitter:site", content: "@CodesWithAjay" },
93+
{ name: "twitter:creator", content: "@CodesWithAjay" },
94+
{ property: "og:type", content: "website" },
95+
{ property: "og:site_name", content: "CodeHarborHub" },
96+
{ property: "og:title", content: "CodeHarborHub - A place to learn and grow" },
97+
{ property: "og:description", content: "CodeHarborHub is a place to learn and grow. We provide accessible and comprehensive educational resources to learners of all levels, from beginners to advanced professionals."},
98+
{ property: "og:image", content: "https://codeharborhub.github.io/img/nav-logo.jpg" },
99+
{ property: "og:url", content: "https://codeharborhub.github.io" },
100+
{ name: "robots", content: "index, follow" },
101+
],
102+
85103
algolia: {
86104
apiKey: "2c1a3331ebff51f76d2f247323ee4ba4",
87105
indexName: "code-harbor-hub",
@@ -175,19 +193,19 @@ const config = {
175193

176194
{
177195
to: "/our-sponsors/",
178-
html: '<span class="nav-emoji">💰</span> Donate'
196+
html: '<span class="nav-emoji">💰</span> Donate',
179197
},
180198

181199
{
182200
to: "/blog",
183201
html: '<span class="nav-emoji">📰</span> Blog',
184202
},
185-
203+
186204
{
187205
type: "dropdown",
188206
html: '<span class="nav-emoji">🔗</span> More',
189207
position: "left",
190-
items: [
208+
items: [
191209
{
192210
html: '<span class="nav-emoji">🌍</span> Web Dev',
193211
to: "/web-dev/",

dsa-problems/leetcode-problems/0500-0599.md

Lines changed: 1 addition & 1 deletion
@@ -506,7 +506,7 @@ export const problems =[
506506
"problemName": "599. Minimum Index Sum of Two Lists",
507507
"difficulty": "Easy",
508508
"leetCodeLink": "https://leetcode.com/problems/minimum-index-sum-of-two-lists",
509-
"solutionLink": "#"
509+
"solutionLink": "/dsa-solutions/lc-solutions/0500-0599/minimum-index-sum-of-two-lists"
510510
}
511511
]
512512
