Commit 103a290

misc (#262)
1 parent 1e8f135 commit 103a290

1 file changed: +22 -36 lines changed

lectures/prob_dist.md

Lines changed: 22 additions & 36 deletions
Original file line number | Diff line number | Diff line change
@@ -39,7 +39,6 @@ import scipy.stats
3939
import seaborn as sns
4040
```
4141

42-
+++ {"user_expressions": []}
4342

4443
## Common distributions
4544

@@ -100,7 +99,6 @@ n = 10
10099
u = scipy.stats.randint(1, n+1)
101100
```
102101

103-
+++ {"user_expressions": []}
104102

105103
Here's the mean and variance
106104

@@ -110,7 +108,6 @@ u.mean(), u.var()
110108

111109
The formula for the mean is $(n+1)/2$, and the formula for the variance is $(n^2 - 1)/12$.
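As a quick sanity check (a sketch that is not part of this commit), the two closed-form expressions can be compared with the values SciPy reports for the same parameterization, $n = 10$. The names `mean_formula` and `var_formula` are introduced here only for illustration.

```python
import scipy.stats

n = 10
u = scipy.stats.randint(1, n + 1)  # uniform distribution on {1, ..., n}

# Closed-form expressions quoted in the text
mean_formula = (n + 1) / 2
var_formula = (n**2 - 1) / 12

# Each line prints the SciPy value next to the closed-form value
print(u.mean(), mean_formula)
print(u.var(), var_formula)
```

This only confirms the formulas numerically; it does not compute the moments from the PMF.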
112110

113-
+++ {"user_expressions": []}
114111

115112
Now let's evaluate the PMF
116113

@@ -122,7 +119,6 @@ u.pmf(1)
122119
u.pmf(2)
123120
```
124121

125-
+++ {"user_expressions": []}
126122

127123
Here's a plot of the probability mass function:
128124

@@ -135,7 +131,6 @@ ax.set_xticks(S)
135131
plt.show()
136132
```
137133

138-
+++ {"user_expressions": []}
139134

140135
Here's a plot of the CDF:
141136

@@ -148,20 +143,21 @@ ax.set_xticks(S)
148143
plt.show()
149144
```
150145

151-
+++ {"user_expressions": []}
152146

153147
The CDF jumps up by $p(x_i)$ at $x_i$.
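One way to see the jump property numerically is to difference the CDF over the support and compare the result with the PMF. The following is a minimal sketch, assuming the same uniform distribution with $n = 10$; the array name `jumps` is illustrative.

```python
import numpy as np
import scipy.stats

n = 10
u = scipy.stats.randint(1, n + 1)  # uniform distribution on {1, ..., n}
S = np.arange(1, n + 1)            # support points x_1, ..., x_n

# Size of the CDF jump at each support point (the CDF is 0 below x_1)
jumps = np.diff(u.cdf(S), prepend=0.0)

# The jump sizes should coincide with the PMF values p(x_i)
print(np.allclose(jumps, u.pmf(S)))
```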
154148

155-
+++ {"user_expressions": []}
156149

157150
```{exercise}
158151
:label: prob_ex1
159152
160-
Calculate the mean and variance directly from the PMF, using the expressions given above.
153+
Calculate the mean and variance for this parameterization (i.e., $n=10$)
154+
directly from the PMF, using the expressions given above.
161155
162-
Check that your answers agree with `u.mean()` and `u.var()`.
156+
Check that your answers agree with `u.mean()` and `u.var()`.
163157
```
164158

159+
160+
165161
#### Binomial distribution
166162

167163
Another useful (and more interesting) distribution is the **binomial distribution** on $S=\{0, \ldots, n\}$, which has PMF
@@ -205,7 +201,6 @@ ax.set_xticks(S)
205201
plt.show()
206202
```
207203

208-
+++ {"user_expressions": []}
209204

210205
Here's the CDF
211206

@@ -218,7 +213,6 @@ ax.set_xticks(S)
218213
plt.show()
219214
```
220215

221-
+++ {"user_expressions": []}
222216

223217
```{exercise}
224218
:label: prob_ex2
@@ -289,11 +283,9 @@ ax.set_xticks(S)
289283
plt.show()
290284
```
291285

292-
+++ {"user_expressions": []}
293286

294287
### Continuous distributions
295288

296-
+++ {"user_expressions": []}
297289

298290
Continuous distributions are represented by a **density function**, which is a function $p$ over $\mathbb R$ (the set of all real numbers) such that $p(x) \geq 0$ for all $x$ and
299291

@@ -322,7 +314,6 @@ $$
322314
= \int_{-\infty}^x p(x) dx
323315
$$
324316

325-
+++ {"user_expressions": []}
326317

327318
#### Normal distribution
328319

@@ -366,7 +357,6 @@ plt.legend()
366357
plt.show()
367358
```
368359

369-
+++ {"user_expressions": []}
370360

371361
Here's a plot of the CDF:
372362

@@ -382,7 +372,6 @@ plt.legend()
382372
plt.show()
383373
```
384374

385-
+++ {"user_expressions": []}
386375

387376
#### Lognormal distribution
388377

@@ -495,22 +484,19 @@ plt.show()
495484

496485
#### Beta distribution
497486

498-
The **beta distribution** is a distribution on $\left(0, 1\right)$ with density
487+
The **beta distribution** is a distribution on $(0, 1)$ with density
499488

500489
$$
501490
p(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha) \Gamma(\beta)}
502491
x^{\alpha - 1} (1 - x)^{\beta - 1}
503492
$$
504493

505-
where $\Gamma$ is the gamma function ($\Gamma(n) = (n - 1)!$ for $n \in \mathbb{N}$).
506-
507-
This distribution has two parameters, $\alpha$ and $\beta$.
508-
509-
It has a nice interpretation: if $X$ is beta distributed, then $X$ is the probability of success in a Bernoulli trial with a number of successes $\alpha$ and a number of failures $\beta$.
494+
where $\Gamma$ is the [gamma function](https://en.wikipedia.org/wiki/Gamma_function).
510495

511-
For example, if $\alpha = \beta = 1$, then the beta distribution is uniform on $\left(0, 1\right)$ as the number of successes and failures are both 1.
496+
(The role of the gamma function is just to normalize the density, so that it
497+
integrates to one.)
512498

513-
While, if $\alpha = 3$ and $\beta = 2$, then the beta distribution is located more towards 1 as there are more successes than failures.
499+
This distribution has two parameters, $\alpha > 0$ and $\beta > 0$.
514500

515501
It can be shown that, for this distribution, the mean is $\alpha / (\alpha + \beta)$ and
516502
the variance is $\alpha \beta / [(\alpha + \beta)^2 (\alpha + \beta + 1)]$.
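
The mean and variance expressions can be checked against `scipy.stats.beta`. This is only a sketch: the parameter values $\alpha = 3$, $\beta = 2$ are chosen for illustration, and the denominator of the variance is read as the product $(\alpha + \beta)^2 (\alpha + \beta + 1)$.

```python
import scipy.stats

alpha, beta = 3.0, 2.0  # illustrative parameter values (an assumption)
dist = scipy.stats.beta(alpha, beta)

# Closed-form moments; the whole product sits in the denominator
mean_formula = alpha / (alpha + beta)
var_formula = alpha * beta / ((alpha + beta)**2 * (alpha + beta + 1))

# Each line prints the SciPy value next to the closed-form value
print(dist.mean(), mean_formula)
print(dist.var(), var_formula)
```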
@@ -553,6 +539,7 @@ plt.legend()
553539
plt.show()
554540
```
555541

542+
556543
#### Gamma distribution
557544

558545
The **gamma distribution** is a distribution on $\left(0, \infty\right)$ with density
@@ -562,11 +549,14 @@ $$
562549
x^{\alpha - 1} \exp(-\beta x)
563550
$$
564551

565-
This distribution has two parameters, $\alpha$ and $\beta$.
552+
This distribution has two parameters, $\alpha > 0$ and $\beta > 0$.
566553

567-
It can be shown that, for this distribution, the mean is $\alpha / \beta$ and the variance is $\alpha / \beta^2$.
554+
It can be shown that, for this distribution, the mean is $\alpha / \beta$ and
555+
the variance is $\alpha / \beta^2$.
568556

569-
One interpretation is that if $X$ is gamma distributed, then $X$ is the sum of $\alpha$ independent exponentially distributed random variables with mean $1/\beta$.
557+
One interpretation is that if $X$ is gamma distributed and $\alpha$ is an
558+
integer, then $X$ is the sum of $\alpha$ independent exponentially distributed
559+
random variables with mean $1/\beta$.
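
The sum-of-exponentials interpretation can be illustrated by simulation. The sketch below assumes an integer $\alpha = 3$ and $\beta = 2$ (illustrative values), sums $\alpha$ independent exponential draws with mean $1/\beta$, and compares the sample moments with $\alpha / \beta$ and $\alpha / \beta^2$.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility
alpha, beta = 3, 2.0            # illustrative values; alpha must be an integer here
n_draws = 200_000

# Each row holds alpha independent exponential draws with mean 1/beta
draws = rng.exponential(scale=1 / beta, size=(n_draws, alpha))
x = draws.sum(axis=1)           # one simulated gamma draw per row

# Sample moments next to the theoretical mean and variance
print(x.mean(), alpha / beta)
print(x.var(), alpha / beta**2)
```

The sample moments will differ slightly from the theoretical values because of simulation error.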
570560

571561
We can obtain the moments, PDF, and CDF of the gamma distribution as follows:
572562

@@ -608,7 +598,6 @@ plt.show()
608598

609599
## Observed distributions
610600

611-
+++ {"user_expressions": []}
612601

613602
Sometimes we refer to observed data or measurements as "distributions".
614603

@@ -630,17 +619,17 @@ df = pd.DataFrame(data, columns=['name', 'income'])
630619
df
631620
```
632621

633-
+++ {"user_expressions": []}
634622

635623
In this situation, we might refer to the set of their incomes as the "income distribution."
636624

637-
The terminology is confusing because this is not the same thing as a probability distribution --- it's just a collection of numbers.
625+
The terminology is confusing because this set is not a probability distribution
626+
--- it's just a collection of numbers.
638627

639-
Below we explore some observed distributions.
628+
However, as we will see, there are connections between observed distributions (i.e., sets of
629+
numbers like the income distribution above) and probability distributions.
640630

641-
We will see that there are connections between observed distributions---like the income distribution above---and probability distributions, as we'll see below.
631+
Below we explore some observed distributions.
642632

643-
+++ {"user_expressions": []}
644633

645634
### Summary statistics
646635

@@ -658,8 +647,6 @@ $$
658647
\frac{1}{n} \sum_{i=1}^n (x_i - \bar x)^2
659648
$$
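
To make the formulas concrete, here is a small sketch that computes the sample mean and the sample variance (dividing by $n$) by hand and compares them with NumPy. The data array is hypothetical and is not the income data from the lecture.

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 7.0])  # hypothetical observations

n = len(x)
x_bar = x.sum() / n                    # sample mean
var = ((x - x_bar)**2).sum() / n       # sample variance with the 1/n convention

print(x_bar, x.mean())
print(var, x.var())        # np.var divides by n by default (ddof=0)
print(x.var(ddof=1))       # the 1/(n-1) convention gives a different number
```

NumPy's default `ddof=0` matches the $1/n$ definition used here, whereas some other tools default to $1/(n-1)$.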
660649

661-
+++ {"user_expressions": []}
662-
663650
For the income distribution given above, we can calculate these numbers via
664651

665652
```{code-cell} ipython3
@@ -670,7 +657,6 @@ x = np.asarray(df['income'])
670657
x.mean(), x.var()
671658
```
672659

673-
+++ {"user_expressions": []}
674660

675661
```{exercise}
676662
:label: prob_ex3
