lectures/prob_dist.md
+22 -36 (22 additions & 36 deletions)
Original file line number
Diff line number
Diff line change
@@ -39,7 +39,6 @@ import scipy.stats
39
39
import seaborn as sns
40
40
```
41
41
42
-
+++ {"user_expressions": []}
43
42
44
43
## Common distributions
45
44
@@ -100,7 +99,6 @@ n = 10
100
99
u = scipy.stats.randint(1, n+1)
101
100
```
102
101
103
-
+++ {"user_expressions": []}
104
102
105
103
Here's the mean and variance
106
104
@@ -110,7 +108,6 @@ u.mean(), u.var()
110
108
111
109
The formula for the mean is $(n+1)/2$, and the formula for the variance is $(n^2 - 1)/12$.
112
110
113
-
+++ {"user_expressions": []}
114
111
115
112
Now let's evaluate the PMF
116
113
@@ -122,7 +119,6 @@ u.pmf(1)
122
119
u.pmf(2)
123
120
```
124
121
125
-
+++ {"user_expressions": []}
126
122
127
123
Here's a plot of the probability mass function:
128
124
@@ -135,7 +131,6 @@ ax.set_xticks(S)
135
131
plt.show()
136
132
```
137
133
138
-
+++ {"user_expressions": []}
139
134
140
135
Here's a plot of the CDF:
141
136
@@ -148,20 +143,21 @@ ax.set_xticks(S)
148
143
plt.show()
149
144
```
150
145
151
-
+++ {"user_expressions": []}
152
146
153
147
The CDF jumps up by $p(x_i)$ at $x_i$.
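
As a quick check of this property, the sketch below rebuilds the CDF of the uniform distribution on $\{1, \ldots, n\}$ from its PMF and compares the size of each jump with $p(x_i)$. It uses only the standard library, and the variable names (`pmf`, `cdf`, `jumps`) are illustrative rather than taken from the lecture.

```python
from itertools import accumulate

n = 10
S = list(range(1, n + 1))          # support {1, ..., n}
pmf = [1 / n for _ in S]           # p(x_i) = 1/n for the uniform distribution
cdf = list(accumulate(pmf))        # F(x_i) = p(x_1) + ... + p(x_i)

# Jump of the CDF at x_i: F(x_i) - F(x_{i-1}), with F = 0 to the left of x_1
jumps = [cdf[0]] + [cdf[i] - cdf[i - 1] for i in range(1, n)]

for x, p, jump in zip(S, pmf, jumps):
    print(f"x = {x:2d}   p(x) = {p:.3f}   jump in CDF = {jump:.3f}")
```

Up to floating-point rounding, each jump matches the corresponding PMF value, and the last CDF value equals one.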
154
148
155
-
+++ {"user_expressions": []}
156
149
157
150
```{exercise}
158
151
:label: prob_ex1
159
152
160
-
Calculate the mean and variance directly from the PMF, using the expressions given above.
153
+
Calculate the mean and variance for this parameterization (i.e., $n=10$)
154
+
directly from the PMF, using the expressions given above.
161
155
162
-
Check that your answers agree with `u.mean()` and `u.var()`.
156
+
Check that your answers agree with `u.mean()` and `u.var()`.
163
157
```
164
158
159
+
160
+
165
161
#### Binomial distribution
166
162
167
163
Another useful (and more interesting) distribution is the **binomial distribution** on $S=\{0, \ldots, n\}$, which has PMF
@@ -205,7 +201,6 @@ ax.set_xticks(S)
205
201
plt.show()
206
202
```
207
203
208
-
+++ {"user_expressions": []}
209
204
210
205
Here's the CDF
211
206
@@ -218,7 +213,6 @@ ax.set_xticks(S)
218
213
plt.show()
219
214
```
220
215
221
-
+++ {"user_expressions": []}
222
216
223
217
```{exercise}
224
218
:label: prob_ex2
@@ -289,11 +283,9 @@ ax.set_xticks(S)
289
283
plt.show()
290
284
```
291
285
292
-
+++ {"user_expressions": []}
293
286
294
287
### Continuous distributions
295
288
296
-
+++ {"user_expressions": []}
297
289
298
290
Continuous distributions are represented by a **density function**, which is a function $p$ over $\mathbb R$ (the set of all real numbers) such that $p(x) \geq 0$ for all $x$ and
299
291
@@ -322,7 +314,6 @@ $$
322
314
    = \int_{-\infty}^x p(t) dt
323
315
$$
324
316
325
-
+++ {"user_expressions": []}
326
317
327
318
#### Normal distribution
328
319
@@ -366,7 +357,6 @@ plt.legend()
366
357
plt.show()
367
358
```
368
359
369
-
+++ {"user_expressions": []}
370
360
371
361
Here's a plot of the CDF:
372
362
@@ -382,7 +372,6 @@ plt.legend()
382
372
plt.show()
383
373
```
384
374
385
-
+++ {"user_expressions": []}
386
375
387
376
#### Lognormal distribution
388
377
@@ -495,22 +484,19 @@ plt.show()
495
484
496
485
#### Beta distribution
497
486
498
-
The **beta distribution** is a distribution on $\left(0, 1\right)$ with density
487
+
The **beta distribution** is a distribution on $(0, 1)$ with density
where $\Gamma$ is the gamma function ($\Gamma(n) = (n - 1)!$ for $n \in \mathbb{N}$).
506
-
507
-
This distribution has two parameters, $\alpha$ and $\beta$.
508
-
509
-
It has a nice interpretation: if $X$ is beta distributed, then $X$ is the probability of success in a Bernoulli trial with a number of successes $\alpha$ and a number of failures $\beta$.
494
+
where $\Gamma$ is the [gamma function](https://en.wikipedia.org/wiki/Gamma_function).
510
495
511
-
For example, if $\alpha = \beta = 1$, then the beta distribution is uniform on $\left(0, 1\right)$ as the number of successes and failures are both 1.
496
+
(The role of the gamma function is just to normalize the density, so that it
497
+
integrates to one.)
512
498
513
-
While, if $\alpha = 3$ and $\beta = 2$, then the beta distribution is located more towards 1 as there are more successes than failures.
499
+
This distribution has two parameters, $\alpha > 0$ and $\beta > 0$.
514
500
515
501
It can be shown that, for this distribution, the mean is $\alpha / (\alpha + \beta)$ and
516
502
the variance is $\alpha \beta / \left[ (\alpha + \beta)^2 (\alpha + \beta + 1) \right]$.
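
The following is a minimal numerical check of these claims, not part of the original lecture code. It assumes the standard beta density, with the gamma-function ratio as the normalizing constant, and uses a simple midpoint rule from a hypothetical helper `integrate`. The parameter values $\alpha = 3$ and $\beta = 2$ are chosen only for illustration.

```python
from math import gamma

def beta_pdf(x, a, b):
    """Standard beta density on (0, 1); the gamma ratio normalizes it."""
    return gamma(a + b) / (gamma(a) * gamma(b)) * x**(a - 1) * (1 - x)**(b - 1)

def integrate(f, num=100_000):
    """Midpoint-rule approximation of the integral of f over (0, 1)."""
    h = 1 / num
    return h * sum(f((i + 0.5) * h) for i in range(num))

a, b = 3, 2   # illustrative parameter values

total = integrate(lambda x: beta_pdf(x, a, b))               # should be close to 1
mean = integrate(lambda x: x * beta_pdf(x, a, b))
var = integrate(lambda x: (x - mean)**2 * beta_pdf(x, a, b))

print("integral of density:", total)
print("mean:", mean, "formula:", a / (a + b))
print("variance:", var, "formula:", a * b / ((a + b)**2 * (a + b + 1)))
```

The numerical mean and variance should agree with the closed-form expressions up to the discretization error of the midpoint rule.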
@@ -553,6 +539,7 @@ plt.legend()
553
539
plt.show()
554
540
```
555
541
542
+
556
543
#### Gamma distribution
557
544
558
545
The **gamma distribution** is a distribution on $\left(0, \infty\right)$ with density
@@ -562,11 +549,14 @@ $$
562
549
x^{\alpha - 1} \exp(-\beta x)
563
550
$$
564
551
565
-
This distribution has two parameters, $\alpha$ and $\beta$.
552
+
This distribution has two parameters, $\alpha > 0$ and $\beta > 0$.
566
553
567
-
It can be shown that, for this distribution, the mean is $\alpha / \beta$ and the variance is $\alpha / \beta^2$.
554
+
It can be shown that, for this distribution, the mean is $\alpha / \beta$ and
555
+
the variance is $\alpha / \beta^2$.
568
556
569
-
One interpretation is that if $X$ is gamma distributed, then $X$ is the sum of $\alpha$ independent exponentially distributed random variables with mean $1/\beta$.
557
+
One interpretation is that if $X$ is gamma distributed and $\alpha$ is an
558
+
integer, then $X$ is the sum of $\alpha$ independent exponentially distributed
559
+
random variables with mean $1/\beta$.
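
This interpretation can be illustrated by simulation. The sketch below (an illustration with arbitrarily chosen parameters, not the lecture's own code) sums $\alpha$ independent exponential draws and compares the sample mean and variance with $\alpha / \beta$ and $\alpha / \beta^2$. It relies on the standard library's `random.expovariate`, whose argument is the rate, so the mean of each draw is $1/\beta$.

```python
import random

random.seed(1234)         # fixed seed so the run is reproducible

alpha, beta = 3, 2        # alpha must be a positive integer for this interpretation
num_draws = 100_000

# Each draw is the sum of alpha independent exponentials with mean 1/beta
draws = [sum(random.expovariate(beta) for _ in range(alpha))
         for _ in range(num_draws)]

sample_mean = sum(draws) / num_draws
sample_var = sum((d - sample_mean)**2 for d in draws) / num_draws

print("sample mean:", sample_mean, "theory:", alpha / beta)
print("sample variance:", sample_var, "theory:", alpha / beta**2)
```

With this many draws the sample moments should be close to the theoretical values, though not identical, since they are random.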
570
560
571
561
We can obtain the moments, PDF, and CDF of the gamma distribution as follows:
572
562
@@ -608,7 +598,6 @@ plt.show()
608
598
609
599
## Observed distributions
610
600
611
-
+++ {"user_expressions": []}
612
601
613
602
Sometimes we refer to observed data or measurements as "distributions".
In this situation, we might refer to the set of their incomes as the "income distribution."
636
624
637
-
The terminology is confusing because this is not the same thing as a probability distribution --- it's just a collection of numbers.
625
+
The terminology is confusing because this set is not a probability distribution
626
+
--- it's just a collection of numbers.
638
627
639
-
Below we explore some observed distributions.
628
+
However, as we will see, there are connections between observed distributions (i.e., sets of
629
+
numbers like the income distribution above) and probability distributions.
640
630
641
-
We will see that there are connections between observed distributions---like the income distribution above---and probability distributions, as we'll see below.
631
+
Below we explore some observed distributions.
642
632
643
-
+++ {"user_expressions": []}
644
633
645
634
### Summary statistics
646
635
@@ -658,8 +647,6 @@ $$
658
647
\frac{1}{n} \sum_{i=1}^n (x_i - \bar x)^2
659
648
$$
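
Note that this definition of the sample variance divides by $n$ rather than $n - 1$. The sketch below applies it to a small set of made-up observations (hypothetical values, not the lecture's income data) and compares the result with the standard library's `statistics.pvariance`, which divides by $n$, and `statistics.variance`, which divides by $n - 1$.

```python
from statistics import pvariance, variance

x = [2.0, 4.0, 4.0, 5.0, 10.0]   # hypothetical observations
n = len(x)

x_bar = sum(x) / n                                  # sample mean
s2 = sum((x_i - x_bar)**2 for x_i in x) / n         # sample variance with divisor n

print("sample mean:", x_bar)
print("sample variance (divide by n):", s2)
print("statistics.pvariance:", pvariance(x))        # same divisor n
print("statistics.variance:", variance(x))          # divisor n - 1, so larger
```

When comparing results across libraries, it is worth checking which divisor each function uses by default.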
660
649
661
-
+++ {"user_expressions": []}
662
-
663
650
For the income distribution given above, we can calculate these numbers via