# What's a probability distribution?

Probability distributions are mathematical functions that give the probabilities of a range or set of outcomes.
These outcomes can be the result of an experiment or procedure, such as tossing a coin or rolling dice.
They can also be the result of a physical measurement, such as measuring the temperature of an object, counting how many electrons are spin up, etc.
Broadly speaking, we can classify probability distributions into two categories - __discrete probability distributions__ and __continuous probability distributions__.

## Discrete Probability Distributions

It's intuitive for us to understand what a __discrete__ probability distribution is.
For example, we understand the outcomes of a coin toss very well, and also those of a die roll.
For a single coin toss, we know that the probability of getting heads $$(H)$$ is half, or $$P(H) = \frac{1}{2}$$.
Similarly, the probability of getting tails $$(T)$$ is $$P(T) = \frac{1}{2}$$.
Formally, we can write the probability distribution for such a coin toss as,

$$
P(n) = \begin{matrix}
    \displaystyle \frac 1 2 &;& n \in \left\{H,T\right\}.
    \end{matrix}
$$

Here, $$n$$ denotes the outcome, and we used the "set notation", $$n \in\left\{H,T\right\}$$, which means "$$n$$ belongs to a set containing $$H$$ and $$T$$".
From the above equation, we can also assume that any other outcome for $$n$$ (such as landing on an edge) is incredibly unlikely, impossible, or simply "not allowed" (for example, just toss again if it _does_ land on its edge!).

For a probability distribution, it's important to take note of the set of possibilities, or the __domain__ of the distribution.
Here, $$\left\{H,T\right\}$$ is the domain of $$P(n)$$, telling us that $$n$$ can only be either $$H$$ or $$T$$.

If we use a different system, the outcome $$n$$ could mean other things.
For example, it could be a number like the outcome of a __die roll__ which has the probability distribution,

$$
P(n) = \begin{matrix}
    \displaystyle\frac 1 6 &;& n \in [\![1,6]\!]
    \end{matrix}.
$$

This is saying that the probability of $$n$$ being a whole number between $$1$$ and $$6$$ is $$1/6$$, and we assume that the probability of getting any other $$n$$ is $$0$$.
This is a discrete probability function because $$n$$ is an integer, and thus only takes discrete values.

Both of the above examples are rather boring, because the value of $$P(n)$$ is the same for all $$n$$.
An example of a discrete probability function where the probability actually depends on $$n$$, is when $$n$$ is the sum of numbers on a __roll of two dice__.
In this case, $$P(n)$$ is different for each $$n$$ as some possibilities like $$n=2$$ can happen in only one possible way (by getting a $$1$$ on both dice), whereas $$n=4$$ can happen in $$3$$ ways ($$1$$ and $$3$$; or $$2$$ and $$2$$; or $$3$$ and $$1$$).

The example of rolling two dice is a great case study for how we can construct a probability distribution, since the probability varies and it is not immediately obvious how it varies.
So let's go ahead and construct it!

Let's first define the domain of our target $$P(n)$$.
We know that the lowest sum of two dice is $$2$$ (a $$1$$ on both dice), so $$n \geq 2$$ for sure. Similarly, the maximum is the sum of two sixes, or $$12$$, so $$n \leq 12$$ also.

So now we know the domain of possibilities, i.e., $$n \in [\![2,12]\!]$$.
Next, we take a very common approach - for each outcome $$n$$, we count up the number of different ways it can occur.
Let's call this number the __frequency of__ $$n$$, $$f(n)$$.
We already mentioned that there is only one way to get $$n=2$$, by getting a pair of $$1$$s.
By our definition of the function $$f$$, this means that $$f(2)=1$$.
For $$n=3$$, we see that there are two possible ways of getting this outcome: the first die shows a $$1$$ and the second a $$2$$, or the first die shows a $$2$$ and the second a $$1$$.
Thus, $$f(3)=2$$.
If you continue doing this for all $$n$$, you may see a pattern (homework for the reader!).
Once you have all the $$f(n)$$, we can visualize it by plotting $$f(n)$$ vs $$n$$, as shown below.

<p>
    <img class="center" src="res/double_die_frequencies.png" alt="<FIG> Die Roll" style="width:80%"/>
</p>
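
The counting procedure described above is easy to automate. Here is a minimal, illustrative Python sketch (not part of the original chapter) that tallies $$f(n)$$ by enumerating all $$36$$ ordered outcomes:

```python
from itertools import product
from collections import Counter

# Enumerate all 36 ordered outcomes of two six-sided dice
# and count how many ways each sum n can occur.
frequencies = Counter(a + b for a, b in product(range(1, 7), repeat=2))

for n in sorted(frequencies):
    print(n, frequencies[n])
```

Running this reproduces the values worked out by hand: $$f(2)=1$$, $$f(3)=2$$, and so on.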

We can see from the plot that the most common outcome for the sum of two dice is $$n=7$$, and the further away from $$n=7$$ you get, the less likely the outcome.
Good to know, for a prospective gambler!

### Normalization

The $$f(n)$$ plotted above is technically NOT the probability $$P(n)$$ – because we know that the sum of all probabilities should be $$1$$, which clearly isn't the case for $$f(n)$$.
But we can get the probability by dividing $$f(n)$$ by the _total_ number of possibilities, $$N$$.
For two dice, that is $$N = 6 \times 6 = 36$$, but we could also express it as the _sum of all frequencies_,

$$
N = \sum_n f(n),
$$

which would also equal $$36$$ in this case.
So, by dividing $$f(n)$$ by $$\sum_n f(n)$$ we get our target probability distribution, $$P(n)$$.
This process is called __normalization__ and is crucial for determining almost any probability distribution.
So in general, if we have the function $$f(n)$$, we can get the probability as

$$
P(n) = \frac{f(n)}{\displaystyle\sum_{n} f(n)}.
$$

Note that $$f(n)$$ does not necessarily have to be the frequency of $$n$$ – it could be any function which is _proportional_ to $$P(n)$$, and the above definition of $$P(n)$$ would still hold.
It's easy to check that the sum is now equal to $$1$$, since

$$
\sum_n P(n) = \frac{\displaystyle\sum_{n}f(n)}{\displaystyle\sum_{n} f(n)} = 1.
$$

Once we have the probability function $$P(n)$$, we can calculate all sorts of probabilities.
For example, let's say we want to find the probability that $$n$$ will be between two integers $$a$$ and $$b$$, inclusive (that is, including both $$a$$ and $$b$$).
For brevity, we will use the notation $$\mathbb{P}(a \leq n \leq b)$$ to denote this probability.
And to calculate it, we simply have to sum up all the probabilities for each value of $$n$$ in that range, i.e.,

$$
\mathbb{P}(a \leq n \leq b) = \sum_{n=a}^{b} P(n).
$$
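
Continuing the two-dice example in Python (again, an illustrative snippet rather than code from the chapter), normalizing $$f(n)$$ and summing over a range gives exactly the quantities above:

```python
from itertools import product
from collections import Counter

# Frequencies f(n) for the sum of two dice, as before.
f = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Normalize: P(n) = f(n) / sum of all f(n).
N = sum(f.values())
P = {n: f[n] / N for n in f}

# Probability that the sum lies between a and b, inclusive.
def prob_between(a, b):
    return sum(P[n] for n in range(a, b + 1) if n in P)

print(prob_between(2, 12))  # the whole domain, so (numerically) 1
print(prob_between(4, 6))   # chance of rolling a total of 4, 5, or 6
```
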

## Probability Density Functions

What if instead of a discrete variable $$n$$, we had a continuous variable $$x$$, like temperature or weight?
In that case, it doesn't make sense to ask what the probability is of $$x$$ being _exactly_ a particular number – there are infinitely many possible real numbers, after all, so the probability of $$x$$ being exactly any one of them is essentially zero!
But it _does_ make sense to ask what the probability is that $$x$$ will be _between_ a certain range of values.
For example, one might say that there is a $$50\%$$ chance that the temperature tomorrow at noon will be between $$5$$ and $$15$$ degrees, or a $$5\%$$ chance that it will be between $$16$$ and $$16.5$$ degrees.
But how do we put all that information, for every possible range, in a single function?
The answer is to use a __probability density function__.

What does that mean?
Well, suppose $$x$$ is a continuous quantity, and we have a probability density function, $$P(x)$$, which looks like

<p>
    <img class="center" src="res/normal_distribution.png" alt="<FIG> probability density" style="width:100%"/>
</p>

Now, if we are interested in the probability of the range of values that lie between $$x_0$$ and $$x_0 + dx$$, all we have to do is calculate the _area_ of the green sliver above.
This is the defining feature of a probability density function:

__the probability of a range of values is the _area_ of the region under the probability density curve which is within that range.__

So if $$dx$$ is infinitesimally small, then the area of the green sliver becomes $$P(x)dx$$, and hence,

$$
\mathbb{P}(x_0 \leq x \leq x_0 + dx) = P(x)dx.
$$

So strictly speaking, $$P(x)$$ itself is NOT a probability, but rather the probability is the quantity $$P(x)dx$$, or any area under the curve.
That is why we call $$P(x)$$ the probability _density_ at $$x$$, while the actual probability is only defined for ranges of $$x$$.

Thus, to obtain the probability of $$x$$ lying within a range, we simply integrate $$P(x)$$ between that range, i.e.,

$$
\mathbb{P}(a \leq x \leq b ) = \int_a^b P(x)dx.
$$
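
As a quick numerical illustration, we can approximate such an integral with a midpoint Riemann sum. The density below, $$P(x) = 2x$$ on the domain $$[0, 1]$$, is a hypothetical example chosen only because its integrals are easy to check by hand:

```python
# Midpoint-rule approximation of the probability that a <= x <= b,
# i.e. the integral of the density P between a and b.
def prob_between(P, a, b, steps=100_000):
    dx = (b - a) / steps
    return dx * sum(P(a + (i + 0.5) * dx) for i in range(steps))

# A simple, purely illustrative density on the domain [0, 1]: P(x) = 2x.
def density(x):
    return 2 * x

print(prob_between(density, 0.0, 1.0))  # whole domain, so approximately 1
print(prob_between(density, 0.5, 1.0))  # exact answer is 1 - 0.5**2 = 0.75
```
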

This is analogous to finding the probability of a range of discrete values from the previous section:

$$
\mathbb{P}(a \leq n \leq b) = \sum_{n=a}^{b} P(n).
$$

The fact that all probabilities must sum to $$1$$ translates to

$$
\int_D P(x)dx = 1,
$$

where $$D$$ denotes the __domain__ of $$P(x)$$, i.e., the entire range of possible values of $$x$$ for which $$P(x)$$ is defined.

### Normalization of a Density Function

Just like in the discrete case, we often first calculate some density or frequency function $$f(x)$$, which is NOT $$P(x)$$, but proportional to it.
We can get the probability density function by normalizing it in a similar way, except that we integrate instead of sum:

$$
P(\mathbf{x}) = \frac{f(\mathbf{x})}{\int_D f(\mathbf{x})d\mathbf{x}}.
$$

For example, consider the following __Gaussian function__ (popularly used in __normal distributions__),

$$
f(x) = e^{-x^2},
$$

which is defined for all real numbers $$x$$.
We first integrate it (or do a quick Google search, as it is rather tricky) to get

$$
N = \int_{-\infty}^{\infty} e^{-x^2} dx = \sqrt{\pi}.
$$

Now we have a Gaussian probability distribution,

$$
P(x) = \frac{1}{N} e^{-x^2} = \frac{1}{\sqrt{\pi}} e^{-x^2}.
$$
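
We can sanity-check this normalization numerically. In the illustrative snippet below, the integration bounds $$\pm 10$$ are an assumption: they are wide enough that the Gaussian's tails beyond them contribute a negligible amount:

```python
import math

# The (un-normalized) Gaussian function f(x) = e^(-x^2).
def f(x):
    return math.exp(-x * x)

# Midpoint-rule integral of g over [a, b].
def integrate(g, a, b, steps=200_000):
    dx = (b - a) / steps
    return dx * sum(g(a + (i + 0.5) * dx) for i in range(steps))

# The bounds +/-10 stand in for +/-infinity; the truncated tails are tiny.
N = integrate(f, -10.0, 10.0)
print(N, math.sqrt(math.pi))  # the two values agree to many decimal places

# The normalized density integrates to (approximately) 1.
def P(x):
    return f(x) / N

print(integrate(P, -10.0, 10.0))
```
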

In general, normalization can allow us to create a probability distribution out of almost any function $$f(\mathbf{x})$$.
There are really only two rules that $$f(\mathbf{x})$$ must satisfy to be a candidate for a probability density function:
1. The integral of $$f(\mathbf{x})$$ over any subset of $$D$$ (denoted by $$S$$) has to be non-negative (it can be zero):
$$
\int_{S}f(\mathbf{x})d\mathbf{x} \geq 0.
$$
2. The following integral must be finite and non-zero:
$$
\int_{D} f(\mathbf{x})d\mathbf{x}.
$$

<script>
MathJax.Hub.Queue(["Typeset",MathJax.Hub]);
</script>

## License

##### Images/Graphics

- The image "[Frequency distribution of a double die roll](res/double_die_frequencies.png)" was created by [K. Shudipto Amin](https://github.com/shudipto-amin) and is licensed under the [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/legalcode).

- The image "[Probability Density](res/normal_distribution.png)" was created by [K. Shudipto Amin](https://github.com/shudipto-amin) and is licensed under the [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/legalcode).

##### Text

The text of this chapter was written by [K. Shudipto Amin](https://github.com/shudipto-amin) and is licensed under the [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/legalcode).

[<p><img class="center" src="../cc/CC-BY-SA_icon.svg" /></p>](https://creativecommons.org/licenses/by-sa/4.0/)

##### Pull Requests

After initial licensing ([#560](https://github.com/algorithm-archivists/algorithm-archive/pull/560)), the following pull requests have modified the text or graphics of this chapter:
- none