|
| 1 | +# Convolutions on Images |
| 2 | + |
| 3 | +For this section, we will no longer be focusing on signals, but instead images (arrays filled with elements of red, green, and blue values). |
| 4 | +That said, for the code examples, greyscale images may be used such that each array element is composed of some floating-point value instead of color. |
| 5 | +In addition, we will not be discussing boundary conditions too much in this chapter and will instead be using the simple boundaries introduced in the section on [one-dimensional convolutions](../1d/1d.md). |
| 6 | + |
| 7 | +The extension of one-dimensional convolutions to two dimensions requires a little thought about indexing and the like, but is ultimately the same operation. |
| 8 | +Here is an animation of a convolution for a two-dimensional image: |
| 9 | + |
| 10 | +<div style="text-align:center"> |
| 11 | +<video style="width:90%" controls loop> |
| 12 | + <source src="../res/2d.mp4" type="video/mp4"> |
| 13 | +Your browser does not support the video tag. |
| 14 | +</video> |
| 15 | +</div> |
| 16 | + |
| 17 | +In this case, we convolved the image with a 3x3 square filter, all filled with values of $$\frac{1}{9}$$. |
| 18 | +This created a simple blurring effect, which is somewhat expected from the discussion in the previous section. |
| 19 | +In code, a two-dimensional convolution might look like this: |
| 20 | + |
| 21 | +{% method %} |
| 22 | +{% sample lang="jl" %} |
| 23 | +[import:4-28, lang:"julia"](../code/julia/2d_convolution.jl) |
| 24 | +{% endmethod %} |
| 25 | + |
| 26 | +This is very similar to what we have shown in previous sections; however, it essentially requires four iterable dimensions because we need to iterate through each axis of the output domain *and* the filter. |
| 27 | + |
| 28 | +At this stage, it is worth highlighting common filters used for convolutions of images. |
| 29 | +In particular, we will further discuss the Gaussian filter introduced in the [previous section](../1d/1d.md), and then introduce another set of kernels known as Sobel operators, which are used for naïve edge detection or image derivatives. |
| 30 | + |
| 31 | +## The Gaussian kernel |
| 32 | + |
| 33 | +The Gaussian kernel serves as an effective *blurring* operation for images. |
| 34 | +As a reminder, the formula for any Gaussian distribution is |
| 35 | + |
| 36 | +$$ |
| 37 | +g(x,y) = \frac{1}{2\pi\sigma^2}e^{-\frac{x^2+y^2}{2\sigma^2}}, |
| 38 | +$$ |
| 39 | + |
| 40 | +where $$\sigma$$ is the standard deviation and is a measure of the width of the Gaussian. |
| 41 | +A larger $$\sigma$$ means a larger Gaussian; however, remember that the Gaussian must fit onto the filter, otherwise it will be cut off! |
| 42 | +For example, if you are using a $$3\times 3$$ filter, you should not be using $$\sigma = 10$$. |
| 43 | +Some definitions of $$\sigma$$ allow users to have a separate deviation in $$x$$ and $$y$$ to create an ellipsoid Gaussian, but for the purposes of this chapter, we will assume $$\sigma_x = \sigma_y$$. |
| 44 | +As a general rule of thumb, the larger the filter and standard deviation, the more "smeared" the final convolution will be. |
| 45 | + |
| 46 | +At this stage, it is important to write some code, so we will generate a simple function that returns a Gaussian kernel with a specified standard deviation and filter size. |
| 47 | + |
| 48 | +{% method %} |
| 49 | +{% sample lang="jl" %} |
| 50 | +[import:30-47, lang:"julia"](../code/julia/2d_convolution.jl) |
| 51 | +{% endmethod %} |
| 52 | + |
| 53 | +Though it is entirely possible to create a Gaussian kernel whose standard deviation is independent on the kernel size, we have decided to enforce a relation between the two in this chapter. |
| 54 | +As always, we encourage you to play with the code and create your own Gaussian kernels any way you want! |
| 55 | +As a note, all the kernels will be scaled (normalized) at the end by the sum of all internal elements. |
| 56 | +This ensures that the output of the convolution will not have an obnoxious scale factor associated with it. |
| 57 | + |
| 58 | +Below are a few images generated by applying a kernel generated with the code above to a black and white image of a circle. |
| 59 | + |
| 60 | +<p> |
| 61 | + <img class="center" src="../res/circle_blur.png" style="width:100%" /> |
| 62 | +</p> |
| 63 | + |
| 64 | + |
| 65 | +In (a), we show the original image, which is just a white circle at the center of a $$50\times 50$$ grid. |
| 66 | +In (b), we show the image after convolution with a $$3\times 3$$ kernel. |
| 67 | +In (c), we show the image after convolution with a $$20\times 20$$ kernel. |
| 68 | +Here, we see that (c) is significantly fuzzier than (b), which is a direct consequence of the kernel size. |
| 69 | + |
| 70 | +There is a lot more that we could talk about, but now is a good time to move on to a slightly more complicated convolutional method: the Sobel operator. |
| 71 | + |
| 72 | +## The Sobel operator |
| 73 | + |
| 74 | +The Sobel operator effectively performs a gradient operation on an image by highlighting areas where a large change has been made. |
| 75 | +In essence, this means that this operation can be thought of as a naïve edge detector. |
| 76 | +Essentially, the $$n$$-dimensional Sobel operator is composed of $$n$$ separate gradient convolutions (one for each dimension) that are then combined together into a final output array. |
| 77 | +Again, for the purposes of this chapter, we will stick to two dimensions, which will be composed of two separate gradients along the $$x$$ and $$y$$ directions. |
| 78 | +Each gradient will be created by convolving our image with their corresponding Sobel operator: |
| 79 | + |
| 80 | +$$ |
| 81 | +\begin{align} |
| 82 | +S_x &= \left(\begin{bmatrix} |
| 83 | +1 \\ |
| 84 | +2 \\ |
| 85 | +1 \\ |
| 86 | +\end{bmatrix} \otimes [1~0~-1] |
| 87 | +\right) = \begin{bmatrix} |
| 88 | +1 & 0 & -1 \\ |
| 89 | +2 & 0 & -2 \\ |
| 90 | +1 & 0 & -1 \\ |
| 91 | +\end{bmatrix}\\ |
| 92 | +
|
| 93 | +S_y &= \left( |
| 94 | +\begin{bmatrix} |
| 95 | +1 \\ |
| 96 | +0 \\ |
| 97 | +-1 \\ |
| 98 | +\end{bmatrix} \otimes [1~2~1] |
| 99 | +\right) = \begin{bmatrix} |
| 100 | +1 & 2 & 1 \\ |
| 101 | +0 & 0 & 0 \\ |
| 102 | +-1 & -2 & -1 \\ |
| 103 | +\end{bmatrix}. |
| 104 | +\end{align} |
| 105 | +$$ |
| 106 | + |
| 107 | +The gradients can then be found with a convolution, such that: |
| 108 | + |
| 109 | +$$ |
| 110 | +\begin{align} |
| 111 | +G_x &= S_x*A \\ |
| 112 | +G_y &= S_y*A. |
| 113 | +\end{align} |
| 114 | +$$ |
| 115 | + |
| 116 | +Here, $$A$$ is the input array or image. |
| 117 | +Finally, these gradients can be summed in quadrature to find the total Sobel operator or image gradient: |
| 118 | + |
| 119 | +$$ |
| 120 | +G_{\text{total}} = \sqrt{G_x^2 + G_y^2} |
| 121 | +$$ |
| 122 | + |
| 123 | +So let us now show what it does in practice: |
| 124 | + |
| 125 | +<p> |
| 126 | + <img class="center" src="../res/sobel_filters.png" style="width:100%" /> |
| 127 | +</p> |
| 128 | + |
| 129 | +In this diagram, we start with the circle image on the right, and then convolve it with the $$S_x$$ and $$S_y$$ operators to find the gradients along $$x$$ and $$y$$ before summing them in quadrature to get the final image gradient. |
| 130 | +Here, we see that the edges of our input image have been highlighted, showing outline of our circle. |
| 131 | +This is why the Sobel operator is also known as naïve edge detection and is an integral component to many more sophisticated edge detection methods like one proposed by Canny {{ "canny1986computational" | cite }}. |
| 132 | + |
| 133 | +In code, the Sobel operator involves first finding the operators in $$x$$ and $$y$$ and then applying them with a traditional convolution: |
| 134 | + |
| 135 | +{% method %} |
| 136 | +{% sample lang="jl" %} |
| 137 | +[import:49-63, lang:"julia"](../code/julia/2d_convolution.jl) |
| 138 | +{% endmethod %} |
| 139 | + |
| 140 | +With that, I believe we are at a good place to stop discussions on two-dimensional convolutions. |
| 141 | +We will definitely return to this topic in the future as new algorithms require more information. |
| 142 | + |
| 143 | +## Example Code |
| 144 | + |
| 145 | +For the code in this section, we have modified the visualizations from the [one-dimensional convolution chapter](../1d/1d.md) to add a two-dimensional variant for blurring an image of random white noise. |
| 146 | +We have also added code to create the Gaussian kernel and Sobel operator and apply it to the circle, as shown in the text. |
| 147 | + |
| 148 | +{% method %} |
| 149 | +{% sample lang="jl" %} |
| 150 | +[import, lang:"julia"](../code/julia/2d_convolution.jl) |
| 151 | +{% endmethod %} |
| 152 | + |
| 153 | +<script> |
| 154 | +MathJax.Hub.Queue(["Typeset",MathJax.Hub]); |
| 155 | +</script> |
| 156 | + |
| 157 | +### Bibliography |
| 158 | + |
| 159 | +{% references %} {% endreferences %} |
| 160 | + |
| 161 | +## License |
| 162 | + |
| 163 | +##### Code Examples |
| 164 | + |
| 165 | +The code examples are licensed under the MIT license (found in [LICENSE.md](https://github.com/algorithm-archivists/algorithm-archive/blob/master/LICENSE.md)). |
| 166 | + |
| 167 | +##### Images/Graphics |
| 168 | +- The image "[8bit Heart](../res/heart_8bit.png)" was created by [James Schloss](https://github.com/leios) and is licensed under the [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/legalcode). |
| 169 | +- The image "[Circle Blur](../res/circle_blur.png)" was created by [James Schloss](https://github.com/leios) and is licensed under the [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/legalcode). |
| 170 | +- The image "[Sobel Filters](../res/sobel_filters.png)" was created by [James Schloss](https://github.com/leios) and is licensed under the [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/legalcode). |
| 171 | +- The video "[2D Convolution](../res/2d.mp4)" was created by [James Schloss](https://github.com/leios) and [Grant Sanderson](https://github.com/3b1b) and is licensed under the [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/legalcode). |
| 172 | + |
| 173 | +##### Text |
| 174 | + |
| 175 | +The text of this chapter was written by [James Schloss](https://github.com/leios) and is licensed under the [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/legalcode). |
| 176 | + |
| 177 | +[<p><img class="center" src="../../cc/CC-BY-SA_icon.svg" /></p>](https://creativecommons.org/licenses/by-sa/4.0/) |
| 178 | + |
| 179 | +##### Pull Requests |
| 180 | + |
| 181 | +After initial licensing ([#560](https://github.com/algorithm-archivists/algorithm-archive/pull/560)), the following pull requests have modified the text or graphics of this chapter: |
| 182 | +- none |
| 183 | + |
0 commit comments