---
layout: overview-large
title: Overview

disqus: true

partof: parallel-collections
num: 1
languages: [ja]
---

**Original authors: Aleksandar Prokopec, Heather Miller**

**Translation and fixes: Santiago Basulto**

## Motivation

Amid the shift in recent years by processor manufacturers from single-core to multi-core architectures, both academia and industry agree that _parallel programming_ remains a formidable challenge.

Parallel collections were included in the Scala standard library in an effort to facilitate parallel programming by sparing users from low-level parallelization details, while providing them with a familiar and simple high-level abstraction. The hope was, and still is, that implicit parallelism behind a collections abstraction (like the language's current collections framework) would bring reliable parallel execution one step closer to the workflow of mainstream developers.

The idea is simple: collections are a well-understood and frequently-used programming abstraction. Given their regularity, they can be parallelized efficiently and transparently. By allowing a user to swap out sequential collections for ones that are operated on in parallel, Scala's parallel collections take a large step toward enabling parallelism to be brought more easily into everyday code.

Consider the following sequential example, where we perform a monadic operation on a large enough collection:

    val list = (1 to 10000).toList
    list.map(_ + 42)

To perform the same operation in parallel, all that needs to be done is to invoke the `par` method on the sequential collection `list`. After that, the parallel collection can be used in the same way one would normally use a sequential collection. The above example can be parallelized by simply doing the following:

    list.par.map(_ + 42)

The design of Scala's parallel collections library is inspired by and deeply integrated with Scala's (sequential) collections library (introduced in 2.8). It provides a parallel counterpart to a number of important collection types from Scala's (sequential) collections library, including:

* `ParArray`
* `ParVector`
* `mutable.ParHashMap`
* `mutable.ParHashSet`
* `immutable.ParHashMap`
* `immutable.ParHashSet`
* `ParRange`
* `ParTrieMap` (`collection.concurrent.TrieMap`s are new in 2.10)

In addition to a common architecture, Scala's parallel collections library additionally shares _extensibility_ with the sequential collections library. That is, in the same way users can integrate their own collection types into the sequential collections framework, they can do so with the parallel collections framework, thereby automatically inheriting all of the parallel operations available on the other parallel collections in the standard library.

## Some Examples

To attempt to illustrate the generality and utility of parallel collections, we provide a handful of simple example usages, all of which are transparently executed in parallel.

_Note:_ Some of the following examples operate on small collections, which is not recommended; they are provided for illustrative purposes only. As a general heuristic, speed-ups tend to be noticeable when the size of the collection is large enough, typically several thousand elements. (For more information on the relationship between the size of a parallel collection and its performance, please see the [appropriate subsection]({{ site.baseurl }}/overviews/parallel-collections/performance.html#how_big_should_a_collection_be_to_go_parallel) of the [performance]({{ site.baseurl }}/overviews/parallel-collections/performance.html) section of this guide.)

#### map

Using a parallel `map` to transform a collection of `String` elements to uppercase:

    scala> val apellidos = List("Smith","Jones","Frankenstein","Bach","Jackson","Rodin").par
    apellidos: scala.collection.parallel.immutable.ParSeq[String] = ParVector(Smith, Jones, Frankenstein, Bach, Jackson, Rodin)

    scala> apellidos.map(_.toUpperCase)
    res0: scala.collection.parallel.immutable.ParSeq[String] = ParVector(SMITH, JONES, FRANKENSTEIN, BACH, JACKSON, RODIN)

#### fold

Summing via `fold` on a `ParArray`:

    scala> val parArray = (1 to 1000000).toArray.par
    parArray: scala.collection.parallel.mutable.ParArray[Int] = ParArray(1, 2, 3, ...

    scala> parArray.fold(0)(_ + _)
    res0: Int = 1784293664
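
Note that `res0` above is not the mathematical sum: summing 1 to 1000000 overflows `Int`, and `1784293664` is the true total `500000500000` truncated to 32 bits. Since `Int` addition remains associative even under overflow, the result is still deterministic. A quick sanity check in plain Scala (a sketch, independent of the parallel framework):

```scala
// The exact sum 1 + 2 + ... + 1000000, via the closed form n*(n+1)/2:
val exact = 1000000L * 1000001L / 2      // 500000500000
// Truncating to a 32-bit Int reproduces the REPL result shown above:
assert(exact.toInt == 1784293664)
```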

#### filter

Using a parallel `filter` to select the last names that alphabetically come at or after the letter "J":

    scala> val apellidos = List("Smith","Jones","Frankenstein","Bach","Jackson","Rodin").par
    apellidos: scala.collection.parallel.immutable.ParSeq[String] = ParVector(Smith, Jones, Frankenstein, Bach, Jackson, Rodin)

    scala> apellidos.filter(_.head >= 'J')
    res0: scala.collection.parallel.immutable.ParSeq[String] = ParVector(Smith, Jones, Jackson, Rodin)

## Creating a Parallel Collection

Parallel collections are meant to be used in exactly the same way as sequential collections--the only noteworthy difference is how to _obtain_ a parallel collection.

Generally, there are two choices for creating a parallel collection:

First, by using the `new` keyword and a proper import statement:

    import scala.collection.parallel.immutable.ParVector
    val pv = new ParVector[Int]

Second, by _converting_ from a sequential collection:

    val pv = Vector(1,2,3,4,5,6,7,8,9).par

What's important to expand upon here are these conversion methods-- sequential collections can be converted to parallel collections by invoking the sequential collection's `par` method, and likewise, parallel collections can be converted to sequential collections by invoking the parallel collection's `seq` method.
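
As a small sketch of this round trip (note: on Scala 2.13 and later, parallel collections live in the separate `scala-parallel-collections` module and `par` requires `import scala.collection.parallel.CollectionConverters._`; on the 2.10-era library this guide describes, no import is needed):

```scala
// Convert a sequential Vector to a parallel one and back again.
val seqVec = Vector(1, 2, 3, 4, 5)
val parVec = seqVec.par        // ParVector(1, 2, 3, 4, 5)
val backAgain = parVec.seq     // back to a sequential collection
assert(backAgain == seqVec)    // the round trip preserves the elements
```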

_Of Note:_ Collections that are inherently sequential (in the sense that the elements must be accessed one after the other), like lists, queues, and streams, are converted to their parallel counterparts by copying the elements into a similar parallel collection. An example is `List`-- it's converted into a standard immutable parallel sequence, which is a `ParVector`. Of course, the copying required for these collection types introduces an overhead not incurred by any other collection types, like `Array`, `Vector`, `HashMap`, etc.

For more information on conversions on parallel collections, see the [conversions]({{ site.baseurl }}/overviews/parallel-collections/conversions.html) and [concrete parallel collection classes]({{ site.baseurl }}/overviews/parallel-collections/concrete-parallel-collections.html) sections of this guide.

## Semantics

While the parallel collections abstraction feels very much the same as normal sequential collections, it's important to note that its semantics differ, especially with regard to side-effects and non-associative operations.

In order to see how this is the case, first, we visualize _how_ operations are performed in parallel. Conceptually, Scala's parallel collections framework parallelizes an operation on a parallel collection by recursively "splitting" a given collection, applying an operation on each partition of the collection in parallel, and re-"combining" all of the results that were completed in parallel.

This concurrent, "out-of-order" semantics of parallel collections leads to the following two implications:

1. **Side-effecting operations can lead to non-determinism**
2. **Non-associative operations lead to non-determinism**

### Side-Effecting Operations

Given the _concurrent_ execution semantics of the parallel collections framework, operations performed on a collection which cause side-effects should generally be avoided, in order to maintain determinism. A simple example is using an accessor method, like `foreach`, to increment a `var` declared outside of the closure that is passed to `foreach`.

    scala> var sum = 0
    sum: Int = 0

    scala> val list = (1 to 1000).toList.par
    list: scala.collection.parallel.immutable.ParSeq[Int] = ParVector(1, 2, 3,…

    scala> list.foreach(sum += _); sum
    res01: Int = 467766

    scala> var sum = 0
    sum: Int = 0

    scala> list.foreach(sum += _); sum
    res02: Int = 457073

    scala> var sum = 0
    sum: Int = 0

    scala> list.foreach(sum += _); sum
    res03: Int = 468520

Here, we can see that each time `sum` is reinitialized to 0 and `foreach` is called again on `list`, `sum` holds a different value. The source of this non-determinism is a _data race_-- concurrent reads and writes to the same mutable variable.

In the above example, it's possible for two threads to read the _same_ value in `sum`, to spend some time doing some operation on that value of `sum`, and then to attempt to write a new value to `sum`, potentially resulting in an overwrite (and thus, loss) of a valuable result, as illustrated below:

    ThreadA: read value in sum, sum = 0                value in sum: 0
    ThreadB: read value in sum, sum = 0                value in sum: 0
    ThreadA: increment sum by 760, write sum = 760     value in sum: 760
    ThreadB: increment sum by 12, write sum = 12       value in sum: 12

The above example illustrates a scenario where two threads read the same value, `0`, before one or the other can sum `0` with an element from their partition of the parallel collection. In this case, `ThreadA` reads `0` and sums it with its element, `0+760`, and in the case of `ThreadB`, sums `0` with its element, `0+12`. After computing their respective sums, they each write their computed value to `sum`. Since `ThreadA` beats `ThreadB`, it writes first, only for the value in `sum` to be overwritten shortly after by `ThreadB`, in effect completely overwriting (and thus losing) the value `760`.
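
A race-free way to obtain the same total is to drop the shared `var` altogether and use a side-effect-free aggregate such as `sum` (or `reduce` with an associative operator); the framework then sums each partition independently and combines the partial results, so every run produces the same value. A minimal sketch (on Scala 2.13+, `.par` additionally requires `import scala.collection.parallel.CollectionConverters._`):

```scala
// Deterministic aggregation: no shared mutable state, no data race.
val list = (1 to 1000).toList.par
val total = list.sum           // partitions are summed independently
assert(total == 500500)        // the same result on every run
```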

### Non-Associative Operations

Given this _"out-of-order"_ semantics, one must also be careful to perform only associative operations in order to avoid non-determinism. That is, given a parallel collection, `pcoll`, one should be sure that when invoking a higher-order function on `pcoll`, such as `pcoll.reduce(func)`, the order in which `func` is applied to the elements of `pcoll` can be arbitrary. A simple but obvious example is a non-associative operation such as subtraction:

    scala> val list = (1 to 1000).toList.par
    list: scala.collection.parallel.immutable.ParSeq[Int] = ParVector(1, 2, 3,…

    scala> list.reduce(_-_)
    res01: Int = -228888

    scala> list.reduce(_-_)
    res02: Int = -61000

    scala> list.reduce(_-_)
    res03: Int = -331818

In the above example, we take a `ParVector[Int]`, invoke `reduce`, and pass to it `_-_`, which simply takes two unnamed elements and subtracts the second from the first. Because the parallel collections framework spawns threads which, in effect, independently perform `reduce(_-_)` on different sections of the collection, the result of two runs of `reduce(_-_)` on the same collection is not guaranteed to be the same.
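
To see why associativity is what matters here, one can simulate, with ordinary sequential code, two different ways the framework might split the collection (a hypothetical two-way split; the real framework splits recursively, and the helper below is illustrative only):

```scala
// Simulate `reduce` over two different partitionings of the same list:
// reduce each partition independently, then combine the partial results.
val xs = (1 to 1000).toList

def reduceSplit(at: Int, op: (Int, Int) => Int): Int = {
  val (left, right) = xs.splitAt(at)
  op(left.reduce(op), right.reduce(op))
}

// Associative `+`: any partitioning yields the same result.
assert(reduceSplit(300, _ + _) == reduceSplit(700, _ + _))
// Non-associative `-`: the result depends on where the split falls.
assert(reduceSplit(300, _ - _) != reduceSplit(700, _ - _))
```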

_Note:_ It is often thought that, like non-associative operations, non-commutative operations passed to a higher-order function on a parallel collection likewise lead to non-deterministic behavior. This is not the case; a simple example is string concatenation-- an associative but non-commutative operation:

    scala> val strings = List("abc","def","ghi","jk","lmnop","qrs","tuv","wx","yz").par
    strings: scala.collection.parallel.immutable.ParSeq[java.lang.String] = ParVector(abc, def, ghi, jk, lmnop, qrs, tuv, wx, yz)

    scala> val alphabet = strings.reduce(_++_)
    alphabet: java.lang.String = abcdefghijklmnopqrstuvwxyz

The _"out of order"_ semantics of parallel collections only means that the operation will be executed out of order (in a _temporal_ sense, that is, non-sequentially); it does not mean that the result will be re-"*combined*" out of order (in a _spatial_ sense). On the contrary, results will generally always be reassembled _in order_-- that is, a parallel collection broken into partitions A, B, C, in that order, will be reassembled once again in the order A, B, C, not in some other arbitrary order like B, C, A.
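
This order-preserving recombination is easy to check: a parallel `map` may process elements out of order, but its result lines up element-for-element with the sequential one. A sketch (again, on Scala 2.13+ `.par` needs the `CollectionConverters` import):

```scala
// Elements may be *processed* out of order, but the result is
// *reassembled* in order, matching the sequential computation.
val xs = (1 to 10000).toVector
assert(xs.par.map(_ * 2).seq == xs.map(_ * 2))
```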

For more on how parallel collections split and combine operations on different parallel collection types, see the [Architecture]({{ site.baseurl }}/overviews/parallel-collections/architecture.html) section of this guide.