
Simply extending Iterable would break on Java serialization #11192

Closed
@eed3si9n

Description

This is a generalization of scala/scala-xml#254 reported by @ashawley and analyzed by @xuwei-k

steps

To isolate the scala.xml.XMLTestJVM.serializeAttribute failure, Yoshida-san created a minimization using a custom collection that looks like this:

class MyCollection[B](val list: List[B]) extends scala.collection.Iterable[B] {
  override def iterator = list.iterator
  // protected[this] override def writeReplace(): AnyRef = this
}

I'm breaking up the test into the following steps:

  import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
  import org.junit.Test

  @Test
  def testMyCollection: Unit = {
    val list = List(1, 2, 3)
    val arr = serialize(new MyCollection(list))
    val obj2 = deserialize[MyCollection[Int]](arr)
    assert(obj2.list == list)
  }

  def serialize[A <: Serializable](obj: A): Array[Byte] = {
    val o = new ByteArrayOutputStream()
    val os = new ObjectOutputStream(o)
    os.writeObject(obj)
    os.flush() // make sure the object stream's buffered bytes reach the byte array
    o.toByteArray()
  }

  def deserialize[A <: Serializable](bytes: Array[Byte]): A = {
    val s = new ByteArrayInputStream(bytes)
    val is = new ObjectInputStream(s)
    is.readObject().asInstanceOf[A]
  }

problem

When I run this, the error I get is as follows:

[error] Test issue.IssueTest.testMyCollection failed: java.lang.ClassCastException: scala.collection.immutable.$colon$colon cannot be cast to issue.MyCollection, took 0.009 sec
[error]     at issue.IssueTest.testMyCollection(IssueTest.scala:20)
[error]     ...

In other words, I never reach the assertion; instead, the cast fails during deserialization, because a $colon$colon (a List) was deserialized instead of a MyCollection. This looks to be a different problem from #9237.
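
To make the failure mode concrete, here is a hypothetical variation of the test (not part of the original report; the name testWhatComesBack is mine) that inspects what deserialization actually returns:

  @Test
  def testWhatComesBack: Unit = {
    val arr = serialize(new MyCollection(List(1, 2, 3)))
    // Deserialize without committing to MyCollection and inspect the runtime class.
    val back = deserialize[Serializable](arr)
    assert(back.isInstanceOf[List[_]])           // the proxy rebuilt a plain List ($colon$colon)
    assert(!back.isInstanceOf[MyCollection[_]])  // the MyCollection identity is lost
  }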

expectation

Either the serialization should work out of the box, or MyCollection should not compile without providing some serialization mechanism.

workaround

A workaround identified by Yoshida-san is uncommenting the following:

// protected[this] override def writeReplace(): AnyRef = this
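
Applied to the minimization, the workaround looks like this (a sketch; returning this from writeReplace opts the class out of the default proxy and back into plain Java field serialization, which works here because MyCollection and its list field are both Serializable):

class MyCollection[B](val list: List[B]) extends scala.collection.Iterable[B] {
  override def iterator = list.iterator
  // Opt out of DefaultSerializationProxy: serialize this instance's fields directly.
  protected[this] override def writeReplace(): AnyRef = this
}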

note

scala/scala#6676 makes Iterable Serializable by default.

trait Iterable[+A] extends IterableOnce[A] with IterableOps[A, Iterable, Iterable[A]] with Serializable {

with writeReplace implemented as follows:

  protected[this] def writeReplace(): AnyRef = new DefaultSerializationProxy(iterableFactory.iterableFactory, this)

In other words, the serialization of everything Iterable is delegated to DefaultSerializationProxy, including all subtypes that exist in the wild. Perhaps it should fail at the point of serialization when it detects a type that it cannot handle.

Letting it serialize the data but fail to deserialize it correctly sounds like potentially data-losing behavior. Another thing to consider is forcing subclasses of Iterable to implement a serialization method. The situation where it's easy to roll your own collection, but it blows up on Spark by default, is not a happy experience.
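
For completeness, one direction a custom collection author could take under the new design (a sketch of mine, not something prescribed in this issue, assuming the proxy rebuilds the collection via iterableFactory as quoted above) is to point iterableFactory at a companion factory, so the proxy reconstructs a MyCollection instead of the generic Iterable default:

import scala.collection.{IterableFactory, mutable}

class MyCollection[B](val list: List[B]) extends scala.collection.Iterable[B] {
  override def iterator = list.iterator
  // Tell the collections framework (and thus the serialization proxy)
  // how to rebuild a MyCollection rather than a generic Iterable (i.e. a List).
  override def iterableFactory: IterableFactory[MyCollection] = MyCollection
}

object MyCollection extends IterableFactory[MyCollection] {
  def from[A](source: IterableOnce[A]): MyCollection[A] = new MyCollection(List.from(source))
  def empty[A]: MyCollection[A] = new MyCollection(Nil)
  def newBuilder[A]: mutable.Builder[A, MyCollection[A]] =
    List.newBuilder[A].mapResult(new MyCollection(_))
}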
