Description
This is a generalization of scala/scala-xml#254, reported by @ashawley and analyzed by @xuwei-k.
steps
To minimize the scala.xml.XMLTestJVM.serializeAttribute failure, Yoshida-san created a custom collection that looks like this:
class MyCollection[B](val list: List[B]) extends scala.collection.Iterable[B] {
  override def iterator = list.iterator
  // protected[this] override def writeReplace(): AnyRef = this
}
I'm breaking up the test into the following steps:
@Test
def testMyCollection: Unit = {
  val list = List(1, 2, 3)
  val arr = serialize(new MyCollection(list))
  val obj2 = deserialize[MyCollection[Int]](arr)
  assert(obj2.list == list)
}
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

def serialize[A <: Serializable](obj: A): Array[Byte] = {
  val o = new ByteArrayOutputStream()
  val os = new ObjectOutputStream(o)
  os.writeObject(obj)
  o.toByteArray()
}
def deserialize[A <: Serializable](bytes: Array[Byte]): A = {
  val s = new ByteArrayInputStream(bytes)
  val is = new ObjectInputStream(s)
  is.readObject().asInstanceOf[A]
}
problem
When I run this, the error I get is as follows:
[error] Test issue.IssueTest.testMyCollection failed: java.lang.ClassCastException: scala.collection.immutable.$colon$colon cannot be cast to issue.MyCollection, took 0.009 sec
[error] at issue.IssueTest.testMyCollection(IssueTest.scala:20)
[error] ...
In other words, I don't get to the assertion; instead deserialization fails at the cast, because it produced a scala.collection.immutable.$colon$colon (i.e. a List) instead of a MyCollection. This looks to be a different problem from #9237.
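A quick, hypothetical diagnostic (not part of the original test) makes the failure visible without the cast; the printed class name should match the one in the error above:
// Round-trip the collection and inspect what actually comes back.
val bytes = serialize(new MyCollection(List(1, 2, 3)))
val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
println(in.readObject().getClass.getName) // scala.collection.immutable.$colon$colon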
expectation
Either the serialization works out of the box, or MyCollection does not compile without providing some serialization mechanism.
workaround
A workaround identified by Yoshida-san is to uncomment the following line:
// protected[this] override def writeReplace(): AnyRef = this
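Applied to the minimization above, the class is unchanged except for that line (the reason this works is explained in the note below):
class MyCollection[B](val list: List[B]) extends scala.collection.Iterable[B] {
  override def iterator = list.iterator
  // Returning `this` opts out of the proxy-based writeReplace inherited from
  // Iterable, so the instance is serialized with its own fields and the round
  // trip yields a MyCollection again.
  protected[this] override def writeReplace(): AnyRef = this
}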
note
scala/scala#6676 makes Iterable Serializable by default:
trait Iterable[+A] extends IterableOnce[A] with IterableOps[A, Iterable, Iterable[A]] with Serializable {
with writeReplace implemented as follows:
protected[this] def writeReplace(): AnyRef = new DefaultSerializationProxy(iterableFactory.iterableFactory, this)
In other words, the serialization of all things Iterable is passed through DefaultSerializationProxy, including all subtypes that exist in the wild. Perhaps it should fail at the point of serialization when it detects a type it cannot handle; letting the data serialize but then fail to deserialize sounds like potentially data-losing behavior. Another thing to consider is forcing subclasses of Iterable to implement a serialization method (a rough sketch of what that could look like follows below). The situation where it's easy to roll your own collection, but it blows up on Spark by default, is not a happy experience.
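To make that second option concrete, here is a rough, hypothetical sketch of what a subclass-provided serialization hook could look like today, following the documented custom-collections pattern: the collection advertises its own factory through IterableOps/IterableFactoryDefaults so that, under the writeReplace shown above, the proxy would rebuild the right type on deserialization. The name MyCollection2 and its companion are invented for illustration, and whether this round-trips as described depends on the writeReplace behavior quoted in this issue.
import scala.collection.{Iterable, IterableFactory, IterableFactoryDefaults, IterableOps, mutable}

// Hypothetical variant of MyCollection that participates in the standard
// factory machinery instead of inheriting the default Iterable factory.
class MyCollection2[B](val list: List[B])
    extends Iterable[B]
    with IterableOps[B, MyCollection2, MyCollection2[B]]
    with IterableFactoryDefaults[B, MyCollection2] {
  override def iterator = list.iterator
  // The proxy serializes this factory along with the elements and uses it to
  // rebuild the collection, so deserialization can yield a MyCollection2.
  override def iterableFactory: IterableFactory[MyCollection2] = MyCollection2
}

object MyCollection2 extends IterableFactory[MyCollection2] {
  def from[A](source: IterableOnce[A]): MyCollection2[A] =
    new MyCollection2(List.from(source))
  def empty[A]: MyCollection2[A] = new MyCollection2(Nil)
  def newBuilder[A]: mutable.Builder[A, MyCollection2[A]] =
    List.newBuilder[A].mapResult(new MyCollection2(_))
}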