When using SparkSQL, if a user tries to write a DataFrame to Elasticsearch, the write will fail if the es.input.json property is set to true.
When writing a DataFrame that is not compatible with this write mode, the error message ends up as something like:
org.elasticsearch.hadoop.serialization.EsHadoopSerializationException: org.codehaus.jackson.JsonParseException: Unexpected character ('(' (code 40)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
at [Source: [B@25a91b8c; line: 1, column: 2]
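For reference, here is a minimal sketch of a DataFrame write that hits this error; the Spark session setup, the "people/doc" index name, and the connection settings (e.g. es.nodes) are assumptions for illustration only:

```scala
// Sketch of the failing DataFrame write with es.input.json=true; index name
// and Spark setup are hypothetical, es.nodes etc. assumed configured elsewhere.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("es-input-json-repro")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

val df = Seq(("id1", "email1@gmail.com")).toDF("id", "email")

// es.input.json=true tells ES-Hadoop the input is already serialized JSON,
// but a plain DataFrame row is not, so the connector fails to parse it.
df.write
  .format("org.elasticsearch.spark.sql")
  .option("es.input.json", "true")
  .save("people/doc")
```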
If instead a Dataset[String] is used (which would reasonably be expected to work), the exception looks like:
org.elasticsearch.hadoop.rest.EsHadoopRemoteException: mapper_parsing_exception: failed to parse;org.elasticsearch.hadoop.rest.EsHadoopRemoteException: not_x_content_exception: Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes
{"index":{}}
([{"id": "id3", "email": "email1@gmail.com"}],StructType(StructField(value,StringType,true)))
The preferred solution here is simply to use a Spark RDD[String] instead (sketched below), but it would be best if ES-Hadoop supported Dataset[String] for writing JSON data, checked for other incompatible situations, and threw appropriate errors.
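A minimal sketch of that workaround, assuming the same hypothetical index and connection settings:

```scala
// RDD[String] workaround: saveJsonToEs sends each string through as a
// pre-rendered JSON document. Index name and setup are hypothetical.
import org.apache.spark.sql.SparkSession
import org.elasticsearch.spark._   // adds saveJsonToEs to RDDs

val spark = SparkSession.builder()
  .appName("es-json-workaround")
  .master("local[*]")
  .getOrCreate()
val sc = spark.sparkContext

val jsonRdd = sc.parallelize(Seq(
  """{"id": "id1", "email": "email1@gmail.com"}""",
  """{"id": "id2", "email": "email2@gmail.com"}"""
))

jsonRdd.saveJsonToEs("people/doc")
```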
relates #1280