SparkSQL write operations fail when using raw JSON #1303

Open
@jbaiera

Description

When using SparkSQL, writing a DataFrame to Elasticsearch fails if the es.input.json property is set to true.
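
For example, a write like the following triggers the failure (a rough sketch; the index name "test/doc" and the two-column schema are illustrative):

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// With es.input.json=true the connector expects every record to already be
// a serialized JSON document, but SparkSQL hands it structured Row objects.
val spark = SparkSession.builder().appName("es-json-repro").getOrCreate()
import spark.implicits._

val df = Seq(("id3", "email1@gmail.com")).toDF("id", "email")

df.write
  .format("org.elasticsearch.spark.sql")
  .option("es.input.json", "true") // incompatible with a structured DataFrame
  .mode(SaveMode.Append)
  .save("test/doc")                // fails with EsHadoopSerializationException
```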

When writing a DataFrame that is not compatible with this write mode, the error ends up as something like:

org.elasticsearch.hadoop.serialization.EsHadoopSerializationException: org.codehaus.jackson.JsonParseException: Unexpected character ('(' (code 40)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: [B@25a91b8c; line: 1, column: 2]

or, if using a Dataset[String] (which one would reasonably expect to work), the exception looks like:

org.elasticsearch.hadoop.rest.EsHadoopRemoteException: mapper_parsing_exception: failed to parse;org.elasticsearch.hadoop.rest.EsHadoopRemoteException: not_x_content_exception: Compressor detection can only be called on some xcontent bytes or compressed xcontent bytes
	{"index":{}}
([{"id": "id3", "email": "email1@gmail.com"}],StructType(StructField(value,StringType,true)))

The current workaround is to use a Spark RDD[String] instead (see the sketch below), but ideally ES-Hadoop would support Dataset[String] for writing JSON data, check for other incompatible inputs, and throw appropriate errors.
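
A sketch of the RDD-based workaround, using the saveJsonToEs implicit from org.elasticsearch.spark (index name illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.elasticsearch.spark._ // adds saveJsonToEs to RDDs of JSON strings

val spark = SparkSession.builder().appName("es-json-rdd").getOrCreate()
val json  = """{"id": "id3", "email": "email1@gmail.com"}"""

// Writing through a plain RDD[String] works as expected today.
spark.sparkContext.makeRDD(Seq(json)).saveJsonToEs("test/doc")
```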

relates #1280
