Description
What kind of issue is this?

- Bug report. If you've found a bug, please provide a code snippet or test to reproduce it below. The easier it is to track down the bug, the faster it is solved.
Issue description
When reading a Hive table stored by EsStorageHandler using Spark SQL, a java.lang.ClassCastException is thrown.
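Judging from the trace below, the record reader created by the mr layer (EsInputFormat$EsInputRecordReader) unconditionally downcasts the InputSplit it receives to its own EsInputFormat$EsInputSplit, while the Hive storage handler path hands it an EsHiveInputFormat$EsHiveSplit, which is evidently not a subtype of that class. Here is a self-contained Java illustration of that failure shape; every class name in it is a stand-in, and only the subtype relationship is inferred from the stack trace, not from the es-hadoop source.

```java
// Illustration only: mirrors the failure shape suggested by the stack trace
// (EsInputFormat.java:144). These are stand-in classes, not es-hadoop source.
public class CastRepro {
    static class EsInputSplitLike {}   // what the mr-side reader expects
    static class EsHiveSplitLike {}    // what the Hive path supplies; evidently
                                       // not a subtype of the expected class

    static void initialize(Object split) {
        // Unconditional downcast, as in EsInputRecordReader.initialize
        EsInputSplitLike es = (EsInputSplitLike) split; // ClassCastException
        System.out.println("initialized with " + es);
    }

    public static void main(String[] args) {
        initialize(new EsHiveSplitLike()); // what Spark-on-Hive passes in
    }
}
```

If EsHiveSplit wraps rather than extends the mr-side split, the reader would need to unwrap it before casting; that is speculation, not confirmed against the es-hadoop source.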
Steps to reproduce
- Create an index `test_column` in Elasticsearch (an illustrative sketch of this step is shown after the last step below).
- Create a Hive table `test_column` in hive-cli:
```sql
CREATE EXTERNAL TABLE `test_column`(
  `id` bigint,
  `name` string,
  `age` bigint,
  `email` string
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'test_column'
, 'es.nodes'='localhost'
, 'es.port'='9200'
, 'es.read.metadata' = 'true'
, 'es.mapping.id'='id'
, 'es.mapping.names'='id:id, name:name, age:age, email:email'
, 'es.index.read.missing.as.empty'='true');
```
- Run `select * from test_column` using Spark SQL (see the standalone sketch below).
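For the first step, here is a minimal, self-contained sketch of creating the index over the REST API with only the JDK. The mapping is an assumption chosen to match the Hive schema above, and it assumes security is disabled on localhost:9200 (not the ES 8 default); it is not taken from the original setup.

```java
// Hypothetical helper for step 1: create the test_column index with a mapping
// matching the Hive schema. Field types are assumptions; adjust as needed.
// Assumes an unsecured cluster on localhost:9200.
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class CreateTestColumnIndex {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:9200/test_column");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("PUT");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        String mapping = "{\"mappings\":{\"properties\":{"
                + "\"id\":{\"type\":\"long\"},"
                + "\"name\":{\"type\":\"keyword\"},"
                + "\"age\":{\"type\":\"long\"},"
                + "\"email\":{\"type\":\"keyword\"}}}}";
        try (OutputStream out = conn.getOutputStream()) {
            out.write(mapping.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("PUT /test_column -> HTTP " + conn.getResponseCode());
    }
}
```

The last step was run through the Spark SQL shell; an equivalent standalone job, assuming a session with Hive support enabled, looks like this:

```java
// Step 3 as a standalone Spark application (assumed equivalent of running the
// query in the spark-sql shell; requires Hive support in the session).
import org.apache.spark.sql.SparkSession;

public class ReadTestColumn {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("es-hive-classcast-repro")
                .enableHiveSupport()   // the table lives in the Hive metastore
                .getOrCreate();
        // Fails in EsInputFormat$EsInputRecordReader.initialize with the
        // ClassCastException shown in the stack trace below.
        spark.sql("select * from test_column").show();
        spark.stop();
    }
}
```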
Stack trace:
```
22/08/30 21:36:16 ERROR SparkSQLDriver: Failed in [select * from test_column]
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1) (10.219.184.72 executor driver): java.lang.ClassCastException: org.elasticsearch.hadoop.hive.EsHiveInputFormat$EsHiveSplit cannot be cast to org.elasticsearch.hadoop.mr.EsInputFormat$EsInputSplit
at org.elasticsearch.hadoop.mr.EsInputFormat$EsInputRecordReader.initialize(EsInputFormat.java:144)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.liftedTree1$1(NewHadoopRDD.scala:220)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:217)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:172)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:71)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1469)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
```
Version Info

- Spark: 3.1.2
- ES-Hadoop: 8.3.3
- ES: 8.4.0