
Setting X-Opaque-ID header for all reads and writes for MapReduce and Spark #1770


Merged (1 commit, Oct 5, 2021)

Conversation

@masseyke (Member) commented Oct 1, 2021

This commit adds an X-Opaque-ID header when contacting Elasticsearch from es-hadoop or es-spark.
Relates #1182

@jbaiera (Member) left a comment

LGTM!

@masseyke masseyke merged commit 5e84afe into elastic:master Oct 5, 2021
@masseyke masseyke deleted the feature/set-x-opaque-id-8x branch October 5, 2021 18:44
masseyke added a commit to masseyke/elasticsearch-hadoop that referenced this pull request Oct 7, 2021
… spark (elastic#1770)

This commit adds an X-Opaque-ID header when communicating with Elasticsearch from es-hadoop or es-spark.
masseyke added a commit that referenced this pull request Oct 7, 2021
… spark (#1770) (#1771)

This commit adds an X-Opaque-ID header when communicating with Elasticsearch from es-hadoop or es-spark.
Relates #1182
@masseyke (Member, Author) commented Dec 3, 2021

As an example, here are the X-Opaque-ID headers you would see from a simple MapReduce job named Sample Job that uses EsInputFormat to read the entire books index, and then EsOutputFormat in a reducer to write it all back out to the books index. Each line below shows the X-Opaque-ID header followed by the URL being hit:

[[mapreduce] [] [Sample Job] []] /
[[mapreduce] [hduser] [Sample Job] []] /
[[mapreduce] [hduser] [Sample Job] []] /books
[[mapreduce] [hduser] [Sample Job] []] /_cluster/health/books
[[mapreduce] [hduser] [Sample Job] []] /_nodes/http
[[mapreduce] [hduser] [Sample Job] []] /_nodes/http
[[mapreduce] [hduser] [Sample Job] []] /books
[[mapreduce] [hduser] [Sample Job] []] /books/_search_shards
[[mapreduce] [hduser] [Sample Job] []] /books/_mapping
[[mapreduce] [hduser] [Sample Job] [attempt_local61453229_0001_m_000000_0]] /books/_search
[[mapreduce] [hduser] [Sample Job] [attempt_local61453229_0001_m_000000_0]] /_search/scroll
[[mapreduce] [hduser] [Sample Job] [attempt_local61453229_0001_m_000000_0]] /_search/scroll
[[mapreduce] [hduser] [Sample Job] [attempt_local61453229_0001_r_000000_0]] /
[[mapreduce] [hduser] [Sample Job] [attempt_local61453229_0001_r_000000_0]] /_nodes/http
[[mapreduce] [hduser] [Sample Job] [attempt_local61453229_0001_r_000000_0]] /_nodes/http
[[mapreduce] [hduser] [Sample Job] [attempt_local61453229_0001_r_000000_0]] /_all/_alias/books
[[mapreduce] [hduser] [Sample Job] [attempt_local61453229_0001_r_000000_0]] /books
[[mapreduce] [hduser] [Sample Job] [attempt_local61453229_0001_r_000000_0]] /books/_search_shards
[[mapreduce] [hduser] [Sample Job] [attempt_local61453229_0001_r_000000_0]] /_nodes/http
[[mapreduce] [hduser] [Sample Job] [attempt_local61453229_0001_r_000000_0]] /books/_bulk
[[mapreduce] [hduser] [Sample Job] [attempt_local61453229_0001_r_000000_0]] /books/_refresh
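The bracketed fields in these headers can be pulled apart mechanically. The helper below is a hypothetical sketch, not part of es-hadoop; the field layout (framework, user, job name, task attempt) is inferred from the sample output above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of es-hadoop: splits an X-Opaque-ID value
// such as "[[mapreduce] [hduser] [Sample Job] []]" into its bracketed
// fields. The field order (framework, user, job name, task attempt) is
// inferred from the sample output above.
public class OpaqueId {
    // Matches each innermost [...] group; empty brackets yield "".
    private static final Pattern FIELD = Pattern.compile("\\[([^\\[\\]]*)\\]");

    public static List<String> parse(String header) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(header);
        while (m.find()) {
            fields.add(m.group(1));
        }
        return fields;
    }
}
```

For the first header above, `OpaqueId.parse("[[mapreduce] [] [Sample Job] []]")` yields four fields, with the user and task-attempt fields empty (the user is not yet known at that point in the job).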

Here is the Spark equivalent, run from the Spark shell:

sc.esJsonRDD("books").saveAsSequenceFile("/keith/books-seq-9")
[[spark] [hduser] [Spark shell] [application_1631651348162_0036]] /
[[spark] [hduser] [Spark shell] [application_1631651348162_0036]] /books
[[spark] [hduser] [Spark shell] [application_1631651348162_0036]] /_cluster/health/books
[[spark] [hduser] [Spark shell] [application_1631651348162_0036]] /_nodes/http
[[spark] [hduser] [Spark shell] [application_1631651348162_0036]] /_nodes/http
[[spark] [hduser] [Spark shell] [application_1631651348162_0036]] /books
[[spark] [hduser] [Spark shell] [application_1631651348162_0036]] /books/_search_shards
[[spark] [hduser] [Spark shell] [application_1631651348162_0036]] /books/_mapping
[[spark] [hduser] [Spark shell] [application_1631651348162_0036] [stage 0] [task attempt 0]] /books/_search
[[spark] [hduser] [Spark shell] [application_1631651348162_0036] [stage 0] [task attempt 0]] /_search/scroll
[[spark] [hduser] [Spark shell] [application_1631651348162_0036] [stage 0] [task attempt 0]] /_search/scroll

sc.sequenceFile("/keith/books-seq", classOf[Text], classOf[Text]).saveToEs("books-restore12")
[[spark] [hduser] [Spark shell] [application_1631651348162_0036]] /
[[spark] [hduser] [Spark shell] [application_1631651348162_0036] [stage 1] [task attempt 1]] /
[[spark] [hduser] [Spark shell] [application_1631651348162_0036] [stage 1] [task attempt 1]] /_nodes/http
[[spark] [hduser] [Spark shell] [application_1631651348162_0036] [stage 1] [task attempt 1]] /_nodes/http
[[spark] [hduser] [Spark shell] [application_1631651348162_0036] [stage 1] [task attempt 1]] /_all/_alias/books-restore12
[[spark] [hduser] [Spark shell] [application_1631651348162_0036] [stage 1] [task attempt 1]] /books-restore12
[[spark] [hduser] [Spark shell] [application_1631651348162_0036] [stage 1] [task attempt 1]] /books-restore12/_search_shards
[[spark] [hduser] [Spark shell] [application_1631651348162_0036] [stage 1] [task attempt 1]] /_nodes/http
[[spark] [hduser] [Spark shell] [application_1631651348162_0036] [stage 1] [task attempt 1]] /books-restore12/_bulk
[[spark] [hduser] [Spark shell] [application_1631651348162_0036] [stage 1] [task attempt 1]] /books-restore12/_refresh
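The mechanism behind all of these lines is the same: the computed ID is attached to each outgoing request as a plain HTTP header, which Elasticsearch then echoes back (for example in its task management API and logs). The sketch below is illustrative only, not the actual es-hadoop implementation; the URL is an assumption.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Illustrative sketch only -- not the actual es-hadoop implementation.
// Shows how a computed X-Opaque-ID value is attached to an outgoing
// HTTP request as an ordinary header. The URL below is an assumption.
public class OpaqueIdHeader {
    public static HttpRequest tagged(String url, String opaqueId) {
        return HttpRequest.newBuilder()
                .uri(URI.create(url))
                .header("X-Opaque-ID", opaqueId)
                .GET()
                .build();
    }
}
```

For example, `OpaqueIdHeader.tagged("http://localhost:9200/books/_search", "[[spark] [hduser] [Spark shell] [application_1631651348162_0036]]")` builds a request carrying the same header shown in the Spark output above.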

@jrodewig changed the title from "Setting X-Opaque-ID header for all reads and writes for mapreduce and spark" to "Setting X-Opaque-ID header for all reads and writes for MapReduce and Spark" on Dec 3, 2021