
Setting X-Opaque-ID header for all reads and writes for MapReduce and Spark #1770


Merged (1 commit, Oct 5, 2021)

Conversation

@masseyke (Member) commented Oct 1, 2021

This commit adds an X-Opaque-ID header when contacting Elasticsearch from es-hadoop or es-spark.
Relates #1182

@jbaiera (Member) left a comment

LGTM!

@masseyke masseyke merged commit 5e84afe into elastic:master Oct 5, 2021
@masseyke masseyke deleted the feature/set-x-opaque-id-8x branch October 5, 2021 18:44
masseyke added a commit to masseyke/elasticsearch-hadoop that referenced this pull request Oct 7, 2021
… spark (elastic#1770)

This commit adds an X-Opaque-ID header when communicating with Elasticsearch from es-hadoop or es-spark.
masseyke added a commit that referenced this pull request Oct 7, 2021
… spark (#1770) (#1771)

This commit adds an X-Opaque-ID header when communicating with Elasticsearch from es-hadoop or es-spark.
Relates #1182
@masseyke (Member, Author) commented Dec 3, 2021

As an example, here are the X-Opaque-ID headers you would see from a simple MapReduce job named Sample Job that uses EsInputFormat to read the entire books index, and then EsOutputFormat in a reducer to write it all back out to the books index. Each line below shows the X-Opaque-ID header followed by the URL being hit:

[[mapreduce] [] [Sample Job] []] /
[[mapreduce] [hduser] [Sample Job] []] /
[[mapreduce] [hduser] [Sample Job] []] /books
[[mapreduce] [hduser] [Sample Job] []] /_cluster/health/books
[[mapreduce] [hduser] [Sample Job] []] /_nodes/http
[[mapreduce] [hduser] [Sample Job] []] /_nodes/http
[[mapreduce] [hduser] [Sample Job] []] /books
[[mapreduce] [hduser] [Sample Job] []] /books/_search_shards
[[mapreduce] [hduser] [Sample Job] []] /books/_mapping
[[mapreduce] [hduser] [Sample Job] [attempt_local61453229_0001_m_000000_0]] /books/_search
[[mapreduce] [hduser] [Sample Job] [attempt_local61453229_0001_m_000000_0]] /_search/scroll
[[mapreduce] [hduser] [Sample Job] [attempt_local61453229_0001_m_000000_0]] /_search/scroll
[[mapreduce] [hduser] [Sample Job] [attempt_local61453229_0001_r_000000_0]] /
[[mapreduce] [hduser] [Sample Job] [attempt_local61453229_0001_r_000000_0]] /_nodes/http
[[mapreduce] [hduser] [Sample Job] [attempt_local61453229_0001_r_000000_0]] /_nodes/http
[[mapreduce] [hduser] [Sample Job] [attempt_local61453229_0001_r_000000_0]] /_all/_alias/books
[[mapreduce] [hduser] [Sample Job] [attempt_local61453229_0001_r_000000_0]] /books
[[mapreduce] [hduser] [Sample Job] [attempt_local61453229_0001_r_000000_0]] /books/_search_shards
[[mapreduce] [hduser] [Sample Job] [attempt_local61453229_0001_r_000000_0]] /_nodes/http
[[mapreduce] [hduser] [Sample Job] [attempt_local61453229_0001_r_000000_0]] /books/_bulk
[[mapreduce] [hduser] [Sample Job] [attempt_local61453229_0001_r_000000_0]] /books/_refresh
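The bracketed fields in these headers can be pulled apart mechanically. The helper below is a hypothetical sketch, not part of es-hadoop; the field layout (framework, user, job name, task attempt) is inferred from the sample output above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of es-hadoop: splits an X-Opaque-ID value
// such as "[[mapreduce] [hduser] [Sample Job] []]" into its bracketed
// fields. The field order (framework, user, job name, task attempt) is
// inferred from the sample output above.
public class OpaqueId {
    // Matches each innermost [...] group; empty brackets yield "".
    private static final Pattern FIELD = Pattern.compile("\\[([^\\[\\]]*)\\]");

    public static List<String> parse(String header) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(header);
        while (m.find()) {
            fields.add(m.group(1));
        }
        return fields;
    }
}
```

For the first header above, `OpaqueId.parse("[[mapreduce] [] [Sample Job] []]")` yields four fields, with the user and task-attempt fields empty (the user is not yet known at that point in the job).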

Here is the Spark equivalent, run from the Spark shell:

sc.esJsonRDD("books").saveAsSequenceFile("/keith/books-seq-9")
[[spark] [hduser] [Spark shell] [application_1631651348162_0036]] /
[[spark] [hduser] [Spark shell] [application_1631651348162_0036]] /books
[[spark] [hduser] [Spark shell] [application_1631651348162_0036]] /_cluster/health/books
[[spark] [hduser] [Spark shell] [application_1631651348162_0036]] /_nodes/http
[[spark] [hduser] [Spark shell] [application_1631651348162_0036]] /_nodes/http
[[spark] [hduser] [Spark shell] [application_1631651348162_0036]] /books
[[spark] [hduser] [Spark shell] [application_1631651348162_0036]] /books/_search_shards
[[spark] [hduser] [Spark shell] [application_1631651348162_0036]] /books/_mapping
[[spark] [hduser] [Spark shell] [application_1631651348162_0036] [stage 0] [task attempt 0]] /books/_search
[[spark] [hduser] [Spark shell] [application_1631651348162_0036] [stage 0] [task attempt 0]] /_search/scroll
[[spark] [hduser] [Spark shell] [application_1631651348162_0036] [stage 0] [task attempt 0]] /_search/scroll

sc.sequenceFile("/keith/books-seq", classOf[Text], classOf[Text]).saveToEs("books-restore12")
[[spark] [hduser] [Spark shell] [application_1631651348162_0036]] /
[[spark] [hduser] [Spark shell] [application_1631651348162_0036] [stage 1] [task attempt 1]] /
[[spark] [hduser] [Spark shell] [application_1631651348162_0036] [stage 1] [task attempt 1]] /_nodes/http
[[spark] [hduser] [Spark shell] [application_1631651348162_0036] [stage 1] [task attempt 1]] /_nodes/http
[[spark] [hduser] [Spark shell] [application_1631651348162_0036] [stage 1] [task attempt 1]] /_all/_alias/books-restore12
[[spark] [hduser] [Spark shell] [application_1631651348162_0036] [stage 1] [task attempt 1]] /books-restore12
[[spark] [hduser] [Spark shell] [application_1631651348162_0036] [stage 1] [task attempt 1]] /books-restore12/_search_shards
[[spark] [hduser] [Spark shell] [application_1631651348162_0036] [stage 1] [task attempt 1]] /_nodes/http
[[spark] [hduser] [Spark shell] [application_1631651348162_0036] [stage 1] [task attempt 1]] /books-restore12/_bulk
[[spark] [hduser] [Spark shell] [application_1631651348162_0036] [stage 1] [task attempt 1]] /books-restore12/_refresh
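The mechanism behind all of these lines is the same: the computed ID is attached to each outgoing request as a plain HTTP header, which Elasticsearch then echoes back (for example in its task management API and logs). The sketch below is illustrative only, not the actual es-hadoop implementation; the URL is an assumption.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Illustrative sketch only -- not the actual es-hadoop implementation.
// Shows how a computed X-Opaque-ID value is attached to an outgoing
// HTTP request as an ordinary header. The URL below is an assumption.
public class OpaqueIdHeader {
    public static HttpRequest tagged(String url, String opaqueId) {
        return HttpRequest.newBuilder()
                .uri(URI.create(url))
                .header("X-Opaque-ID", opaqueId)
                .GET()
                .build();
    }
}
```

For example, `OpaqueIdHeader.tagged("http://localhost:9200/books/_search", "[[spark] [hduser] [Spark shell] [application_1631651348162_0036]]")` builds a request carrying the same header shown in the Spark output above.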

@jrodewig changed the title from "Setting X-Opaque-ID header for all reads and writes for mapreduce and spark" to "Setting X-Opaque-ID header for all reads and writes for MapReduce and Spark" on Dec 3, 2021