Add python3 support for HiveOperator #17

26 changes: 17 additions & 9 deletions HiveOperator/README.md
HiveOperator [(Source code)](https://github.com/SAP/datahub-integration-examples)
------------
This operator queries a Hive Metastore server with a HiveQL string and returns the response as a delimited string.

Two implementations of this operator are provided, one per Python version:
- The Python 2 variant runs on a custom Docker image that extends the SAP-delivered image `com.sap.python2.7`.
- The Python 3 variant runs on a custom Docker image that extends the Python 3.6 base image.
> As of SAP Data Intelligence version 3.0, only Python 3 is supported.

Both images use the Kerberos client binary `krb5-user` as well as `libsasl2` for Ubuntu. The PyHive Python module is developed and maintained by Dropbox: https://github.com/dropbox/PyHive

![alt text](./graph.jpg "Graph")
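Internally the operator issues the query through PyHive and flattens the result set into a single delimited string. A minimal sketch of that flow, where `format_response` and the host/database names are illustrative placeholders, not the shipped operator code:

```python
def format_response(rows, delimiter=","):
    """Flatten a Hive result set into one delimited string per row,
    mirroring the operator's string output port."""
    return "\n".join(delimiter.join(str(col) for col in row) for row in rows)

# The operator would obtain `rows` via PyHive, roughly along these lines
# (requires `pip install pyhive[hive]` and a reachable server):
#
#   from pyhive import hive
#   conn = hive.connect(host="hive.example.com", port=10000,
#                       database="default")
#   cur = conn.cursor()
#   cur.execute("SELECT id, name FROM demo")  # one statement, no semicolon
#   rows = cur.fetchall()

rows = [(1.34, "Hello", "World"), (2.5, "foo", "bar")]
print(format_response(rows, delimiter=";"))
```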

Before you start using the example, please make sure that:
- Install Kerberos client libraries

**2. Custom operator 'HiveOperator'**
- Derived from Python20Operator
- Uses image tags `python27:""` and `pyhive:pip2`
- **input port `inSql` of type string:** expects a single HiveQL-compliant string without a semicolon
- **output port `output` of type string:** outputs the response from the Hive Metastore server; columns are delimited by a comma (default) but this can be overridden using the `delimiter` configuration parameter (see description below)

**3. Sample graph HiveOperator_test**
- Provides an interactive terminal to query a Hive Metastore server and display the results. Note that the HiveOperator can only process one HiveQL statement at a time.

> The Python 3 implementation follows the same steps; it differs only in the base image and the operator's image tags.


## How to run
- For python2, import [solution/py2/HiveOperator-1.0.1.tar.gz](solution/py2/HiveOperator-1.0.1.tar.gz) via `SAP Data Hub System Management` -> `Files` -> `Import Solution`
- For python3, import [solution/py3/HiveOperator-1.1.0.tar.gz](solution/py3/HiveOperator-1.1.0.tar.gz) via `SAP Data Hub System Management` -> `Files` -> `Import Solution`
- Run the `Graph` -> `examples.HiveOperator_test`

**Operator configuration parameters**

    database: Specify which database in the Hive metastore to connect to
    delimiter: Used to separate columns in the HiveOperator output, e.g. 1.34;Hello;World;
    hive_hostname: Hostname or IP address of the Hive Metastore server
    hive_port: Port used by the Hive Metastore server
    http_mode: If hive.server2.transport.mode is set to http, set this parameter to true
    kerberos_enabled: If the Hive cluster is kerberized, set to true and read the additional notes below
    kerberos_keytab_filename: The file name of the uploaded keytab file (case sensitive)
    kerberos_principal: Kerberos principal used with the uploaded keytab file
    kerberos_realm: Kerberos realm used with the principal and keytab file
    username: Username for plain authentication
    password: Password for plain authentication
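The parameters above map onto a PyHive connection roughly as follows. `build_connect_args` is a hypothetical helper, not part of the operator; the `auth` values are assumptions based on PyHive's conventions (`KERBEROS` for a kerberized cluster, `LDAP` for username/password authentication):

```python
def build_connect_args(cfg):
    """Translate the operator configuration into keyword arguments
    suitable for pyhive.hive.connect (illustrative sketch)."""
    args = {
        "host": cfg["hive_hostname"],
        "port": int(cfg["hive_port"]),
        "database": cfg.get("database", "default"),
    }
    if cfg.get("kerberos_enabled"):
        # Credentials come from the Kerberos ticket cache obtained with
        # the uploaded keytab; see the Kerberos notes below.
        args["auth"] = "KERBEROS"
        args["kerberos_service_name"] = "hive"
    else:
        args["auth"] = "LDAP"
        args["username"] = cfg["username"]
        args["password"] = cfg["password"]
    return args

print(build_connect_args({"hive_hostname": "hive.example.com",
                          "hive_port": "10000",
                          "username": "demo", "password": "secret"}))
```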

**Kerberos configuration**
(Optional) Upload .keytab and krb5.conf file via the HiveOperator designer. These will be copied into the docker container at runtime. Remember to specify the kerberos realm and principal name in the operator's configuration section when designing your graph.
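A minimal krb5.conf along these lines is the kind of file the operator expects; the realm and KDC host are placeholders for your environment:

```ini
[libdefaults]
    default_realm = EXAMPLE.COM

[realms]
    EXAMPLE.COM = {
        kdc = kdc.example.com
        admin_server = kdc.example.com
    }
```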
**Dockerfile (Python 3 image):**
FROM python:3.6.4-slim-stretch

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && \
    apt-get install -y \
        python3-pip \
        python3-dev \
        krb5-user \
        libsasl2-dev \
        libsasl2-modules-gssapi-mit && \
    mkdir /keytabs

# Install python libraries
RUN pip3 install pyhive[hive]
RUN pip3 install tornado==5.0.2


# Add vflow user and vflow group to prevent error
# container has runAsNonRoot and image will run as root
RUN groupadd -g 1972 vflow && useradd -g 1972 -u 1972 -m vflow
USER 1972:1972
WORKDIR /home/vflow
ENV HOME=/home/vflow
**Docker image tags (Python 3):**
{
"pyhive": "pip3",
"python36": "",
"tornado": "5.0.2"
}
**Sample graph `examples.HiveOperator_test`:**
{
"properties": {},
"description": "Hive Operation",
"processes": {
"terminal1": {
"component": "com.sap.util.terminal",
"metadata": {
"label": "Terminal",
"x": 241.99999904632568,
"y": 39.99999952316284,
"height": 80,
"width": 120,
"ui": "dynpath",
"subengines": [
"main"
],
"config": {}
}
},
"hiveoperator1": {
"component": "examples.HiveOperator",
"metadata": {
"label": "hiveOperator",
"x": 72.99999904632568,
"y": 39.99999952316284,
"height": 80,
"width": 120,
"extensible": true,
"config": {}
}
}
},
"groups": [],
"connections": [
{
"metadata": {
"points": "196.99999904632568,79.99999952316284 236.99999904632568,79.99999952316284"
},
"src": {
"port": "output",
"process": "hiveoperator1"
},
"tgt": {
"port": "in1",
"process": "terminal1"
}
},
{
"metadata": {
"points": "365.9999990463257,79.99999952316284 393.9999985694885,79.99999952316284 393.9999985694885,12 12,12 12,79.99999952316284 67.99999904632568,79.99999952316284"
},
"src": {
"port": "out1",
"process": "terminal1"
},
"tgt": {
"port": "inSql",
"process": "hiveoperator1"
}
}
],
"inports": {},
"outports": {}
}