Make BTrDB connections serializable in Ray #1

Merged

@gtfierro gtfierro commented May 20, 2021

This PR adds a new shareable flag to the connect() method. Connections created with shareable=True can be serialized and shared among Ray workers in a Ray cluster. This is potentially insecure: it is possible (POC in development) to pull an API key out of the serialized object while it rests in the Plasma store, although doing so requires access to the same Ray cluster.
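As a toy illustration of why a serialized connection can expose credentials (this is not the POC mentioned above), the sketch below pickles a dictionary of hypothetical connection parameters and checks whether the key is readable in the resulting bytes. The parameter names, endpoint, and key are made up, and plain pickle is used here as a stand-in; it does not reflect how btrdb or Ray actually lay out objects in the Plasma store.

```python
import pickle

# Hypothetical connection parameters; the key is a placeholder, not a real credential.
conn_params = {
    "endpoints": "example.invalid:4411",
    "apikey": "NOT-A-REAL-KEY-123",
}


def key_visible_in_bytes(params):
    """Return True if the API key appears verbatim in the pickled bytes."""
    blob = pickle.dumps(params)
    return params["apikey"].encode("utf-8") in blob


# Anyone who can read the serialized bytes can search them for the key...
leaked = key_visible_in_bytes(conn_params)

# ...or simply deserialize the object and read the field.
recovered = pickle.loads(pickle.dumps(conn_params))["apikey"]
```

In this toy case the key is stored as plain UTF-8 inside the payload, which is the kind of exposure the warning on shareable connections is about.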

Uses the semver package to choose the right Ray serialization method based on which version of Ray is installed.
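A minimal sketch of what version-gated selection might look like. Several things here are assumptions rather than the PR's actual code: the helper name choose_registration_api is hypothetical, the cutoff at major version 0 is a guess, and the version string is parsed by hand instead of with semver so that the example has no dependencies. The two dotted names refer to ray.register_custom_serializer (older Ray releases) and ray.util.register_serializer (newer releases); check the documentation for your installed Ray version before relying on either.

```python
def choose_registration_api(ray_version):
    """Return the dotted name of the Ray registration call to use.

    Assumption: 0.x releases use the legacy top-level function, and
    later releases use the one under ray.util.
    """
    # Take only the leading major component, e.g. "1.3.0" -> 1.
    major = int(ray_version.split(".")[0])
    if major == 0:
        return "ray.register_custom_serializer"
    return "ray.util.register_serializer"


legacy = choose_registration_api("0.8.7")
current = choose_registration_api("1.3.0")
```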

Usage looks as follows:

import ray
import btrdb
ray.init(ignore_reinit_error=True)
conn = btrdb.connect(profile='collab', shareable=True)
# /opt/conda/lib/python3.7/site-packages/btrdb/__init__.py:81: UserWarning: a shareable connection is potentially insecure; other users of the same cluster may be able to access your API key
conn_copy = ray.get(ray.put(conn))  # ok

With shareable=False, the same round trip fails:

import ray
import btrdb
ray.init(ignore_reinit_error=True)
conn = btrdb.connect(profile='collab', shareable=False)
conn_copy = ray.get(ray.put(conn))  # raises Serialization Exception

Clubhouse story [ch9329]

@shortcut-integration

This pull request has been linked to Clubhouse Story #9329: Update btrdb-python for Ray 1.0 compatibility.

@mchestnut91

This all looks good to me. I verified that it works with this dummy function:

import ray
import btrdb

@ray.remote
def get_info(i, conn):
    return f"{i} {conn.info()}"

ray.init(ignore_reinit_error=True)
conn = btrdb.connect(profile="collab", shareable=True)

infos = [get_info.remote(i, conn) for i in range(3)]
print(ray.get(infos)) # prints connection info 3 times

@mchestnut91 mchestnut91 self-requested a review May 27, 2021 16:58

@mchestnut91 mchestnut91 left a comment

lgtm. Thanks for figuring this out!

@RamiroCope

Just took it for a spin and was able to get data out. Setting shareable=False as default also makes it not break Data Science's code base, which is nice.

@RamiroCope RamiroCope merged commit 776731b into PingThingsIO:master May 27, 2021