Update client to support a different project_id to run jobs #144

Merged 2 commits on Mar 17, 2018

tuanavu
Contributor

@tuanavu tuanavu commented Mar 15, 2018

Some clients want to authenticate against, and bill to, one project while running their async query, load, export, and delete jobs in a different project when one is specified. If no project is specified, the default is self.project_id.

tuanavu added 2 commits March 14, 2018 16:48
This supports authenticating to one project_id while running jobs in a different project_id.
job_reference: json of job_reference
"""
job_reference = {
"projectId": self.project_id,
Owner

Shouldn't this use the overridden project_id if it's passed into the function?

Contributor Author

@tuanavu tuanavu Mar 16, 2018

Hi, @tylertreat,

Correct: it overrides self.project_id only when a different project_id is passed in. This addresses the case where the service account has no project-level permission on a project and therefore cannot authenticate to it. There is a use case at my company like this:

  • A service account is created, and all jobs are submitted to a default billing project_id.
  • This service account does not have project-level permission on other projects because they hold many sensitive datasets, so it cannot authenticate to those projects. It can access only the few datasets in those projects to which it has been added with view or edit permission.
  • However, Google supports this use case: a job can be submitted to a billing project for billing and job tracking while it processes data in a dataset in another project that has been shared with the service account.

In short, this lets a service account submit and run jobs against another project on which it has only dataset permission, not project permission.
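
As a rough sketch of that arrangement, the request body for a query job might look like the following. The helper name `build_query_job`, its arguments, and the project, dataset, and table names are all illustrative assumptions and are not part of this PR; only the `jobReference` and `configuration.query` layout follows the BigQuery jobs resource.

```python
def build_query_job(billing_project, data_project, dataset, table, job_id):
    """Build a jobs.insert-style request body (illustrative helper, not from this PR)."""
    # Standard SQL lets the query name a table that lives in a different project.
    sql = "SELECT COUNT(*) FROM `{}.{}.{}`".format(data_project, dataset, table)
    return {
        # The job itself is created in, and billed to, the billing project.
        "jobReference": {"projectId": billing_project, "jobId": job_id},
        "configuration": {"query": {"query": sql, "useLegacySql": False}},
    }


# Hypothetical names: the job is billed to "billing-proj" but reads "shared-proj".
body = build_query_job("billing-proj", "shared-proj", "shared_ds", "events", "job_1")
```

The point of the sketch is that the project in `jobReference` and the project named in the query do not have to match.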

Hope that makes sense. I know it is a bit complicated, but a few companies have these restrictions to protect their data, especially when working with third-party data where only the dataset is shared with you. In order to run the jobs, you have to submit them to a different billing project_id.

Best,
Tuan

Owner

Yes, sorry, I understand the overall changes and they make sense to me. I was specifically referring to _get_job_reference: why is it hardcoded to use self.project_id instead of taking a project_id and calling _get_project_id(project_id) like the rest of your changes?

Contributor Author

Oh, because _insert_job always submits jobs to self.project_id, which is the billing project, so jobReference.projectId will always be self.project_id. That is why _get_job_reference is hardcoded to use self.project_id. Hope that makes sense!
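
A minimal sketch of the split described in this thread, assuming `_get_project_id` simply falls back to `self.project_id` when no override is given. The class name, the method bodies, the `job_id` argument, and the `_table_reference` helper are hypothetical stand-ins for illustration, not the code merged here.

```python
class ClientSketch:
    """Hypothetical stand-in for the client discussed in this PR."""

    def __init__(self, project_id):
        # The project used for authentication and billing.
        self.project_id = project_id

    def _get_project_id(self, project_id=None):
        # Assumed behaviour: use the override if given, else the billing project.
        return self.project_id if project_id is None else project_id

    def _get_job_reference(self, job_id):
        # Jobs are always inserted into the billing project, so no override applies.
        return {"projectId": self.project_id, "jobId": job_id}

    def _table_reference(self, dataset, table, project_id=None):
        # Hypothetical helper: the data being processed may live in another project.
        return {
            "projectId": self._get_project_id(project_id),
            "datasetId": dataset,
            "tableId": table,
        }


client = ClientSketch("billing-proj")
job_ref = client._get_job_reference("job_1")
table_ref = client._table_reference("shared_ds", "events", project_id="shared-proj")
default_ref = client._table_reference("own_ds", "events")
```

Under these assumptions the job reference always names the billing project, while the table reference follows the override only when one is passed.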

@tylertreat tylertreat merged commit e8dc6ec into tylertreat:master Mar 17, 2018