[Discussion] Add initial design tenets/goals for S3 TransferManager #1120

Merged 1 commit on Mar 16, 2019
66 changes: 66 additions & 0 deletions docs/design/services/s3/transfermanager/README.md
@@ -0,0 +1,66 @@
# Project Tenets (unless you know better ones)

1. Meeting customers in their problem space allows them to deliver value
quickly.
2. Meeting customer expectations drives usability.
3. Discoverability drives usage.

# Introduction

This project provides a much-improved experience for S3 customers who need to
upload and download objects to and from S3. It does so through
`S3TransferManager`, a high-level library built on the S3 client.

# Project Goals

1. For the use cases it addresses, i.e. the transfer of objects to and from S3,
S3TransferManager is the preferred solution. It is easier and more intuitive
than using the S3 client. In the majority of situations, it is more
performant.
1. S3TransferManager provides a truly asynchronous, non-blocking API that
conforms to the norms present in the rest of the SDK.
1. S3TransferManager makes efficient use of system resources.
1. S3TransferManager supplements rather than replaces the lower level S3 client.

# Non Project Goals

1. Ability to use the blocking, synchronous client.

Contributor:

Not sure where you separate client vs HTTP client, but I think it should be a primary goal to support sync HTTP clients like Apache. Each HTTP client has different levels of configuration, and allowing customers to use their preferred one should be a requirement.

Contributor Author:

I guess both, as in the blocking service client and HTTP clients. I can't see how supporting a sync HTTP client as a goal can coexist with the async/non-blocking, efficiency, and performance goals. We already saw a lot of this when we explored whether we could expose sync HTTP clients as async and vice versa.

To me, a big part of the original TransferManager was to provide the async/non-blocking experience so by focusing on using the async HTTP clients, we further enhance that experience.

If it's a matter of configuration, I would think supporting a config present in Apache but not Netty would be easier to achieve than supporting Apache (a blocking HTTP client) in TransferManager.

Contributor:

I think just limiting it to non-blocking clients is fine, as long as we have support for HTTP proxies in Netty. Sync clients and async abstractions like TransferManager don't mix well.

Contributor (@spfink, Mar 8, 2019):

So no shared connection pool if customers are using the sync S3 client? That is made worse given that you have to use both the low-level client and transfer manager.

No lightweight JDK client?

No GAE support?

A worse debugging experience?

Contributor Author:

In my view, TransferManager is an enhancement of the async client, where this is all true right now as well.

I think it's important that customers who can't use the async clients for the reasons given above are not left without an option, but I think that means we provide a similar, but separate library for non-async clients.


Using a blocking client would severely impede the ability to deliver on goals
#2 and #3.

# Customer-Requested Changes from 1.11.x

* S3TransferManager supports progress listeners that are easier to use.

Ref: https://github.com/aws/aws-sdk-java-v2/issues/37#issuecomment-316218667

Contributor:

More of a meta-question: do we know yet whether these will be S3-specific, or built into the core?

Contributor Author:

I think it makes sense for them to be the same thing. Or at least, TransferManager supports both the core version and an "enhanced" TransferManager-specific version, if it makes sense to have one.


* S3TransferManager provides bandwidth limiting of uploads and downloads.

Ref: https://github.com/aws/aws-sdk-java/issues/1103
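
One common way to cap throughput is a token bucket. The sketch below is only an illustration of that idea, not SDK code: `ByteRateLimiter` and `tryAcquire` are hypothetical names, the burst size is assumed to equal one second of traffic, and the clock is passed in explicitly so the arithmetic stays deterministic.

```java
// Hypothetical sketch, not part of the AWS SDK: a token bucket that limits
// transfers to roughly bytesPerSecond, with at most one second of burst.
final class ByteRateLimiter {
    private final long bytesPerSecond;
    private final long capacity;   // maximum burst size, in bytes
    private double available;      // bytes that may be sent right now
    private long lastNanos;        // time of the most recent refill

    ByteRateLimiter(long bytesPerSecond, long nowNanos) {
        if (bytesPerSecond <= 0) {
            throw new IllegalArgumentException("bytesPerSecond must be > 0");
        }
        this.bytesPerSecond = bytesPerSecond;
        this.capacity = bytesPerSecond;
        this.available = bytesPerSecond; // start with a full bucket
        this.lastNanos = nowNanos;
    }

    // Returns how many of the requested bytes may be sent at nowNanos.
    // It never sleeps: the caller decides when to retry for the remainder.
    synchronized long tryAcquire(long requested, long nowNanos) {
        if (requested < 0) {
            throw new IllegalArgumentException("requested must be >= 0");
        }
        double refill = (nowNanos - lastNanos) / 1e9 * bytesPerSecond;
        available = Math.min(capacity, available + refill);
        lastNanos = nowNanos;
        long granted = (long) Math.min(requested, available);
        available -= granted;
        return granted;
    }
}
```

Returning the granted byte count instead of sleeping keeps the limiter compatible with a non-blocking design: a caller can schedule the rest of the write for later rather than parking a thread.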

* The size of resources used by S3TransferManager and configured by the user
should not affect its stability.

For example, the configured size of a thread pool should be irrelevant to its
ability to successfully perform an operation.

Ref: https://github.com/aws/aws-sdk-java/issues/939
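
As a rough illustration of that property, the sketch below splits one operation into more parts than the pool has threads. It assumes a design in which no pool task ever waits on another pool task; `SmallPoolSketch` and `sumOfParts` are hypothetical names, and the "parts" are stand-ins for real transfer work.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration, not part of the AWS SDK.
final class SmallPoolSketch {
    private SmallPoolSketch() {}

    // Runs `parts` independent tasks on the pool and sums their results.
    // Tasks never wait on each other, so even a one-thread pool finishes.
    static int sumOfParts(ExecutorService pool, int parts) {
        CompletableFuture<Integer> total = CompletableFuture.completedFuture(0);
        for (int i = 1; i <= parts; i++) {
            final int partNumber = i;
            CompletableFuture<Integer> part =
                    CompletableFuture.supplyAsync(() -> partNumber, pool);
            // Combine results via a callback rather than blocking a pool thread.
            total = total.thenCombine(part, Integer::sum);
        }
        return total.join(); // only the caller's thread waits here
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            System.out.println(sumOfParts(pool, 8));
        } finally {
            pool.shutdown();
        }
    }
}
```

With this structure a smaller pool makes the operation slower, but it cannot make it fail, which is the behavior the bullet above asks for.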

* S3TransferManager supports parallel downloads of any object.

Any object stored in S3 should be downloadable in multiple parts
simultaneously, not just those uploaded using the Multipart API.
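
One way such a download could be split is with HTTP `Range` requests over fixed-size parts. The sketch below only plans the ranges and issues no requests; `RangePlanner` and `planRanges` are hypothetical names, and it assumes the object length is already known.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the AWS SDK: plans HTTP Range header
// values for fetching an object of known length in fixed-size parts.
final class RangePlanner {
    private RangePlanner() {}

    // Returns inclusive byte ranges (e.g. "bytes=0-3") covering [0, objectLength).
    static List<String> planRanges(long objectLength, long partSize) {
        if (objectLength < 0 || partSize <= 0) {
            throw new IllegalArgumentException(
                    "objectLength must be >= 0 and partSize must be > 0");
        }
        List<String> ranges = new ArrayList<>();
        long start = 0;
        while (start < objectLength) {
            // The last part may be shorter than partSize.
            long length = Math.min(partSize, objectLength - start);
            long end = start + length - 1; // Range ends are inclusive
            ranges.add("bytes=" + start + "-" + end);
            start += length;
        }
        return ranges;
    }

    public static void main(String[] args) {
        System.out.println(planRanges(10, 4)); // [bytes=0-3, bytes=4-7, bytes=8-9]
    }
}
```

Each planned range can then be fetched independently and written at its own offset, regardless of how the object was originally uploaded.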

* S3TransferManager has the ability to upload to and download from a pre-signed
URL.

Contributor:

Does this include parallel downloads (see previous goal)?

Contributor Author:

For downloads, definitely. I wasn't sure if it's possible for uploads, so I didn't want to specify parallel in case it's not. I'll confirm and then update the wording if necessary.

Contributor:

  • Parallel downloads are possible, but there might be edge cases where we can't compute the object length and hence can't do parallel downloads.
  • Multi-part uploads are possible, but it's complex and requires customers to provide multiple presigned URLs.

Contributor Author:

In that case I'd probably just leave it as is for now, or add a caveat that downloads are parallel where possible.


* S3TransferManager allows uploads and downloads from and to memory.

Ref: https://github.com/aws/aws-sdk-java/issues/474

* Ability to easily use canned ACL policies with all transfers to S3.

Ref: https://github.com/aws/aws-sdk-java/issues/1207

* Trailing checksums for parallel uploads and downloads.