Skip to content

[maskedtensor] Add overview tutorial #2042

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Conversation

Copy link
Contributor

@malfet malfet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not use stacks, just submit a regular PR (which otherwise looks good to me)

Copy link

@jisaacso jisaacso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a noob, I'm pretty confused by the focus of this on comparisons to Numpy.

IMO, the first overview tutorial should not speak of other libraries. There can be a whole separate tutorial covering "comparisons to numpy". It's just too confusing for a new user to think about alternative semantics that they don't need.

I would personally pull out all of the numpy conversation into a separate tutorial, and focus this one on building up knowledge: what is a MT, how are they instantiated, How are the visualized, how do you do reductions, how do you do combine MTs via ops like sum, and then you can touch on the requirement that masks have the same shape (and that you can get around this with an &). If you want, you can end the whole tutorial with a single sentence, "for those familiar with numpy, we recognize the semantics deviate from Numpy's. Details on why are addressed in "

Indexing and slicing
--------------------

:class:`MaskedTensor` is a Tensor subclass, which means that it inherits the same semantics for indexing and slicing

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you write a docstring on MaskedTensor?

Init signature: MaskedTensor(data, mask, requires_grad=False)
Docstring:      <no docstring>

:class:`MaskedTensor` is a Tensor subclass, which means that it inherits the same semantics for indexing and slicing
as :class:`torch.Tensor`. Below are some examples of common indexing and slicing patterns:

>>> data = torch.arange(60).reshape(3, 4, 5)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you copy the output of this call and the mask call below as well? It helps the user see the original data before masking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Semantics
+++++++++

MaskedTensor vs NumPy's MaskedArray

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this can be moved to the end? The flow of the tutorial should be:

what is a MT
How do you access/slice
What operations can you do / reductions
How does this compare with alternatives to masked tensor

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline - this will include "what is a MT and why is an MT useful"

Reduction semantics
-------------------

The basis for reduction semantics `has been documented and discussed at length <https://github.com/pytorch/rfcs/pull/27>`__,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this can be rephrased.

The purpose of a tutorial is to be a self contained, helpful, end to end showcase of a new feature. It is not assumed that the reader has seen RFCs.

I would start off explaining what a reduction semantic is (we ignore masked values for your favorite functions!), then give the code examples before, then conclude by saying "for more details on the basis for reduction semantics, see this RFC".

Copy link

@jisaacso jisaacso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding one more comment


>>> t0 = mt0.to_tensor(0)
>>> t1 = mt1.to_tensor(0)
>>> mt2 = masked_tensor(t0 + t1, mt0.get_mask() & mt1.get_mask())
Copy link

@jisaacso jisaacso Sep 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why & instead of |?

It feels like to me, that combining two masks should be the union operator of those two masks, not the &?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NumPy uses logical_or and our mask is the inverse of theirs, so ours is & if you want the same behavior


>>> t0 = mt0.to_tensor(0)
>>> t1 = mt1.to_tensor(0)
>>> mt2 = masked_tensor(t0 + t1, mt0.get_mask() & mt1.get_mask())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a way to union two masks and return another mask, without returning a boolean? Right now

>>> mt0.get_mask() & mt1.get_mask()
tensor([[False, False, False, False],
        [False, False, False, False],
        [False, False, False, False]])

# it would be nice if union ops on two raw MTs returned a MT
>>> mt0 | mt1
MaskedTensor(
  [
    [  --,       --,       --,       --],
    [      --,   --,       --,       --],
    [      --,   --,       --,       --]
  ]
)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we could add this in!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you tracking features anywhere?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants