Add torchmultimodal tutorial for flava finetuning #2054


Merged

merged 35 commits into master on Oct 27, 2022
Conversation

@ankitade (Contributor) commented Sep 24, 2022

Adding the first TorchMultimodal tutorial, covering how to finetune FLAVA for VQA.

@netlify netlify bot commented Sep 24, 2022

Deploy Preview for pytorch-tutorials-preview ready!

🔨 Latest commit: 158e289
🔍 Latest deploy log: https://app.netlify.com/sites/pytorch-tutorials-preview/deploys/6359bafa8c621c0008aaef28
😎 Deploy Preview: https://deploy-preview-2054--pytorch-tutorials-preview.netlify.app


######################################################################
# Installations
#
Contributor

Suggested change
#
# -----------------
#

# Installations
#
# We will use TextVQA dataset from HuggingFace for this
# tutorial. So we install datasets in addition to TorchMultimodal
Contributor

Suggested change
# tutorial. So we install datasets in addition to TorchMultimodal
# tutorial. We install datasets in addition to TorchMultimodal.

Svetlana Karslioglu and others added 2 commits September 26, 2022 10:53
@ankitade ankitade closed this Sep 26, 2022
#

!wget http://dl.fbaipublicfiles.com/pythia/data/vocab.tar.gz
!tar xf vocab.tar.gz
Contributor

You added this to the Makefile above, I believe.


# TODO: replace with install from pip when binary is ready
!git clone https://github.com/facebookresearch/multimodal.git
!pip install -r multimodal/requirements.txt
Contributor

These should go into requirements.txt. I see only two items in that file; you can add them there.

sys.path.append(os.path.join(os.getcwd(),"multimodal"))
sys.path.append(os.getcwd())
!pip install datasets
!pip install transformers
@svekars (Contributor) commented Sep 26, 2022

Let's add this instead of lines 30 - 34:

# .. note::
#
#    When running this tutorial in Google Colab, install the required packages by
#    creating a new cell and running the following commands:
#
#    .. code-block::
#
#       !pip install torchmultimodal-nightly
#       !pip install datasets
#       !pip install transformers

Contributor

Yes, but we want it to be present in the notebook

!tar xf vocab.tar.gz


with open("vocabs/answers_textvqa_more_than_1.txt") as f:
Contributor

This should go to where you have downloaded your data, probably 'data/'.
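For context, the quoted snippet typically builds an answer-to-index mapping from the vocab file. A minimal sketch of that step; the one-answer-per-line layout and the in-memory stand-in file are assumptions, not confirmed by this PR:

```python
import os
import tempfile

# Illustrative stand-in for the extracted vocab file; in the tutorial this
# would be the downloaded data/vocabs/answers_textvqa_more_than_1.txt.
vocab_dir = tempfile.mkdtemp()
vocab_path = os.path.join(vocab_dir, "answers_textvqa_more_than_1.txt")
with open(vocab_path, "w") as f:
    f.write("yes\nno\n2\n")

# Load the candidate answers and map each to a class index for the VQA head.
with open(vocab_path) as f:
    vocab = [line.strip() for line in f if line.strip()]
answer_to_idx = {ans: idx for idx, ans in enumerate(vocab)}
```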

@svekars svekars reopened this Sep 26, 2022
requirements.txt Outdated
@@ -45,3 +48,7 @@ wget
gym==0.24.0
gym-super-mario-bros==7.3.0
timm

# flava tutorial - multimodal
packaging


I don't think we need packaging anymore. I removed it in this PR.

# which is a multimodal model for object detection and
# `Omnivore <https://github.com/facebookresearch/multimodal/blob/main/torchmultimodal/models/omnivore.py>`__
# which is a multitask model spanning image, video, and 3D classification.
#

Contributor Author
I can add it in a follow-up PR.

@subramen (Contributor) left a comment

Left some feedback for more detail, and some suggested changes

for _ in range(epochs):
    for idx, batch in enumerate(train_dataloader):
        optimizer.zero_grad()
        out = model(text=batch["input_ids"], image=batch["image"], labels=batch["answers"], required_embedding="mm")
Contributor

What is the required_embedding arg doing? It is not as obvious as the other params; maybe add a note in the plaintext above.

Contributor Author

Removed, it's not required.
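For reference, the quoted loop stops before the backward pass and optimizer step. A minimal end-to-end sketch of the full loop, using a toy stand-in module rather than the real FLAVA model (the `.loss` attribute on the output and the batch keys mirror the snippet, but the toy model itself is an assumption):

```python
import torch
from torch import nn

# Toy stand-in for the FLAVA classification model: any module whose forward
# returns an object with a .loss attribute works with the loop below.
class ToyVQAModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.head = nn.Linear(4, 2)

    def forward(self, text, image, labels):
        logits = self.head(image)
        loss = nn.functional.cross_entropy(logits, labels)
        return type("Output", (), {"loss": loss})()

model = ToyVQAModel()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Stand-in dataloader: one batch with the same keys as the snippet above.
train_dataloader = [
    {"input_ids": torch.zeros(2, 3, dtype=torch.long),
     "image": torch.randn(2, 4),
     "answers": torch.tensor([0, 1])}
]

epochs = 1
for _ in range(epochs):
    for idx, batch in enumerate(train_dataloader):
        optimizer.zero_grad()
        out = model(text=batch["input_ids"], image=batch["image"],
                    labels=batch["answers"])
        out.loss.backward()
        optimizer.step()
```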

for _ in range(epochs):
    for idx, batch in enumerate(train_dataloader):
        optimizer.zero_grad()
        out = model(text=batch["input_ids"], image=batch["image"], labels=batch["answers"], required_embedding="mm")
Contributor

Does this need retraining the encoders too, or just the head?

Contributor Author

It finetunes the encoders as well.
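To illustrate the alternative the question raises: a head-only variant would freeze the encoder parameters before building the optimizer. A sketch with a toy two-part model (the real FLAVA submodule names in TorchMultimodal may differ and are not confirmed here):

```python
import torch
from torch import nn

# Toy model with an "encoder" and a "classifier" head; the actual FLAVA
# submodule names in TorchMultimodal may differ.
model = nn.ModuleDict({
    "encoder": nn.Linear(8, 8),
    "classifier": nn.Linear(8, 2),
})

# Head-only finetuning: freeze everything except the classifier.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("classifier")

# Only pass trainable parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-3)
```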

@svekars svekars added the 1.13 label Oct 10, 2022
Svetlana Karslioglu and others added 2 commits October 11, 2022 10:15
Co-authored-by: Nikita Shulga <nshulga@fb.com>
@svekars svekars changed the base branch from master to 1.13-RC-TEST October 13, 2022 18:59
@svekars (Contributor) commented Oct 18, 2022

@ankitade what's the status on this? Can you resolve the merge conflict?

@svekars svekars requested a review from subramen October 18, 2022 18:52
@svekars svekars marked this pull request as ready for review October 20, 2022 17:19
@ankitade ankitade changed the title [WIP] Add torchmultimodal tutorial for flava finetuning Add torchmultimodal tutorial for flava finetuning Oct 21, 2022
@ebsmothers left a comment

LGTM, just a couple nits

# end examples, aiming to enable and accelerate research in
# multimodality**.
#
# In this tutorial, we will demonstrate how to use a **pretrained SoTA


nit: can we just say state-of-the-art here?

# TorchMultimodal library to finetune on a multimodal task i.e. visual
# question answering** (VQA). The model consists of two unimodal transformer-based
# encoders for text and image and a multimodal encoder to combine
# the two embeddings. It is pretrained using contrastive, image text matching and


Can the losses be enumerated in a different way here? I feel the comma placement makes this kinda confusing

@svekars svekars changed the base branch from 1.13-RC-TEST to master October 26, 2022 15:44
requirements.txt Outdated
@@ -27,6 +26,9 @@ pytorch-lightning
torchx
ax-platform
nbformat>=4.2.0
datasets
transformers
torchmultimodal-nightly
Contributor

can this be updated to use stable?

@svekars svekars merged commit 5185031 into master Oct 27, 2022
7 participants