- {% for post in posts %}
-   {{ post.date | date: '%B %d, %Y' }}
-   {{ post.title }}
-   {{ post.excerpt | remove: '<p>' | remove: '</p>' | truncate: 500}}
-   Read More
- {% endfor %}
+ {% include blog_tags_filter.html %}
+ {% for post in posts %}
+   {% include blog_post_nav.html %}
+ {% endfor %}
{% include pagination_buttons.html %} diff --git a/_posts/2022-3-16-running-pytorch-models-on-jetson-nano.md b/_posts/2022-3-16-running-pytorch-models-on-jetson-nano.md index 25e3c905943b..8485730f8f15 100644 --- a/_posts/2022-3-16-running-pytorch-models-on-jetson-nano.md +++ b/_posts/2022-3-16-running-pytorch-models-on-jetson-nano.md @@ -1,20 +1,26 @@ --- layout: blog_detail -title: 'Running PyTorch Models on Jetson Nano' +title: "Running PyTorch Models on Jetson Nano" author: Jeff Tang, Hamid Shojanazeri, Geeta Chauhan -featured-img: 'assets/images/pytorch-logo.jpg' +featured-img: "assets/images/pytorch-logo.jpg" +tags: + - tag5 + - tag6 + - tag7 --- ### Overview + NVIDIA [Jetson Nano](https://developer.nvidia.com/embedded/jetson-nano-developer-kit), part of the [Jetson family of products](https://developer.nvidia.com/embedded/jetson-modules) or Jetson modules, is a small yet powerful Linux (Ubuntu) based embedded computer with 2/4GB GPU. With it, you can run many PyTorch models efficiently. This document summarizes our experience of running different deep learning models using 3 different mechanisms on Jetson Nano: - 1. Jetson Inference the higher-level NVIDIA API that has built-in support for running most common computer vision models which can be transfer-learned with PyTorch on the Jetson platform. +1. Jetson Inference the higher-level NVIDIA API that has built-in support for running most common computer vision models which can be transfer-learned with PyTorch on the Jetson platform. - 2. TensorRT, an SDK for high-performance inference from NVIDIA that requires the conversion of a PyTorch model to ONNX, and then to the TensorRT engine file that the TensorRT runtime can run. +2. TensorRT, an SDK for high-performance inference from NVIDIA that requires the conversion of a PyTorch model to ONNX, and then to the TensorRT engine file that the TensorRT runtime can run. - 3. PyTorch with the direct PyTorch API `torch.nn` for inference. +3. PyTorch with the direct PyTorch API `torch.nn` for inference. ### Setting up Jetson Nano + After purchasing a Jetson Nano [here](https://developer.nvidia.com/buy-jetson?product=jetson_nano&location=US), simply follow the clear step-by-step [instructions](https://developer.nvidia.com/embedded/learn/get-started-jetson-nano-devkit) to download and write the Jetson Nano Developer Kit SD Card Image to a microSD card, and complete the setup. After the setup is done and the Nano is booted, you’ll see the standard Linux prompt along with the username and the Nano name used in the setup. To check the GPU status on Nano, run the following commands: @@ -69,7 +75,8 @@ nvgstcapture-1.0 --orientation=2 ``` ### Using Jetson Inference -NVIDIA [Jetson Inference](https://github.com/dusty-nv/jetson-inference) API offers the easiest way to run image recognition, object detection, semantic segmentation, and pose estimation models on Jetson Nano. Jetson Inference has TensorRT built-in, so it’s very fast. + +NVIDIA [Jetson Inference](https://github.com/dusty-nv/jetson-inference) API offers the easiest way to run image recognition, object detection, semantic segmentation, and pose estimation models on Jetson Nano. Jetson Inference has TensorRT built-in, so it’s very fast. To test run Jetson Inference, first clone the repo and download the models: @@ -104,13 +111,11 @@ $ ls -lt ~/jetson_inference/ -rw-r--r-- 1 root root 179760 Oct 15 21:29 jellyfish.jpg ``` -
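For reference, the same classification run can also be scripted through the Python bindings that ship with Jetson Inference. The sketch below is only illustrative and follows the dusty-nv/jetson-inference documentation; the module and function names (`jetson_inference.imageNet`, `jetson_utils.loadImage`, `net.Classify`) should be checked against the installed release:

```python
# Rough sketch (not from the original post): classify an image with the
# jetson-inference Python bindings. Names follow the dusty-nv/jetson-inference
# docs and may differ between releases; verify against the installed version.
from jetson_inference import imageNet
from jetson_utils import loadImage

net = imageNet("googlenet")               # loads a pretrained ImageNet classifier
img = loadImage("images/jellyfish.jpg")   # image path inside the jetson-inference container
class_id, confidence = net.Classify(img)
print(net.GetClassDesc(class_id), confidence)
```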
[Images: Using jest interface example 1, Using jest interface example 2]
Using jest interface example 3 Using jest interface example 4 @@ -128,8 +133,8 @@ torchvision (0.10.0a0+300a8a4) Although Jetson Inference includes models already converted to the TensorRT engine file format, you can fine-tune the models by following the steps in Transfer Learning with PyTorch (for Jetson Inference) [here](https://github.com/dusty-nv/jetson-inference/blob/master/docs/pytorch-transfer-learning.md). ### Using TensorRT -[TensorRT](https://docs.nvidia.com/deeplearning/tensorrt/) is an SDK for high-performance inference from NVIDIA. Jetson Nano supports TensorRT via the Jetpack SDK, included in the SD Card image used to set up Jetson Nano. To confirm that TensorRT is already installed in Nano, `run dpkg -l|grep -i tensorrt`: +[TensorRT](https://docs.nvidia.com/deeplearning/tensorrt/) is an SDK for high-performance inference from NVIDIA. Jetson Nano supports TensorRT via the Jetpack SDK, included in the SD Card image used to set up Jetson Nano. To confirm that TensorRT is already installed in Nano, `run dpkg -l|grep -i tensorrt`:
@@ -139,7 +144,7 @@ Theoretically, TensorRT can be used to “take a trained PyTorch model and optim 1. How to convert the model from PyTorch to ONNX; -2. How to convert the ONNX model to a TensorRT engine file; +2. How to convert the ONNX model to a TensorRT engine file; 3. How to run the engine file with the TensorRT runtime for performance improvement: inference time improved from the original 31.5ms/19.4ms (FP32/FP16 precision) to 6.28ms (TensorRT). @@ -147,7 +152,7 @@ You can replace the Resnet50 model in the notebook code with another PyTorch mod `Error Code 1: Cuda Runtime (all CUDA-capable devices are busy or unavailable)` -You may also see an error when converting a PyTorch model to ONNX model, which may be fixed by replacing: +You may also see an error when converting a PyTorch model to ONNX model, which may be fixed by replacing: `torch.onnx.export(resnet50, dummy_input, "resnet50_pytorch.onnx", verbose=False)` @@ -155,12 +160,13 @@ with: `torch.onnx.export(model, dummy_input, "deeplabv3_pytorch.onnx", opset_version=11, verbose=False)` -### Using PyTorch +### Using PyTorch + First, to download and install PyTorch 1.9 on Nano, run the following commands (see [here](https://forums.developer.nvidia.com/t/pytorch-for-jetson-version-1-10-now-available/72048) for more information): ``` wget https://nvidia.box.com/shared/static/p57jwntv436lfrd78inwl7iml6p13fzh.whl -O torch-1.8.0-cp36-cp36m-linux_aarch64.whl -O torch-1.9.0-cp36-cp36m-linux_aarch64.whl -sudo apt-get install python3-pip libopenblas-base libopenmpi-dev +sudo apt-get install python3-pip libopenblas-base libopenmpi-dev pip3 install Cython pip3 install numpy torch-1.9.0-cp36-cp36m-linux_aarch64.whl ``` @@ -173,17 +179,18 @@ pip3 install torchvision-0.10.0a0+300a8a4-cp36-cp36m-linux_aarch64.whl ``` After the steps above, run this to confirm: + ``` $ pip3 list|grep torch torch (1.9.0) torchvision (0.10.0) ``` -You can also use the docker image described in the section *Using Jetson Inference* (which also has PyTorch and torchvision installed), to skip the manual steps above. +You can also use the docker image described in the section _Using Jetson Inference_ (which also has PyTorch and torchvision installed), to skip the manual steps above. The official [YOLOv5](https://github.com/ultralytics/yolov5) repo is used to run the PyTorch YOLOv5 model on Jetson Nano. After logging in to Jetson Nano, follow the steps below: -* Get the repo and install what’s required: +- Get the repo and install what’s required: ``` git clone https://github.com/ultralytics/yolov5 @@ -191,13 +198,13 @@ cd yolov5 pip install -r requirements.txt ``` -* Run `python3 detect.py`, which by default uses the PyTorch yolov5s.pt model. You should see something like: +- Run `python3 detect.py`, which by default uses the PyTorch yolov5s.pt model. You should see something like: ``` detect: weights=yolov5s.pt, source=data/images, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False YOLOv5 🚀 v5.0-499-g48b00db torch 1.9.0 CUDA:0 (NVIDIA Tegra X1, 3956.1015625MB) -Fusing layers... +Fusing layers... Model Summary: 224 layers, 7266973 parameters, 0 gradients image 1/5 /home/jeff/repos/yolov5-new/yolov5/data/images/bus.jpg: 640x480 4 persons, 1 bus, 1 fire hydrant, Done. (0.142s) ... 
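As an aside not covered in the original walkthrough, the ultralytics/yolov5 README also documents loading the same yolov5s checkpoint programmatically via `torch.hub`, which can be handy for scripting inference instead of calling `detect.py`. Treat the snippet below as a sketch to adapt:

```python
# Sketch: load YOLOv5s through torch.hub instead of running detect.py.
# Follows the ultralytics/yolov5 README; adapt paths and weights as needed.
import torch

model = torch.hub.load('ultralytics/yolov5', 'yolov5s')  # downloads yolov5s.pt on first use
results = model('data/images/bus.jpg')                   # run inference on a sample image
results.print()                                          # summary of detections and timing
```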
@@ -206,6 +213,7 @@ image 1/5 /home/jeff/repos/yolov5-new/yolov5/data/images/bus.jpg: 640x480 4 pers **The inference time on Jetson Nano GPU is about 140ms, more than twice as fast as the inference time on iOS or Android (about 330ms).** If you get an error `“ImportError: The _imagingft C module is not installed.”` then you need to reinstall pillow: + ``` sudo apt-get install libpng-dev sudo apt-get install libfreetype6-dev @@ -231,29 +239,30 @@ Using the same test files used in the PyTorch iOS YOLOv5 demo app or Android YOL PyTorch YOLOv5 on Jetson Nano, example with a dog PyTorch YOLOv5 on Jetson Nano, example with a horse and a rider
-Figure 1. PyTorch YOLOv5 on Jetson Nano. +Figure 1. PyTorch YOLOv5 on Jetson Nano.
[Images: PyTorch YOLOv5 on iOS, example with a dog; example with a horse and a rider]
-Figure 2. PyTorch YOLOv5 on iOS. +Figure 2. PyTorch YOLOv5 on iOS.
[Images: PyTorch YOLOv5 on Android, example with a dog; example with a horse and a rider]
-Figure 3. PyTorch YOLOv5 on Android. +Figure 3. PyTorch YOLOv5 on Android. ### Summary + Based on our experience of running different PyTorch models for potential demo apps on Jetson Nano, we see that even Jetson Nano, a lower-end of the Jetson family of products, provides a powerful GPU and embedded system that can directly run some of the latest PyTorch models, pre-trained or transfer learned, efficiently. Building PyTorch demo apps on Jetson Nano can be similar to building PyTorch apps on Linux, but you can also choose to use TensorRT after converting the PyTorch models to the TensorRT engine file format. But if you just need to run some common computer vision models on Jetson Nano using NVIDIA’s Jetson Inference which supports image recognition, object detection, semantic segmentation, and pose estimation models, then this is the easiest way. - ### References + Torch-TensorRT, a compiler for PyTorch via TensorRT: [https://github.com/NVIDIA/Torch-TensorRT/](https://github.com/NVIDIA/Torch-TensorRT/) @@ -261,11 +270,11 @@ Jetson Inference docker image details: [https://github.com/dusty-nv/jetson-inference/blob/master/docs/aux-docker.md](https://github.com/dusty-nv/jetson-inference/blob/master/docs/aux-docker.md) A guide to using TensorRT on the NVIDIA Jetson Nano: -[https://docs.donkeycar.com/guide/robot_sbc/tensorrt_jetson_nano/](https://docs.donkeycar.com/guide/robot_sbc/tensorrt_jetson_nano/) +[https://docs.donkeycar.com/guide/robot_sbc/tensorrt_jetson_nano/](https://docs.donkeycar.com/guide/robot_sbc/tensorrt_jetson_nano/) including: -1. Use Jetson as a portable GPU device to run an NN chess engine model: -[https://medium.com/@ezchess/jetson-lc0-running-leela-chess-zero-on-nvidia-jetson-a-portable-gpu-device-a213afc9c018](https://medium.com/@ezchess/jetson-lc0-running-leela-chess-zero-on-nvidia-jetson-a-portable-gpu-device-a213afc9c018) +1. Use Jetson as a portable GPU device to run an NN chess engine model: + [https://medium.com/@ezchess/jetson-lc0-running-leela-chess-zero-on-nvidia-jetson-a-portable-gpu-device-a213afc9c018](https://medium.com/@ezchess/jetson-lc0-running-leela-chess-zero-on-nvidia-jetson-a-portable-gpu-device-a213afc9c018) 2. A MaskEraser app using PyTorch and torchvision, installed directly with pip: -[https://github.com/INTEC-ATI/MaskEraser#install-pytorch](https://github.com/INTEC-ATI/MaskEraser#install-pytorch) + [https://github.com/INTEC-ATI/MaskEraser#install-pytorch](https://github.com/INTEC-ATI/MaskEraser#install-pytorch) diff --git a/_posts/2022-5-12-ambient-clinical-intelligence-generating-medical-reports-with-pytorch.md b/_posts/2022-5-12-ambient-clinical-intelligence-generating-medical-reports-with-pytorch.md index f14c9e8b14e2..134fd18c4031 100644 --- a/_posts/2022-5-12-ambient-clinical-intelligence-generating-medical-reports-with-pytorch.md +++ b/_posts/2022-5-12-ambient-clinical-intelligence-generating-medical-reports-with-pytorch.md @@ -3,23 +3,27 @@ layout: blog_detail title: "Ambient Clinical Intelligence: Generating Medical Reports with PyTorch" author: Miguel Del-Agua, Principal Research Scientist, Nuance and Jeremy Jancsary, Senior Principal Research Scientist, Nuance featured-img: "" +tags: + - tag 2 + - tag3 + - tag4 --- ## Introduction Complete and accurate clinical documentation is an essential tool for tracking patient care. It allows for treatment plans to be shared among care teams to aid in continuity of care and ensures a transparent and effective process for reimbursement. -Physicians are responsible for documenting patient care. 
Traditional clinical documentation methods have resulted in a sub-par patient-provider experience, less time interacting with patients, and decreased work-life balance. A significant amount of physicians’ time is spent in front of the computer doing administrative tasks. As a result, patients are less satisfied with the overall experience, and physicians, who prepare for years studying medicine, cannot practice at the top of their license and are burned out. Every hour physicians provide direct clinical face time to patients results in nearly two additional hours spent on EHR and desk work within the clinic day. Outside office hours, physicians [spend another 1 to 2 hours of personal](https://www.acpjournals.org/doi/10.7326/m16-0961) time each night doing additional computer and other clerical work. +Physicians are responsible for documenting patient care. Traditional clinical documentation methods have resulted in a sub-par patient-provider experience, less time interacting with patients, and decreased work-life balance. A significant amount of physicians’ time is spent in front of the computer doing administrative tasks. As a result, patients are less satisfied with the overall experience, and physicians, who prepare for years studying medicine, cannot practice at the top of their license and are burned out. Every hour physicians provide direct clinical face time to patients results in nearly two additional hours spent on EHR and desk work within the clinic day. Outside office hours, physicians [spend another 1 to 2 hours of personal](https://www.acpjournals.org/doi/10.7326/m16-0961) time each night doing additional computer and other clerical work. -* [42% of all physicians reported having burnout. – Medscape](https://www.medscape.com/slideshow/2020-lifestyle-burnout-6012460) -* [The problem has grown worse due to the pandemic with 64% of U.S. physicians now reporting burnout. - AAFP](https://www.aafp.org/journals/fpm/blogs/inpractice/entry/covid_burnout_survey.html#:~:text=Physician%20burnout%20was%20already%20a,5%2C000%20%E2%80%94%20practice%20in%20the%20U.S.) -* ["Too many bureaucratic tasks e.g., charting and paperwork" is the leading contribution to burnout, increased computerization ranks 4th.](https://login.medscape.com/login/sso/getlogin?urlCache=aHR0cHM6Ly93d3cubWVkc2NhcGUuY29tL3NsaWRlc2hvdy8yMDIwLWxpZmVzdHlsZS1idXJub3V0LTYwMTI0NjA%3D&ac=401) - Medscape -* [75% of U.S. Consumers Wish Their Healthcare Experiences Were More Personalized,](https://www.businesswire.com/news/home/20200218005006/en/75-of-U.S.-Consumers-Wish-Their-Healthcare-Experiences-Were-More-Personalized-Redpoint-Global-Survey-Reveals)- Business Wire -* [61% of patients would visit their healthcare provider more often if the communication experience felt more personalized.](https://www.businesswire.com/news/home/20200218005006/en/75-of-U.S.-Consumers-Wish-Their-Healthcare-Experiences-Were-More-Personalized-Redpoint-Global-Survey-Reveals) – Business Wire +- [42% of all physicians reported having burnout. – Medscape](https://www.medscape.com/slideshow/2020-lifestyle-burnout-6012460) +- [The problem has grown worse due to the pandemic with 64% of U.S. physicians now reporting burnout. - AAFP](https://www.aafp.org/journals/fpm/blogs/inpractice/entry/covid_burnout_survey.html#:~:text=Physician%20burnout%20was%20already%20a,5%2C000%20%E2%80%94%20practice%20in%20the%20U.S.) 
+- ["Too many bureaucratic tasks e.g., charting and paperwork" is the leading contribution to burnout, increased computerization ranks 4th.](https://login.medscape.com/login/sso/getlogin?urlCache=aHR0cHM6Ly93d3cubWVkc2NhcGUuY29tL3NsaWRlc2hvdy8yMDIwLWxpZmVzdHlsZS1idXJub3V0LTYwMTI0NjA%3D&ac=401) - Medscape +- [75% of U.S. Consumers Wish Their Healthcare Experiences Were More Personalized,](https://www.businesswire.com/news/home/20200218005006/en/75-of-U.S.-Consumers-Wish-Their-Healthcare-Experiences-Were-More-Personalized-Redpoint-Global-Survey-Reveals)- Business Wire +- [61% of patients would visit their healthcare provider more often if the communication experience felt more personalized.](https://www.businesswire.com/news/home/20200218005006/en/75-of-U.S.-Consumers-Wish-Their-Healthcare-Experiences-Were-More-Personalized-Redpoint-Global-Survey-Reveals) – Business Wire Physician burnout is one of the primary causes for increased [medical errors](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6175626/), malpractice suits, turnover, and decreased access to care. Burnout leads to an increase in healthcare costs and a decrease in overall patient satisfaction. [Burnout costs the United States $4.6 billion a year.](https://www.nejm.org/doi/full/10.1056/NEJMp2003149) -What can we do to bring back trust, joy, and humanity to the delivery of healthcare? A significant portion of the administrative work consists of entering patient data into Electronic Health Records (EHRs) and creating clinical documentation. Clinical documentation is created from information already in the EHR as well as from the patient-provider encounter conversation. +What can we do to bring back trust, joy, and humanity to the delivery of healthcare? A significant portion of the administrative work consists of entering patient data into Electronic Health Records (EHRs) and creating clinical documentation. Clinical documentation is created from information already in the EHR as well as from the patient-provider encounter conversation. This article will showcase how the Nuance Dragon Ambient eXperience (DAX), an AI-powered, voice-enabled, ambient clinical intelligence solution, automatically documents patient encounters accurately and efficiently at the point of care and the technologies that enable it. @@ -39,14 +43,14 @@ Two main NLP components play a role in automating the creation of clinical docum We will focus on this second component, Automatic Text Summarization, which is a difficult task with many challenges: -* Its performance is tied to the ASR quality from multiple speakers (noisy input). -* The input is conversational in nature and contains layman's terms. -* Protected Health Information (PHI) regulations limit medical data access. -* The information for one output sentence is potentially spread across multiple conversation turns. -* There is no explicit sentence alignment between input and output. -* Various medical specialties, encounter types, and EHR systems constitute a broad and complex output space. -* Physicians have different styles of conducting encounters and have their preferences for medical reports; there is no standard. -* Standard summarization metrics might differ from human judgment of quality. +- Its performance is tied to the ASR quality from multiple speakers (noisy input). +- The input is conversational in nature and contains layman's terms. +- Protected Health Information (PHI) regulations limit medical data access. 
+- The information for one output sentence is potentially spread across multiple conversation turns. +- There is no explicit sentence alignment between input and output. +- Various medical specialties, encounter types, and EHR systems constitute a broad and complex output space. +- Physicians have different styles of conducting encounters and have their preferences for medical reports; there is no standard. +- Standard summarization metrics might differ from human judgment of quality.

@@ -68,18 +72,18 @@ Figure 3: Excerpt of an AI-generated medical report. HPI stands for History of p [PyTorch](https://pytorch.org/) is an open-source machine learning framework developed by Facebook that helps researchers prototype Deep Learning models. The [Fairseq](https://github.com/pytorch/fairseq) toolkit is built on top of PyTorch and focuses on sequence generation tasks, such as Neural Machine Translation (NMT) or Text Summarization. Fairseq features an active community that is continuously providing reference implementations of state-of-the-art models. It contains many built-in components (model architectures, modules, loss functions, and optimizers) and is easily extendable with plugins. -Text summarization constitutes a significant challenge in NLP. We need models capable of generating a short version of a document while retaining the key points and avoiding uninformative content. These challenges can be addressed with different approaches. 1). Abstractive text summarization aimed at training models that can generate a summary in narrative form. 2). Extractive methods where the models are trained to select the most important parts from the input text. 3). A combination of the two, where the essential parts from the input are selected and then summarized in an abstractive fashion. Hence, summarization can be accomplished via a single end-to-end network or as a pipeline of extractive and abstractive components. To that end, Fairseq provides all the necessary tools to be successful in our endeavor. It features either end-to-end models such as the classical Transformer, different types of Language Models and pre-trained versions that enable researchers to focus on what matters most—to build state-of-the-art models that generate valuable reports. +Text summarization constitutes a significant challenge in NLP. We need models capable of generating a short version of a document while retaining the key points and avoiding uninformative content. These challenges can be addressed with different approaches. 1). Abstractive text summarization aimed at training models that can generate a summary in narrative form. 2). Extractive methods where the models are trained to select the most important parts from the input text. 3). A combination of the two, where the essential parts from the input are selected and then summarized in an abstractive fashion. Hence, summarization can be accomplished via a single end-to-end network or as a pipeline of extractive and abstractive components. To that end, Fairseq provides all the necessary tools to be successful in our endeavor. It features either end-to-end models such as the classical Transformer, different types of Language Models and pre-trained versions that enable researchers to focus on what matters most—to build state-of-the-art models that generate valuable reports. However, we are not just summarizing the transcribed conversation; we generate high-quality medical reports, which have many considerations. -* Every section of a medical report is different in terms of content, structure, fluency, etc. -* All medical facts mentioned in the conversation should be present in the report, for example, a particular treatment or dosage. -* In the healthcare domain, the vocabulary is extensive, and models need to deal with medical terminology. -* Patient-doctor conversations are usually much longer than the final report. +- Every section of a medical report is different in terms of content, structure, fluency, etc. 
+- All medical facts mentioned in the conversation should be present in the report, for example, a particular treatment or dosage. +- In the healthcare domain, the vocabulary is extensive, and models need to deal with medical terminology. +- Patient-doctor conversations are usually much longer than the final report. All these challenges require our researchers to run a battery of extensive experiments. Thanks to the flexibility of PyTorch and Fairseq, their productivity has greatly increased. Further, the ecosystem offers an easy path from ideation, implementation, experimentation, and final roll-out to production. Using multiple GPUs or CPUs is as simple as providing an additional argument to the tools, and because of the tight Python integration, PyTorch code can be easily debugged. -In our continuous effort to contribute to the open-source community, features have been developed at Nuance and pushed to the Fairseq GitHub repository. These try to overcome some of the challenges mentioned such as, facilitating copying of, especially rare or unseen, words from the input to summary, training speedups by improving Tensor Core utilization, and ensuring TorchScript compatibility of different Transformer configurations. Following, we will show an example of how to train a Transformer model with a Pointer Generator mechanism (Transformer-PG), which can copy words from the input. +In our continuous effort to contribute to the open-source community, features have been developed at Nuance and pushed to the Fairseq GitHub repository. These try to overcome some of the challenges mentioned such as, facilitating copying of, especially rare or unseen, words from the input to summary, training speedups by improving Tensor Core utilization, and ensuring TorchScript compatibility of different Transformer configurations. Following, we will show an example of how to train a Transformer model with a Pointer Generator mechanism (Transformer-PG), which can copy words from the input. ## How to build a Transformer model with a Pointer Generator mechanism @@ -158,8 +162,8 @@ fairseq-preprocess --task "translation" \ --cpu \ --joined-dictionary \ --destdir -``` - +``` + You might notice the type of task is "translation". This is because there is no "summarization" task available; we could understand it as a kind of NMT task where the input and output languages are shared and the output (summary) is shorter than the input. ### 4. Now we can train the model: @@ -193,8 +197,8 @@ fairseq-train \ This configuration makes use of features Nuance has contributed back to Fairseq: -* Transformer with a Pointer Generator mechanism to facilitate copying of words from the input. -* Sequence length padded to a multiple of 8 to better use tensor cores and reduce training time. +- Transformer with a Pointer Generator mechanism to facilitate copying of words from the input. +- Sequence length padded to a multiple of 8 to better use tensor cores and reduce training time. ### 5. 
Now let's take a look at how to generate a summary with our new medical report generation system: diff --git a/_posts/2022-5-18-introducing-accelerated-pytorch-training-on-mac.md b/_posts/2022-5-18-introducing-accelerated-pytorch-training-on-mac.md index baf1a087d89c..7148f3ba1243 100644 --- a/_posts/2022-5-18-introducing-accelerated-pytorch-training-on-mac.md +++ b/_posts/2022-5-18-introducing-accelerated-pytorch-training-on-mac.md @@ -3,9 +3,13 @@ layout: blog_detail title: "Introducing Accelerated PyTorch Training on Mac" author: PyTorch featured-img: "/assets/images/METAPT-002-BarGraph-02-static.png" +tags: + - tag1 + - tag 2 + - tag3 --- -In collaboration with the Metal engineering team at Apple, we are excited to announce support for GPU-accelerated PyTorch training on Mac. Until now, PyTorch training on Mac only leveraged the CPU, but with the upcoming PyTorch v1.12 release, developers and researchers can take advantage of Apple silicon GPUs for significantly faster model training. This unlocks the ability to perform machine learning workflows like prototyping and fine-tuning locally, right on Mac. +In collaboration with the Metal engineering team at Apple, we are excited to announce support for GPU-accelerated PyTorch training on Mac. Until now, PyTorch training on Mac only leveraged the CPU, but with the upcoming PyTorch v1.12 release, developers and researchers can take advantage of Apple silicon GPUs for significantly faster model training. This unlocks the ability to perform machine learning workflows like prototyping and fine-tuning locally, right on Mac.

@@ -13,11 +17,11 @@ In collaboration with the Metal engineering team at Apple, we are excited to ann ## Metal Acceleration -Accelerated GPU training is enabled using Apple’s Metal Performance Shaders (MPS) as a backend for PyTorch. The MPS backend extends the PyTorch framework, providing scripts and capabilities to set up and run operations on Mac. MPS optimizes compute performance with kernels that are fine-tuned for the unique characteristics of each Metal GPU family. The new device maps machine learning computational graphs and primitives on the MPS Graph framework and tuned kernels provided by MPS. +Accelerated GPU training is enabled using Apple’s Metal Performance Shaders (MPS) as a backend for PyTorch. The MPS backend extends the PyTorch framework, providing scripts and capabilities to set up and run operations on Mac. MPS optimizes compute performance with kernels that are fine-tuned for the unique characteristics of each Metal GPU family. The new device maps machine learning computational graphs and primitives on the MPS Graph framework and tuned kernels provided by MPS. ## Training Benefits on Apple Silicon -Every Apple silicon Mac has a unified memory architecture, providing the GPU with direct access to the full memory store. This makes Mac a great platform for machine learning, enabling users to train larger networks or batch sizes locally. This reduces costs associated with cloud-based development or the need for additional local GPUs. The Unified Memory architecture also reduces data retrieval latency, improving end-to-end performance. +Every Apple silicon Mac has a unified memory architecture, providing the GPU with direct access to the full memory store. This makes Mac a great platform for machine learning, enabling users to train larger networks or batch sizes locally. This reduces costs associated with cloud-based development or the need for additional local GPUs. The Unified Memory architecture also reduces data retrieval latency, improving end-to-end performance. In the graphs below, you can see the performance speedup from accelerated GPU training and evaluation compared to the CPU baseline: @@ -29,11 +33,10 @@ In the graphs below, you can see the performance speedup from accelerated GPU tr Accelerated GPU training and evaluation speedups over CPU-only (times faster)
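For orientation, a minimal sketch of opting into the new backend from Python is shown below; it assumes an Apple-silicon Mac on macOS 12.3 or later with an MPS-enabled PyTorch build (for example, the nightly mentioned in Getting Started), and falls back to CPU otherwise:

```python
import torch

# Minimal sketch: select the MPS (Metal) device when available, else fall back to CPU.
# Assumes an MPS-enabled PyTorch build on an Apple-silicon Mac running macOS 12.3+.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

model = torch.nn.Linear(128, 10).to(device)
x = torch.randn(64, 128, device=device)
loss = model(x).sum()
loss.backward()        # the backward pass also runs on the GPU via the MPS backend
print(loss.device)
```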

- ## Getting Started To get started, just install the latest [Preview (Nightly) build](https://pytorch.org/get-started/locally/) on your Apple silicon Mac running macOS 12.3 or later with a native version (arm64) of Python. - + You can also learn more about Metal and MPS on [Apple’s Metal page](https://developer.apple.com/metal/). \* _Testing conducted by Apple in April 2022 using production Mac Studio systems with Apple M1 Ultra, 20-core CPU, 64-core GPU 128GB of RAM, and 2TB SSD. Tested with macOS Monterey 12.3, prerelease PyTorch 1.12, ResNet50 (batch size=128), HuggingFace BERT (batch size=64), and VGG16 (batch size=64). Performance tests are conducted using specific computer systems and reflect the approximate performance of Mac Studio._