# Preprocessing Open-I Dataset

The Open-I dataset provides a collection of 3,996 radiology reports
with 8,121 associated images in PA, AP and lateral views. In this tutorial, we use the frontal-view images and their corresponding reports for training and
evaluation of the TransCheX model. The chest X-ray images and reports are originally from the Indiana University hospital (see the licensing information below).
The 14 finding categories in this work are Atelectasis, Cardiomegaly, Consolidation, Edema, Enlarged-Cardiomediastinum, Fracture, Lung-Lesion, Lung-Opacity, No-Finding, Pleural-Effusion, Pleural-Other, Pneumonia, Pneumothorax and Support-Devices. More information can be found at the following link:
https://openi.nlm.nih.gov/faq

License: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

In this section, we provide the steps needed to preprocess the Open-I dataset for
the multi-label disease classification tutorial using the TransCheX model. Once the following steps are
completed, the dataset can be used directly in the tutorial.

### Preprocessing Steps
1) Create a new folder named 'monai_data' for downloading the raw data and preprocessing.
2) Download the chest X-ray images in PNG format from this [link](https://openi.nlm.nih.gov/imgs/collections/NLMCXR_png.tgz). Copy the downloaded file (NLMCXR_png.tgz) to the 'monai_data' directory and extract it.
3) Download the reports in XML format from this [link](https://openi.nlm.nih.gov/imgs/collections/NLMCXR_reports.tgz). Copy the downloaded file (NLMCXR_reports.tgz) to the 'monai_data' directory and extract it.
4) Download the train, validation and test splits from this [link](https://drive.google.com/u/1/uc?id=1_CThgwbDQPeTrr2Gvi6zflqr32_5t87j&export=download). Copy the downloaded file (TransChex_openi.zip) to the 'monai_data' directory and extract it.
5) Run 'preprocess_openi.py' to process the images and reports.
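
Steps 1 to 4 can also be scripted. The following is a minimal sketch rather than part of the tutorial itself: the helper names `extract_archive` and `run_steps` are hypothetical, the two archive URLs are copied from steps 2 and 3, and the sketch assumes the splits file from step 4 has already been downloaded manually into 'monai_data' (an assumption made here because Google Drive links may not download cleanly from a script). Step 5 is still run separately.

```python
import tarfile
import urllib.request
import zipfile
from pathlib import Path

DATA_DIR = Path("monai_data")  # folder name from step 1

# Archive URLs copied from steps 2 and 3.
ARCHIVE_URLS = {
    "NLMCXR_png.tgz": "https://openi.nlm.nih.gov/imgs/collections/NLMCXR_png.tgz",
    "NLMCXR_reports.tgz": "https://openi.nlm.nih.gov/imgs/collections/NLMCXR_reports.tgz",
}


def extract_archive(archive: Path, dest: Path) -> None:
    """Extract a .zip or gzip-compressed tar archive into dest (hypothetical helper)."""
    dest.mkdir(parents=True, exist_ok=True)
    if archive.suffix == ".zip":
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
    else:
        with tarfile.open(archive, "r:gz") as tar:
            # Use the safer "data" extraction filter where this Python supports it.
            if hasattr(tarfile, "data_filter"):
                tar.extractall(dest, filter="data")
            else:
                tar.extractall(dest)


def run_steps(data_dir: Path = DATA_DIR) -> None:
    """Sketch of steps 1 to 4; downloads large files when called."""
    data_dir.mkdir(parents=True, exist_ok=True)  # step 1
    for name, url in ARCHIVE_URLS.items():  # steps 2 and 3
        target = data_dir / name
        if not target.exists():
            urllib.request.urlretrieve(url, target)
        extract_archive(target, data_dir)
    # Step 4: assumes TransChex_openi.zip was downloaded manually.
    splits = data_dir / "TransChex_openi.zip"
    if splits.exists():
        extract_archive(splits, data_dir)
    else:
        print(f"Download the splits file manually and place it at {splits}")
```

Calling `run_steps()` from the directory that should contain 'monai_data' performs the downloads and extraction; nothing is downloaded until the function is called.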