[FEA] A generic operator to automate inference execution with a MONAI Bundle  #286

Closed
@MMelQin

Description


Is your feature request related to a problem? Please describe.
The MONAI Bundle includes the TorchScript model, network metadata, and the transform specifications. A generic operator needs to be developed in the App SDK that takes a MONAI Bundle as input and then executes the transforms and inference with no, or very limited, additional application code.

Describe the solution you'd like
Detailed design is yet to be done, but these are the considerations:

  • Determine the best approach to feed the bundle to the operator at runtime
  • Parse and validate the bundle (see the sketch after this list)
  • Transforms, for both pre- and post-processing, need to be instantiated based on the spec
  • TorchScript model may already have been extracted out and packaged in the MONAI App Package (MAP), and loaded by the app execution engine at runtime. If not, load the model
  • Consider the scenario where the model is loaded on a remote server, e.g. Triton Inference Server, and the inference is to be performed using KServe API
  • Inference using the model script loaded in the local process, or the KServe API on a remote inference server
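
To make the first two considerations concrete, below is a minimal sketch of pulling the embedded metadata and inference config out of an MB-compliant TorchScript file with `torch.jit.load`; the extra-file key names (`metadata.json`, `inference.json`) are assumptions based on the MONAI Bundle layout, not a settled API.

```python
import json

import torch


def load_bundle_torchscript(ts_path: str):
    """Load an MB-compliant TorchScript file and its embedded extra files."""
    # Keys listed here are requested from the archive; torch.jit.load fills in
    # the dict values with the embedded file contents after loading.
    extra_files = {"metadata.json": "", "inference.json": ""}
    model = torch.jit.load(ts_path, _extra_files=extra_files, map_location="cpu")

    metadata = json.loads(extra_files["metadata.json"])
    inference_config = json.loads(extra_files["inference.json"])
    return model, metadata, inference_config
```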

Describe alternatives you've considered
The existing approach requires the deploy app developer to collect the pre and post transforms and code them up in a MONAI Transform Compose to be passed to the App SDK built-in inference operator. The loading of the model is carried out in the built-in inference operator, either through the app execution engine or by loading directly from file.

Additional context
Ideally this functionality should be ready when the MONAI Bundle is formally released in MONAI 0.9. Triton support may need to be delayed until a later release, though the design should consider it.

User Story
As a MONAI App developer, I have a trained model available in the form of a MONAI Bundle (MB) and need a base MONAI Deploy App SDK inference operator that can load the bundle, create pre and post transforms, and perform inference with the PyTorch model contained within the bundle, so that no or minimal coding is required to use a model specific inference operator within my MONAI Deploy application, except for providing the required parameters, including, but not limited to, the input data to be transformed into Tensor(s).

Requirements
The App SDK inference operator supporting MONAI Bundle (MB), called MB Operator for brevity in this doc, has the following requirements.

  • Each instance of the MB Operator shall support only one MB, which by definition contains only one PyTorch model per MB specification

  • The MB Operator shall parse the inference configuration in the MB, which is also expected to be embedded in the TorchScript file as extra/config.

    • The MB Operator shall instantiate MONAI Core objects to perform the specified pre-processing, inference, and post-processing steps

    • If the inference configuration, extra/config, is missing in the TorchScript model, it is an error condition and the MB Operator shall fail by raising an exception

  • The MB Operator shall parse the metadata.json of the MB, which is also expected to be embedded in the TorchScript file as extra/metadata.json.

    • The MB Operator shall use the network format data in the metadata for inference execution
    • If the metadata, extra/metadata.json, is missing in the TorchScript model, it is an error condition and the MB Operator shall fail by raising an exception
  • The MB Operator shall support associating inputs, image data or otherwise, with the named tensor inputs of the network in the MB (see the sketch after this list)

    • When there is a single input tensor, the single unnamed input to the MB Operator will be associated with the input tensor
    • Each input to the MB Operator is assumed to be directly related to the corresponding input tensor, e.g. the input image to the image tensor, and the roi image to the roi tensor input if there is one
  • The MB Operator should support decorating the output data type, according to the model’s output, e.g. Image for segmentation image, and Text for classification output

    • When there is a single output from the MB Operator, its data type or name need not be decorated
  • The MB compliant TorchScript model file will be packaged in the MONAI Application Package. The TorchScript model file name will be the model name, unless a parent folder is used when packaging, in which case the folder name becomes the model name

  • The App SDK application, with its model factory, shall load the MB compliant TorchScript model file the same way as a regular TorchScript file, without specific processing of the extra component

    • It is optional to add the MB Compliant model type so that a loaded model can describe itself as being MB Compliant
  • When a remote inference service is used, e.g. Triton Inference Server, the App SDK application shall still provide the model path, while not loading the model itself in the application process. This is a requirement for future releases and is subject to change
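
A hedged sketch of the input association rule above, in plain Python; the dictionary and list shapes here are illustrative assumptions, not the operator's actual API:

```python
def associate_inputs(operator_inputs: dict, tensor_names: list) -> dict:
    """Map the MB Operator's inputs to the model's named input tensors."""
    if len(tensor_names) == 1:
        # Single input tensor: the single (possibly unnamed) operator input maps to it.
        if len(operator_inputs) != 1:
            raise ValueError("Expected exactly one operator input for a single-input model")
        return {tensor_names[0]: next(iter(operator_inputs.values()))}

    # Multiple input tensors: each operator input is matched to the tensor of the same name.
    missing = [name for name in tensor_names if name not in operator_inputs]
    if missing:
        raise ValueError(f"No operator input provided for network tensor(s): {missing}")
    return {name: operator_inputs[name] for name in tensor_names}
```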

Analysis
Consuming the MONAI Bundle (MB)
The TorchScript file in the MONAI Bundle contains metadata and configuration for inference processing, so it will be the only required data content from the bundle to be consumed at runtime.

  • The MB inference operator needs the metadata and config items, and uses the MONAI utilities to parse out the pre- and post-processing configuration, as well as the model network details, e.g. tensor names, data types, and dims. Both pieces of data can be extracted from the MB compliant TorchScript file (see the sketch after this list).
  • For in-process inference, the MB inference operator needs the PyTorch model loaded and accessible. This is achieved by loading the model file directly within the operator when testing standalone, but when running within an application, the App SDK model loading mechanism, i.e. the ModelFactory and Model APIs, will be used, as explained next.
  • The App SDK ModelFactory can locate the model file with an ordered set of strategies, e.g. a command line argument for the model path or an environment variable. It then creates the App SDK Model object, which encapsulates the loaded model/predictor as well as the path of the model file itself.
  • In the (future) case of remote inference, e.g. using a remote Triton Inference Server where the actual model is loaded, a runtime configuration is required to flag this mode, via an environment variable, so that the ModelFactory and the Model will still provide the path to the TorchScript file itself, but without actually loading the model locally in the application process. Additionally, the Model will provide a new type of predictor capable of executing remote inference calls.
  • The MB compliant TorchScript file, therefore, is invariant to the application run mode, avoiding any concerns about deployment configuration when the bundle is generated.
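
Below is a minimal sketch of turning the parsed inference config into executable pre/post transforms and running in-process inference with monai.bundle.ConfigParser; the section names ("preprocessing", "postprocessing") and dictionary keys ("image", "pred") follow common MONAI Bundle conventions and are assumptions here, not a fixed contract.

```python
import torch
from monai.bundle import ConfigParser


def run_bundle_inference(model, inference_config: dict, data: dict, device: str = "cpu"):
    """Instantiate pre/post transforms from the bundle config and run inference."""
    parser = ConfigParser(inference_config)
    pre = parser.get_parsed_content("preprocessing")    # typically a Compose of transforms
    post = parser.get_parsed_content("postprocessing")  # typically a Compose of transforms

    item = pre(data)  # e.g. data = {"image": "/path/to/volume.nii.gz"}
    with torch.no_grad():
        # Add/remove the batch dimension around the scripted model call.
        item["pred"] = model(item["image"].unsqueeze(0).to(device))[0]
    return post(item)
```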

Packaging MONAI Bundle (MB) in MONAI Application Package (MAP)
Only the TorchScript file in the MONAI Bundle needs to be packaged in the MAP, the same way as with other non-MB model files, irrespective of whether (in a future release) the remote inference server will be used.

  • If the TorchScript file itself is directly used when packaging, the file name is the model name, consistent with the MB specification.
  • If the folder containing the TorchScript file is used when packaging, the folder name must match the MB model name, consistent with the MB specification.

Consideration for Multiple Inputs and Outputs
Many models have a single input and a single output, e.g. a single volume image converted from a DICOM CT Abdomen series as input, and a segmentation image of the liver as output. But advanced models may have multiple inputs, e.g. requiring both the original and segmentation images from a CT Abdomen series.

The question then arises of how to describe/decorate the inputs and outputs of the MB inference operator, given the need to support models with varying input(s) and output(s).

  • A base MB inference operator will support a single input and a single output, and both are unnamed, i.e. I/O not decorated on the operator
    • If the upstream operator has a single output, it can be connected to the downstream MB inference operator, without specifying the I/O mapping
    • If the upstream operator has more than one output, then one of its outputs must be identified in the I/O mapping by its decorated name, with the destination port being a blank string
    • The single unnamed input is then matched to the single input of the model, with type compatibility validation
  • For models with multiple inputs and outputs, a derived class of the base MB inference operator will be needed, for which the inputs and outputs must be decorated with name and type (see the sketch after this list)
  • It is further required that the derived operator’s input/output names match the Tensor names of the model
  • The above requirement can be relaxed with the base operator constructor supporting a mapping dictionary as an optional argument to associate operator inputs/outputs with model Tensor names.
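
A sketch of such a derived operator, using the App SDK's existing @md.input/@md.output decorator style; the class name, its base class, and the tensor names ("image", "roi", "pred") are illustrative assumptions and would need to match the actual model's tensor names:

```python
import monai.deploy.core as md
from monai.deploy.core import (
    ExecutionContext,
    Image,
    InputContext,
    IOType,
    Operator,
    OutputContext,
)


@md.input("image", Image, IOType.IN_MEMORY)
@md.input("roi", Image, IOType.IN_MEMORY)
@md.output("pred", Image, IOType.IN_MEMORY)
class MultiInputBundleInferenceOperator(Operator):
    """Hypothetical derived operator whose I/O names mirror the model's tensor names."""

    def compute(self, op_input: InputContext, op_output: OutputContext, context: ExecutionContext):
        image = op_input.get("image")
        roi = op_input.get("roi")
        # A real implementation would feed both named tensors (image and roi) to the
        # bundle's model and post-process the prediction; this placeholder just echoes
        # the input image to keep the sketch self-contained.
        op_output.set(image, "pred")
```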

Consideration for Models Trained with Other Frameworks
If the model is not TorchScript but the dev user still needs the config-driven approach to automate the inference operator, it is reasonable to support passing the metadata and inference configuration to the MB Inference operator, and to require the application to include metadata.json and the inference config as static content.

However, there are details yet to be worked out, and this will be a future enhancement.

App Execution Mode During Dev and Deploy as MAP
The only difference will be passing the model path on the command line during development, vs. passing the path via an environment variable set to the well-defined file system path inside the MAP, which points to the MAP internal model folder (currently the Packager does not set the env var to the default known folder, which can be addressed in this release).
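
A hedged sketch of that resolution order; the -m/--model option, the MONAI_MODELPATH variable, and the /opt/monai/models default are assumptions about the CLI and MAP conventions, not a confirmed API:

```python
import argparse
import os


def resolve_model_path() -> str:
    """Prefer the dev-time CLI option, then fall back to the MAP environment variable."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-m", "--model", default=None, help="path to the model file or folder")
    args, _ = parser.parse_known_args()

    # In a MAP, the env var is expected to point at the package's internal model folder.
    return args.model or os.environ.get("MONAI_MODELPATH", "/opt/monai/models")
```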

MAP Definition
No change needed for the current MAP specification or structure.
