[Umbrella Ticket] Make ML path selection better: improve accuracy, speed up the inference and reduce the size of jar #703

Open
@amandelpie

This is an umbrella ticket for many sub-tasks.

Description

The existing ML path selection is implemented in the utbot-analytics module.
It suffers from several problems:

  1. It relies on external ML libraries for model inference, which makes the jar large.
  2. It uses the Smile library for inference (it would be better to use scikit-learn and provide a model importer).
  3. The Smile wrapper for BLAS is used for matrix multiplication.
  4. The Kotlin implementation without an external runtime is too slow (we need our own native implementation of 1-3 operations like matrix multiplication); multik could probably help.
  5. The DJL inference is too slow.
  6. The imported library is in JSON/txt format.
  7. We measure the metrics on the contest data.
  8. The utbot-analytics module is de facto unused.
  9. There are a lot of ML-related settings mixed together with other settings in UtSettings.
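
To make problem 4 concrete, here is a minimal sketch of a dependency-free matrix multiplication in pure Kotlin. The `Matrix` class and `matMul` function are hypothetical names introduced for illustration and do not exist in utbot-analytics. The sketch assumes row-major storage in a flat `FloatArray` and uses an i-k-j loop order so that the inner loop reads both arrays sequentially; whether this is fast enough for path selection would still have to be measured.

```kotlin
/** Row-major dense matrix backed by a flat FloatArray (hypothetical helper, not part of utbot-analytics). */
class Matrix(val rows: Int, val cols: Int, val data: FloatArray = FloatArray(rows * cols)) {
    init {
        require(data.size == rows * cols) { "data size must equal rows * cols" }
    }

    operator fun get(r: Int, c: Int): Float = data[r * cols + c]
}

/** Computes a * b with an i-k-j loop order: the inner loop walks one row of b and one row of the result. */
fun matMul(a: Matrix, b: Matrix): Matrix {
    require(a.cols == b.rows) { "inner dimensions must match" }
    val out = FloatArray(a.rows * b.cols)
    for (i in 0 until a.rows) {
        val outRow = i * b.cols
        for (k in 0 until a.cols) {
            val aik = a.data[i * a.cols + k]
            val bRow = k * b.cols
            for (j in 0 until b.cols) {
                out[outRow + j] += aik * b.data[bRow + j]
            }
        }
    }
    return Matrix(a.rows, b.cols, out)
}
```

A single dense layer of a small model is one such multiplication followed by a bias add and an activation, so a routine of this kind may cover most of what the external libraries are currently used for.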

Expected behavior

  1. The utbot-analytics module and its inheritors should be easy to enable or disable from the intellij/cli modules.
  2. Training scripts should be structured and isolated.
  3. Deployed ML models should be part of the jar.
  4. No external libraries in the utbot-analytics module.
  5. External settings should be extracted to UtMLSettings.
  6. Models are located in resources and packed with the plugin.
  7. Models are no larger than 100 KB (zipped or saved in an alternative binary format, not JSON or txt).
  8. The utbot-analytics module contains only interfaces and pure Kotlin implementations.
  9. utbot contains separate modules for custom inference implementations (like DJL).
  10. Different path selectors can be easily compared, and the results can be displayed as a report.
  11. New path selection metrics are created.
  12. We reach significantly better numbers in the metrics.
  13. Obtained models are ranked and well described.
  14. The training process and hyperparameter tuning are well described and published.
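
As an illustration of items 4, 6 and 7, the sketch below stores a model in a gzipped binary format using only JDK classes. Everything here is an assumption made for the example: `LinearModel`, `toBytes` and `loadLinearModel` are hypothetical names, the byte layout (weight count, weights, bias) is invented, and a linear model stands in for whatever model is eventually deployed.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.io.DataInputStream
import java.io.DataOutputStream
import java.util.zip.GZIPInputStream
import java.util.zip.GZIPOutputStream

/** Hypothetical linear scoring model: score = weights . features + bias. */
class LinearModel(val weights: FloatArray, val bias: Float) {
    fun predict(features: FloatArray): Float {
        require(features.size == weights.size) { "feature count mismatch" }
        var sum = bias
        for (i in weights.indices) sum += weights[i] * features[i]
        return sum
    }
}

/** Serializes the model as gzip-compressed binary: weight count, weights, bias (assumed layout). */
fun LinearModel.toBytes(): ByteArray {
    val buffer = ByteArrayOutputStream()
    // Closing the DataOutputStream also finishes the gzip stream before the buffer is read.
    DataOutputStream(GZIPOutputStream(buffer)).use { out ->
        out.writeInt(weights.size)
        weights.forEach { out.writeFloat(it) }
        out.writeFloat(bias)
    }
    return buffer.toByteArray()
}

/** Reads a model written by toBytes(). */
fun loadLinearModel(bytes: ByteArray): LinearModel =
    DataInputStream(GZIPInputStream(ByteArrayInputStream(bytes))).use { input ->
        val size = input.readInt()
        val weights = FloatArray(size) { input.readFloat() }
        LinearModel(weights, input.readFloat())
    }
```

For a model packed with the plugin, the bytes could come from a classpath resource instead of an in-memory array, for example via `getResourceAsStream` on a hypothetical path under the module's resources directory.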

Related issues

Labels

comp-summaries: Something related to the method names, code comments and display names generation
ctg-documentation: Improvements or additions to documentation
ctg-enhancement: New feature, improvement or change request
