Summary:
Fixing inconsistencies in index parameters that cause a discrepancy between Usearch and Hnswlib performance:
- Correctly specifying connectivity for Hnswlib as num_neighbors_per_vertex instead of max_neighbors_per_vertex.
- Passing the ef option into the Hnswlib configuration.
Adding internal statistics introspection to Usearch and Hnswlib index wrappers.
PR for hnswlib changes: nmslib/hnswlib#594.
PR for usearch changes: unum-cloud/usearch#508
Also allowing multiple values of k to be specified as input, as long as none of them exceeds the size of the precomputed ground truth result list.
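For illustration only, here is a minimal Python sketch of the k validation and of how an "i-recall @ k" figure like the ones in the test plan could be computed. It assumes that i-recall @ k means the average fraction of each query's true top-i neighbors found among the top-k returned results; that definition, and the function names `validate_k_values` and `i_recall_at_k`, are assumptions and are not taken from hnsw_tool.

```python
from typing import List, Sequence


def validate_k_values(k_values: Sequence[int], ground_truth_size: int) -> List[int]:
    """Return the distinct k values in ascending order, rejecting any value
    that is non-positive or larger than the precomputed ground truth list."""
    for k in k_values:
        if k <= 0 or k > ground_truth_size:
            raise ValueError(
                f"k={k} must be in the range 1..{ground_truth_size}")
    return sorted(set(k_values))


def i_recall_at_k(results: Sequence[Sequence[int]],
                  ground_truth: Sequence[Sequence[int]],
                  i: int, k: int) -> float:
    """Assumed definition: for each query, the fraction of its true top-i
    neighbors that appear among its top-k results, averaged over all queries."""
    if not results or len(results) != len(ground_truth):
        raise ValueError("results and ground_truth must be non-empty and aligned")
    total = 0.0
    for found, truth in zip(results, ground_truth):
        top_k = set(found[:k])
        total += sum(1 for vertex_id in truth[:i] if vertex_id in top_k) / i
    return total / len(results)
```

Under this definition the value for a fixed k can only go down or stay flat on average as i grows, which matches the shape of the numbers in the test plan output.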
Updating hnsw_tool to always convert uint8_t coordinates to float32 when using Hnswlib, so that the comparison with Usearch on the SIFT1B dataset is fair. Usearch does not currently support the uint8_t type natively.
The changes to src/inline-thirdparty will be pushed as separate commits generated by `build-support/thirdparty_tool --sync-inline-thirdparty`.
Test Plan:
Jenkins
Manual testing using hnsw_tool
- hnswlib: https://gist.githubusercontent.com/mbautin/d21580dcac0b51ad2d7bc9fc130c5f9e/raw
```
Hnswlib index with 5 levels
max_elements: 1000000
M: 16
maxM: 16
maxM0: 32
ef_construction: 128
ef: 10
mult: 0.360674
Level 0: 1000000 nodes, 21613828 edges, 21.61 average edges per node
Level 1: 62323 nodes, 885027 edges, 14.20 average edges per node
Level 2: 3855 nodes, 50515 edges, 13.10 average edges per node
Level 3: 238 nodes, 2543 edges, 10.68 average edges per node
Level 4: 17 nodes, 244 edges, 14.35 average edges per node
Totals: 1066433 nodes, 22552157 edges, 21.15 average edges per node
i-recall @ 50, i=1..10:
1-recall @ 50: 0.9695000052
2-recall @ 50: 0.9645000100
3-recall @ 50: 0.9604333043
4-recall @ 50: 0.9568499923
5-recall @ 50: 0.9541400075
6-recall @ 50: 0.9504333138
7-recall @ 50: 0.9467428327
8-recall @ 50: 0.9435999990
9-recall @ 50: 0.9406333566
10-recall @ 50: 0.9377999902
```
- usearch: https://gist.githubusercontent.com/mbautin/74948b310780562e74831eb29e43cb13/raw
```
Usearch index with 4 levels
connectivity: 16
connectivity_base: 32
expansion_add: 128
expansion_search: 10
inverse_log_connectivity: 0.360674
Level 0: 1000000 nodes, 20973352 edges, 20.97 average edges per node
Level 1: 64036 nodes, 890428 edges, 13.91 average edges per node
Level 2: 5090 nodes, 66295 edges, 13.02 average edges per node
Level 3: 481 nodes, 5304 edges, 11.03 average edges per node
Totals: 1069607 nodes, 21935379 edges, 20.51 average edges per node
i-recall@50, i=1..10:
1-recall @ 40: 0.9305999875
2-recall @ 40: 0.9201999903
3-recall @ 40: 0.9141333103
4-recall @ 40: 0.9085000157
5-recall @ 40: 0.9036399722
6-recall @ 40: 0.8987166882
7-recall @ 40: 0.8932142854
8-recall @ 40: 0.8890249729
9-recall @ 40: 0.8852999806
10-recall @ 40: 0.8813199997
```
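As a sanity check on the statistics above, the following Python sketch reproduces a few of the reported numbers. It assumes the standard HNSW level assignment, level = floor(-ln(U) * mult) with U uniform on (0, 1] and mult = 1 / ln(M), which gives P(level >= l) = M^(-l). The function names are hypothetical and are not part of either library or of hnsw_tool.

```python
import math
from typing import List


def level_multiplier(m: int) -> float:
    """The level multiplier 1 / ln(M), printed as `mult` for Hnswlib and as
    `inverse_log_connectivity` for Usearch in the output above."""
    return 1.0 / math.log(m)


def expected_nodes_per_level(num_vectors: int, m: int, num_levels: int) -> List[float]:
    """Expected number of nodes present at each level, assuming
    P(level >= l) = exp(-l / mult) = m ** -l."""
    return [num_vectors / m ** level for level in range(num_levels)]


def average_edges_per_node(edges: int, nodes: int) -> float:
    """Average out-degree as reported in the per-level statistics."""
    return edges / nodes


if __name__ == "__main__":
    print(f"mult: {level_multiplier(16):.6f}")
    for level, expected in enumerate(expected_nodes_per_level(1_000_000, 16, 5)):
        print(f"Level {level}: about {expected:.0f} nodes expected")
    print(f"Level 0 average edges: {average_edges_per_node(21613828, 1000000):.2f}")
```

With M = 16 and 1000000 vectors the expected node counts are 1000000, 62500, about 3906, about 244 and about 15, close to the Hnswlib counts of 1000000, 62323, 3855, 238 and 17 listed above.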
Reviewers: sergei, aleksandr.ponomarenko
Reviewed By: sergei, aleksandr.ponomarenko
Subscribers: ybase
Differential Revision: https://phorge.dev.yugabyte.com/D38977