Description
There appears to be a bug in the convert-vector-to-scf pass where converting certain vector.transfer_read ops creates an invalid memref.load op. The input IR that triggers this may be incompatible with the pass, but in that case the pass should report a proper error rather than create an invalid op.
Example IR:
#map1 = affine_map<(d0, d1, d2, d3) -> (d0, 0, 0, d3)>
#map2 = affine_map<(d0, d1, d2, d3, d4) -> (d0, 0, 0, d3)>
func.func @main(%subview: memref<1x1x1x?xi32, strided<[17, 17, 17, 1], offset: ?>>, %c0 : index, %c0_i32 : i32, %2: vector<1x32xi1>) -> vector<1x1x1x32xi32> {
%3 = vector.transfer_read %subview[%c0, %c0, %c0, %c0], %c0_i32, %2 {in_bounds = [true, true, true, true], permutation_map = #map1} : memref<1x1x1x?xi32, strided<[17, 17, 17, 1], offset: ?>>, vector<1x1x1x32xi32>
return %3 : vector<1x1x1x32xi32>
}
Applying mlir-opt --convert-vector-to-scf to this IR yields the following error:
transfer-read.mlir:4:10: error: 'memref.load' op incorrect number of indices for load
%3 = vector.transfer_read %subview[%c0, %c0, %c0, %c0], %c0_i32, %2 {in_bounds = [true, true, true, true], permutation_map = #map1} : memref<1x1x1x?xi32, strided<[17, 17, 17, 1], offset: ?>>, vector<1x1x1x32xi32>
^
transfer-read.mlir:4:10: note: see current operation: %8 = "memref.load"(%4, %0, %0) <{nontemporal = false}> : (memref<1xvector<32xi1>>, index, index) -> vector<32xi1>
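For reference, the verifier message boils down to a rank check: memref.load takes exactly one index per memref dimension, and the op in the note above passes two indices to a rank-1 memref. Below is a minimal Python sketch of that check. It uses simplified string parsing rather than any MLIR API, and memref_rank and load_is_valid are hypothetical helper names introduced only for illustration.

```python
import re


def memref_rank(memref_type: str) -> int:
    """Count the dimensions of a memref type string (simplified parsing, not an MLIR API)."""
    m = re.fullmatch(r"memref<(.*)>", memref_type.strip())
    if m is None:
        raise ValueError(f"not a memref type: {memref_type!r}")
    body = m.group(1)
    rank = 0
    # Consume leading 'Nx' or '?x' dimension prefixes; stop at the element type.
    while True:
        dim = re.match(r"(\d+|\?)x", body)
        if dim is None:
            break
        rank += 1
        body = body[dim.end():]
    return rank


def load_is_valid(memref_type: str, num_indices: int) -> bool:
    """A load is well-formed only if it supplies one index per memref dimension."""
    return memref_rank(memref_type) == num_indices


if __name__ == "__main__":
    # Types and index count taken from the error note above.
    mask_buffer = "memref<1xvector<32xi1>>"
    print(memref_rank(mask_buffer))       # rank of the mask buffer
    print(load_is_valid(mask_buffer, 2))  # two indices were supplied
```

With the types from the note, the mask buffer has rank 1 while the generated load supplies two indices, which is the mismatch the verifier reports.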
Note that the IR is transformed successfully with the full-unroll option turned on. Applying mlir-opt --convert-vector-to-scf=full-unroll yields:
func.func @main(%arg0: memref<1x1x1x?xi32, strided<[17, 17, 17, 1], offset: ?>>, %arg1: index, %arg2: i32, %arg3: vector<1x32xi1>) -> vector<1x1x1x32xi32> {
%0 = vector.splat %arg2 : vector<1x1x1x32xi32>
%1 = vector.extract %arg3[0] : vector<1x32xi1>
%2 = vector.transfer_read %arg0[%arg1, %arg1, %arg1, %arg1], %arg2, %1 {in_bounds = [true]} : memref<1x1x1x?xi32, strided<[17, 17, 17, 1], offset: ?>>, vector<32xi32>
%3 = vector.insert %2, %0 [0, 0, 0] : vector<32xi32> into vector<1x1x1x32xi32>
return %3 : vector<1x1x1x32xi32>
}
This makes me believe that it should also work without the option: in this case all the leading dims are size 1, so any generated loops would be trivial single-iteration loops that canonicalize away. As far as I can tell, the result would be exactly what the unrolled version generates.