Skip to content

Tweak bounds checked indexing code (Index, IndexMut) #350

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Sep 7, 2017

Conversation

bluss
Copy link
Member

@bluss bluss commented Sep 7, 2017

Tweak bounds checked indexing code (Index, IndexMut)

Tweak bounds checked indexing code so that the compiler understands to
remove the bounds check in some simple cases. See examples in the new
example file bounds_check_elim.rs

This helps in some cases where a conditional or loop conditional
already do the required bounds check. Simple examples are if i < len and
while i < len loops; the range iterator loop is a worse example, it
doesn't do this as successfully! Further work is needed on that.

  • These improve: test1d_single, test1d_while, test1d_range, test2d_while, test2d_range
  • ..but still need more improvement: test1d_range, test2d_range

Example input code:

pub fn test1d_single(a: &Array1<f64>, i: usize) -> f64 {
    if i < a.len() { a[i] } else { 0. }
}

Codegen diff:

 test1d_single:
 	.cfi_startproc
-	push	rax
-.Lcfi0:
-	.cfi_def_cfa_offset 16
 	xorps	xmm0, xmm0
 	cmp	qword ptr [rdi + 32], rsi
-	jbe	.LBB0_3
+	jbe	.LBB0_2
+	mov	rax, qword ptr [rdi + 24]
 	imul	rsi, qword ptr [rdi + 40]
-	shl	rsi, 3
-	add	rsi, qword ptr [rdi + 24]
-	je	.LBB0_4
-	movsd	xmm0, qword ptr [rsi]
-.LBB0_3:
-	pop	rax
+	movsd	xmm0, qword ptr [rax + 8*rsi]
+.LBB0_2:
 	ret
-.LBB0_4:
-	call	_ZN7ndarray11arraytraits19array_out_of_bounds17h009d4a482e88d6aaE@PLT
 .Lfunc_end0:
 	.size	test1d_single, .Lfunc_end0-test1d_single
 	.cfi_endproc

Full codegen diff for the bounds check elim example: https://gist.github.com/bluss/e5d9b33a1eea36894534503b0e06341a/revisions?diff=split

Note that not all examples do so well.

@bluss
Copy link
Member Author

bluss commented Sep 7, 2017

Related to #349

@bluss
Copy link
Member Author

bluss commented Sep 7, 2017

It actually looks like this is another place where avoiding Option<&T> is good for performance. The compiler doesn't reason as well about that "optimized" option representation -- possibly because it conflates None with null.

  • Old code: Option<isize> mapped to an Option<&T>, then unwrapping.
  • New code: Option<isize>.unwrap() before making a &T

Tweak bounds checked indexing code so that the compiler understands to
remove the bounds check in some simple cases. See examples in the new
example file bounds_check_elim.rs

This helps in *some* cases where a conditional or loop conditional
already do the required bounds check. Simple examples are if i < len and
while i < len loops; the range iterator loop is a worse example, it
doesn't do this as well! Further works is needed on that.

Input code:

```rust
pub fn test1d_single(a: &Array1<f64>, i: usize) -> f64 {
    if i < a.len() { a[i] } else { 0. }
}
```

Codegen diff:

```diff
 test1d_single:
 	.cfi_startproc
-	push	rax
-.Lcfi0:
-	.cfi_def_cfa_offset 16
 	xorps	xmm0, xmm0
 	cmp	qword ptr [rdi + 32], rsi
-	jbe	.LBB0_3
+	jbe	.LBB0_2
+	mov	rax, qword ptr [rdi + 24]
 	imul	rsi, qword ptr [rdi + 40]
-	shl	rsi, 3
-	add	rsi, qword ptr [rdi + 24]
-	je	.LBB0_4
-	movsd	xmm0, qword ptr [rsi]
-.LBB0_3:
-	pop	rax
+	movsd	xmm0, qword ptr [rax + 8*rsi]
+.LBB0_2:
 	ret
-.LBB0_4:
-	call	_ZN7ndarray11arraytraits19array_out_of_bounds17h009d4a482e88d6aaE@PLT
 .Lfunc_end0:
 	.size	test1d_single, .Lfunc_end0-test1d_single
 	.cfi_endproc
```
@bluss bluss force-pushed the bounds-check-elim branch from c18adef to 325bcf3 Compare September 7, 2017 20:24
Test cow with an array that changes strides when it is unsharing itself.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant