Skip to content

Perf improvements to collections::BitSet. #25230

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 18, 2015
Merged

Conversation

rayglover
Copy link

Some modest running-time improvements to std::collections::BitSet on bit-sets of varying set-membership densities. This is work originally from here. (Benchmarks copied below)

std::collections::BitSet / alt_collections::BitSet

copy_dense         ... 3.08x
copy_sparse        ... 4.22x
count_dense        ... 11.01x
count_sparse       ... 8.11x
from_bytes         ... 1.47x
intersect_dense    ... 6.54x
intersect_sparse   ... 4.37x
union_dense        ... 5.53x
union_sparse       ... 5.60x

The exception is from_bytes, which I've left unaltered since the optimization is rather obscure.

Compiling with the cpu feature popcnt gave a further ~10% improvement on my machine, but this wasn't factored in to the benchmarks above.

Similar improvements could be made to BitVec, although that would probably require more substantial changes.

criticism welcome!

@rust-highfive
Copy link
Contributor

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @alexcrichton (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. The way Github handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see CONTRIBUTING.md for more information.

@killercup
Copy link
Member

This looks like a breaking change – some stable items have different signatures.

@Stebalien
Copy link
Contributor

FWIW, BitSet itself is unstable. Anyways, I don't see any changed signatures (except for one with a now explicit lifetime that used to be implicit).

@killercup
Copy link
Member

@Stebalien, yes you are right, only some methods are declared stable, but not the Struct or the module itself.

I meant stuff like this, by the way:

#[stable(feature = "rust1", since = "1.0.0")]
-pub struct Union<'a>(TwoBitPositions<'a>);
+pub struct Union<'a>(BlockIter<TwoBitPositions<'a>>);

@Stebalien
Copy link
Contributor

The signature is pub struct Union<'a>(_) (the contents are private).

@killercup
Copy link
Member

Yes. You are right again. Just ignore me, I'm not in a Rusty frame of mind :)

@@ -1451,8 +1451,8 @@ impl BitSet {
/// ```
#[inline]
#[stable(feature = "rust1", since = "1.0.0")]
pub fn iter(&self) -> bit_set::Iter {
SetIter {set: self, next_idx: 0}
pub fn iter<'a>(&'a self) -> bit_set::Iter<'a> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These annotations seem strictly redundant?

@Gankra
Copy link
Contributor

Gankra commented May 11, 2015

Great wins! Basically lgtm modulo minor style nits.

@alexcrichton
Copy link
Member

r? @gankro (moving reviewers)

@rust-highfive rust-highfive assigned Gankra and unassigned alexcrichton May 11, 2015
@rayglover
Copy link
Author

Thanks for the comments, I'll try to address them soon. I'm currently stuck in an airport 6000 miles from home so it will take a few days to get this up to snuff!

@Gankra
Copy link
Contributor

Gankra commented May 12, 2015

@rayglover Ok, no worries!

@rayglover
Copy link
Author

I've fixed the oddness in the size_hint, which was definitely incorrect! The other changes I believe address everything else that was commented on (docs, style etc.)

@Gankra
Copy link
Contributor

Gankra commented May 18, 2015

r? @alexcrichton (want decision on Octal/Hex is Debugish vs Displayish)

@rust-highfive rust-highfive assigned alexcrichton and unassigned Gankra May 18, 2015
@Gankra
Copy link
Contributor

Gankra commented May 18, 2015

r? @gankro

wrong PR

@rust-highfive rust-highfive assigned Gankra and unassigned alexcrichton May 18, 2015
@Gankra
Copy link
Contributor

Gankra commented May 18, 2015

@bors r+

Thanks!

@bors
Copy link
Collaborator

bors commented May 18, 2015

📌 Commit 307fab1 has been approved by Gankro

@bors
Copy link
Collaborator

bors commented May 18, 2015

⌛ Testing commit 307fab1 with merge 4daa62a...

bors added a commit that referenced this pull request May 18, 2015
Some modest running-time improvements to `std::collections::BitSet` on bit-sets of varying set-membership densities. This is work originally from [here](https://github.com/rayglover/alt_collections). (Benchmarks copied below)
```
std::collections::BitSet / alt_collections::BitSet

copy_dense         ... 3.08x
copy_sparse        ... 4.22x
count_dense        ... 11.01x
count_sparse       ... 8.11x
from_bytes         ... 1.47x
intersect_dense    ... 6.54x
intersect_sparse   ... 4.37x
union_dense        ... 5.53x
union_sparse       ... 5.60x
```

The exception is `from_bytes`, which I've left unaltered since the optimization is rather obscure.

Compiling with the cpu feature `popcnt` gave a further ~10% improvement on my machine, but this wasn't factored in to the benchmarks above.

Similar improvements could be made to `BitVec`, although that would probably require more substantial changes.

criticism welcome!
@bors
Copy link
Collaborator

bors commented May 18, 2015

@bors bors merged commit 307fab1 into rust-lang:master May 18, 2015
}

return None;
// from the current block, isolate the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be more efficient to use http://doc.rust-lang.org/nightly/std/primitive.u32.html#method.trailing_zeros to get this index. (At least, I'd hope that LLVM lowers that instruction to the most efficient thing for the given platform.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@huonw Ahh, you're probably right in the general case because the particular opcode (BSF) is more widely supported, whereas POPCNT needs to be explicitly enabled..

It would also remove the additional bit-twiddling.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, I would assume that using the direct method (which actually calls an LLVM intrinsic) would always be at least as fast as this method: if it isn't, then it's an LLVM bug (since LLVM can implement the intrinsic via this method on platforms where this is the fastest).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why it would be an LLVM bug; llvm.cttz will generally result in a BSF instruction on most hardware. That's fine, except that BSF is comparatively very slow, several multiples slower than POPCNT (the instruction behind llvm.ctpop) on the same hardware.

However, BSF is more widely supported, and where POPCNT isn't available (or isn't enabled) then the resulting assembly is an additional ~12 instructions on 64-bit hardware (I don't know about 32-bit).

A good review of this algorithm (and the various ways of implementing it) can be found here. The fastest popcnt-based variant is consistently faster (~10-25%), although in the rust implementation the difference is less pronounced due to constant factors being more dominant.

As I said, std isn't normally compiled with the relevant cpu feature enabled for most of my points to be relevant, so it seems to make more sense to use your suggestion, which I've made here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.