Skip to content

Use SipHash-1-3 for string and integer hashing #2149

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

nikic
Copy link
Member

@nikic nikic commented Oct 1, 2016

Rebase of an old experiment for SipHash-2-4. Nowadays SipHash-1-3 is considered sufficient for hash tables, so we try that.

Micro-benchmark results for array operations x86-64 (first column with siphash, second column baseline):

Fixed string lookup:               16.2 mus +/- 0.1 mus   16.5 mus +/- 0.4 mus     -1.6%
Packed int insert:                 15.0 mus +/- 0.2 mus   15.0 mus +/- 0.1 mus     -0.2%
Reverse int insert:                38.6 mus +/- 0.6 mus   17.7 mus +/- 0.5 mus   +118.5%
Reverse int insert x16 collisions: 44.5 mus +/- 0.4 mus   44.4 mus +/- 0.9 mus     +0.2%
Incrementing string insert:        61.2 mus +/- 0.6 mus   43.3 mus +/- 0.9 mus    +41.4%
Packed int lookup:                 10.0 mus +/- 0.2 mus   9.9 mus +/- 0.1 mus     +0.5%
Reverse int lookup:                25.3 mus +/- 1.1 mus   11.7 mus +/- 0.2 mus   +115.4%
Incrementing string lookup:        45.3 mus +/- 1.5 mus   28.8 mus +/- 1.2 mus    +57.5%
Longer string lookup:              40.2 mus +/- 0.5 mus   29.3 mus +/- 0.9 mus    +37.2%

As such, we see about 120% regression for operations on non-packed integer hashes and about 50% on non-cached string hashes.

nikic added 5 commits October 1, 2016 23:15
Hashtable bucket keys are now represented using a union:

    union zend_bucket_key {
        zend_string *str;
        zend_ulong num;
    }

The bucket->h value is now a pure hash and no longer doubles as
being the integer key value.

The top bit of bucket->h (HT_IS_STR_BIT) determines whether the
key is a string or and integer. We can do this, because the top
bit will always be 1 after hashtable mask application, so its
value does not influence bucket distribution quality.

The zend_hash_integer() function is a bit of a misnomer -- introducing
integer hashing is not as simple as replacing this function.
For testing using a Murmur64 finalizer.
Using dummy key for testing.

After changing hash implementation run to avoid full rebuild:

touch Zend/zend_hash.c Zend/zend_string.c ext/oci8/oci8.c \
ext/opcache/zend_accelerator_hash.c sapi/phpdbg/phpdbg_bp.c \
sapi/phpdbg/phpdbg_cmd.c

Conflicts:
	Zend/zend_string.h
@hikari-no-yume
Copy link
Contributor

Oh, this is intriguing.

@smalyshev smalyshev added the RFC label Oct 30, 2016
@nikic
Copy link
Member Author

nikic commented Nov 16, 2016

Closing this for now as I don't consider it feasible in the current form. Would require at least some kind of adaptive hash switch for integer keys.

@nikic nikic closed this Nov 16, 2016
@hikari-no-yume
Copy link
Contributor

Hash switch?

@nikic
Copy link
Member Author

nikic commented Nov 16, 2016

@TazeTSchnitzel Dynamically switching from djb to siphash based on insert collision threshold, for example. (Other ideas on how to make this work efficiently welcome.)

@hikari-no-yume
Copy link
Contributor

Ah, because of the performance hit? That makes sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants