Skip to content

hash: Support custom algo parameters #6400

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 12 commits into from

Conversation

weltling
Copy link
Contributor

@weltling weltling commented Nov 4, 2020

The concrete need on this change is to support passing an initial seed
to the murmur hash. Passing a custom seed is important in terms of
randomizing the hash function.

The suggested implementation adds a HashTable parameter to all the
init callbacks. Further on, an array with custom arguments is accepted
from hash or hash_init from the user land. Currently several things
like hash_hkdf are not touched, as they don't need passing custom
args.

Some convenience macros have been added to the SHA/MD families of
functions, so the consuming code doesn't have to be changed widely.

Special note on what has happened to
ext/hash/tests/hash_serialize_003.phpt - the test gathers
serialization strings produced on different platforms. As the
implementation requires adding a new property to the hash context
object, it changes the serialization string. However, it doesn't seem
practical to repeat all the serialization strings again. The test is
changed so it tests that the string can be unserialized on the same
platform it was produced on at the runtime.

Another way to implement this is to add another type of the init that
would accept a HT with arguments. However, that would still require
touching all the context structs in all the algos. That would also
increase the size of those structs. As an init function is called just
once, the way of modifying the existing init callback has been seen
preferrable.

Signed-off-by: Anatol Belski ab@php.net

@weltling
Copy link
Contributor Author

weltling commented Nov 4, 2020

@m6w6 @nikic ^

@weltling
Copy link
Contributor Author

Poke on this one.

Thanks.

@weltling
Copy link
Contributor Author

@m6w6 ping :)

Copy link
Contributor

@m6w6 m6w6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While all those FooInitArgs shenanigans provide API compatibility they do not so for the ABI, but I think this is how we always did for X.y releases?

I wonder, though, we could have cut down the number of "provided" hash algos into half, it there was a "output bits" parameter for all those Foo<NBITS> algos..., couldn't we? Maybe something to deprecate and provision in the next major...

LGTM, all in all.

@weltling
Copy link
Contributor Author

Thanks for the reviews, i'll be working to address this.

Regarding reducing the number of hashes - yep, it would've allowed to have say SHA, and pass the 1/256/512/etc. Other arbitrary nice things can be done with the custom args mechanism. Like, fe, the byteorder of the final hash could be reversed, etc. And of course, the necessary parts like passing a starting seed, will be made now possible to do.

Thanks.

@weltling
Copy link
Contributor Author

I think this one is ready to be revaluated now.

Thansk.

@weltling weltling force-pushed the hash_custom_algo_args branch from 7f4f324 to 579fd5e Compare November 30, 2020 20:21
Copy link
Contributor

@kohler kohler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This review is one-track :) It doesn't seem necessary to store the args in the hashcontext object; instead they should be used only in hash context initialization. I scanned through and see no reason to store args, even for cloning a hashcontext. There are benefits to concentrating all hash-algo-specific information in the hashcontext, and to keeping hashcontexts simpler.

@nikic
Copy link
Member

nikic commented Dec 1, 2020

This review is one-track :) It doesn't seem necessary to store the args in the hashcontext object; instead they should be used only in hash context initialization. I scanned through and see no reason to store args, even for cloning a hashcontext. There are benefits to concentrating all hash-algo-specific information in the hashcontext, and to keeping hashcontexts simpler.

Great point! I agree that leaving storage of arguments up to the hash implementation is better. I expect that in most cases no explicit storage will be needed, as is the case for murmur, where the extra args just affect how the state is initialized.

@m6w6
Copy link
Contributor

m6w6 commented Dec 1, 2020

🤦 of course, this makes a lot of sense!

@weltling
Copy link
Contributor Author

weltling commented Dec 1, 2020

OK, lets get it in by that. Easier for me as i don't have to touch that test for now. I'm still sceptic as it's inconsistent and once it's out in stable, that inconsistency is engraved on stone. Still it's master, so it'll be clarified better when more hashes with custom args flow in.

Thanks.

@weltling weltling force-pushed the hash_custom_algo_args branch from 579fd5e to e62e9f6 Compare December 1, 2020 19:25
Copy link
Member

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. Only thing I would request is to rename $args -> $options in the public API, as that matches how we call such arguments elsewhere.

The concrete need on this change is to support passing an initial seed
to the murmur hash. Passing a custom seed is important in terms of
randomizing the hash function.

The suggested implementation adds a HashTable parameter to all the
init callbacks. Further on, an array with custom arguments is accepted
from `hash` or `hash_init` from the user land. Currently several things
like `hash_hkdf` are not touched, as they don't need passing custom
args.

Some convenience macros have been added to the SHA/MD families of
functions, so the consuming code doesn't have to be changed widely.

Special note on what has happened to
`ext/hash/tests/hash_serialize_003.phpt` - the test gathers
serialization strings produced on different platforms. As the
implementation requires adding a new property to the hash context
object, it changes the serialization string. However, it doesn't seem
practical to repeat all the serialization strings again. The test is
changed so it tests that the string can be unserialized on the same
platform it was produced on at the runtime.

Another way to implement this is to add another type of the init that
would accept a HT with arguments. However, that would still require
touching all the context structs in all the algos. That would also
increase the size of those structs. As an init function is called just
once, the way of modifying the existing init callback has been seen
preferrable.

Signed-off-by: Anatol Belski <ab@php.net>
Signed-off-by: Anatol Belski <ab@php.net>
Signed-off-by: Anatol Belski <ab@php.net>
- Deref passed seed arg
- Don't dup args array

Signed-off-by: Anatol Belski <ab@php.net>
Signed-off-by: Anatol Belski <ab@php.net>
Signed-off-by: Anatol Belski <ab@php.net>
Signed-off-by: Anatol Belski <ab@php.net>
@weltling weltling force-pushed the hash_custom_algo_args branch from de3c688 to 1271bff Compare December 5, 2020 19:20
As discussed, that's legacy and should not be touched.

Signed-off-by: Anatol Belski <ab@php.net>
Signed-off-by: Anatol Belski <ab@php.net>
Signed-off-by: Anatol Belski <ab@php.net>
Co-authored-by: Nikita Popov <nikita.ppv@googlemail.com>
Co-authored-by: Nikita Popov <nikita.ppv@googlemail.com>
@php-pulls php-pulls closed this in 110b4e9 Dec 13, 2020
jrfnl added a commit to PHPCompatibility/PHPCompatibility that referenced this pull request Mar 9, 2022
…`hash*()` functions

> Hash
>
> The following functions `hash()`, `hash_file()`, and `hash_init()` now support an additional optional `options` argument, which can be used to pass algorithm specific data.

Includes unit tests.

Refs:
* https://www.php.net/manual/en/migration81.new-features.php#migration81.new-features.hash
* php/php-src#6400
* php/php-src@110b4e9
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants