Skip to content

Make type error messages more consistent #5092

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

kocsismate
Copy link
Member

@kocsismate kocsismate commented Jan 17, 2020

This PR is the second part of #5066

Currently, type errors can be of many forms (see https://3v4l.org/IRXq6 for example code):

  • Internal functions with "standard" ZPP:
    • %s() expects parameter 1 to be int, array given
  • Internal functions with "custom" ZPP:
    • When only one parameter is given, it must be an array
    • Array expected as options when using translate or scale
    • Expects string, null, long or boolelan value for PostgreSQL '%s' (%s)
  • User-land functions:
    • Argument 1 passed to foo() must be of the type string, array given

I think it would make sense to settle on a single format - in the worst case, on a minimum number of similar formats. In the sense of this, my patch changes the error message structure of internal functions with "custom" ZPP to the one used by user-land functions, but I am open to any discussion about what kind of format to use.

However, I think we could (and should) go further, in which case these questions arise:

  • Should we formulate type error messages of internal and user-land functions the same way?
  • If so, which format to use?
  • Would this change need an RFC? (I guess so)

@kocsismate kocsismate force-pushed the consistent-error-messages2 branch 4 times, most recently from 7f843b2 to f9757e7 Compare January 18, 2020 11:09
@Girgias
Copy link
Member

Girgias commented Jan 18, 2020

Thanks for working on this, I don't think this would need an RFC as it's only what error message is being displayed.

I do think we probably should standardize user and internal error messages and go with the user formatted one because it makes more sense to me to indicate the Argument position.

Some of the custom ZPP ones may be a bit hard to standardize but most probably should be able to follow a similar pattern.

For ValueError and normal Error I think doing like TypeError and indicating the argument position seems like a sensible change.

@nikic
Copy link
Member

nikic commented Jan 18, 2020

It makes sense to standardize on one format, but it doesn't necessarily have to match any of the existing ones :) I don't like the userland one because it is currently quite special cased, partly for historic reasons. I'd prefer something that does must be of type ?Foo, Bar given over must be an instance of Foo or null, instance of Bar given. I think it might also make sense to make mention of the parameter name in addition to the argument number.

@kocsismate
Copy link
Member Author

kocsismate commented Jan 18, 2020

Some of the custom ZPP ones may be a bit hard to standardize but most probably should be able to follow a similar pattern.

Agree. Standardizing them on a best effort basis is perfect for me.

it doesn't necessarily have to match any of the existing

Agree. Prior to submitting the PR I did a small research, and collected type error messages from some of the most popular languages. Unfortunately, I didn't find any that I particularly liked, but I'll include them here just in case, maybe someone gets some inspiration from them :)

Java

HelloWorld.java:8: error: incompatible types: String cannot be converted to int
        foo("not good");

Go

./prog.go:10:12: cannot use "not good" (type string) as type int64 in argument to rand.Seed

Rust

error[E0308]: mismatched types
 --> src/main.rs:6:9
  |
6 |     foo("not good");
  |         ^^^^^^^^^^ expected i32, found reference
  |
  = note: expected type `i32`
             found type `&'static str`

I think it might also make sense to make mention of the parameter name in addition to the argument number.

I can also relate, but wasn't sure where to put it exactly, and whether it really helps in practice. I'll have a shot on doing so myself, but I'll probably need suggestions about the format.

+1: Nikita, you mentioned must be of type ?Foo, Bar given in your comment, however currently, there is a the before the type (must be of the type ?Foo, Bar given). Does it mean that you'd get rid of the article or you just accidentally left it out? For me, the sentence without the the feels a bit more natural, but I am not someone who is authoritative in English grammar. ^^

@nikic
Copy link
Member

nikic commented Jan 18, 2020

I can also relate, but wasn't sure where to put it exactly, and whether it really helps in practice. I'll have a shot on doing so myself, but I'll probably need suggestions about the format.

Yeah, I'm not really sure on this. There doesn't seem to be a really nice way to provide both the name and the number, and maybe it's not particularly useful in the first place.

+1: Nikita, you mentioned must be of type ?Foo, Bar given in your comment, however currently, there is a the before the type (must be of the type ?Foo, Bar given). Does it mean that you'd get rid of the article or you just accidentally left it out? For me, the sentence without the the feels a bit more natural, but I am not someone who is authoritative in English grammar. ^^

I'm using the "of type T" form for union type errors in

smart_str_appends(&str, "be of type ");
. Other messages use various type-specific variations, including "of the type T" for some cases... To me the "the" reads pretty weirdly, but I didn't want to touch it at the time.

@kocsismate kocsismate force-pushed the consistent-error-messages2 branch from f9757e7 to fc3ac80 Compare January 18, 2020 21:21
@kocsismate
Copy link
Member Author

kocsismate commented Jan 18, 2020

  • ZPP of internal functions uses the error message format of user-land functions from now on
  • The the article in question is removed, as well as the must be an instance of, and must implement interface expressions
  • Instead of or null, ? is used from now on
  • I left the must be a valid callback alone, I don't think it's ok to change
  • I adapted tests in Zend/ and tests/
  • I couldn't come up with a good enough way to also indicate the parameter name

Do you miss anything? I'll continue to fix the tests in ext/ if these changes are really ok. :)

@kocsismate kocsismate changed the title [WIP] Make type error messages more consistent Make type error messages more consistent Jan 18, 2020
@kocsismate
Copy link
Member Author

One more thing came to my mind.

Instead of the current message format (e.g. Argument 1 passed to get_class() must be of type object, string given), I could imagine using one of these:

  • Argument 1 passed to get_class() must be of object type, string given
  • Argument 1 passed to get_class() must be of type 'object', 'string' given

Would you think these versions make the message any more clear?

@Girgias
Copy link
Member

Girgias commented Jan 19, 2020

* `Argument 1 passed to get_class() must be of type 'object', 'string' given`

Is maybe a bit better but the first one is kinda confusing for me, especially if you have a class:
Argument 1 passed to get_class() must be of Foo\Bar type, string given just doesn't look right to me

@kocsismate
Copy link
Member Author

kocsismate commented Jan 19, 2020

Thanks, George, for the feedback. It's interesting for me to hear the opinion of native English speakers. ^^

Somehow I'd either expect some kind of quotation after the type word (like how Rust does in the note), or making the word order "straight" (e.g. object type). It's probably just me though... :) So I'm ok with the current format.

@Girgias
Copy link
Member

Girgias commented Jan 19, 2020

Inspiring it from Rust but we could maybe do something along the lines: Expected argument 1 passed to get_class() to be of type 'object', 'string' given Don't know if that's better or not

@nikic
Copy link
Member

nikic commented Jan 22, 2020

For reference, some typed properties error messages:

  • Cannot assign %s to property %s::$%s of type %s
  • Cannot increment property %s::$%s of type %s past its maximal value
  • Cannot assign %s to reference held by property %s::$%s of type %s

With this PR, the messages for arguments/returns are:

  • Argument %d passed to %s() must be of type %s, %s given
  • Return value of %s() must be of type %s, %s returned

The property errors put the given type first and expected type later. Two possibilities using this order:

  • %s(): Cannot pass %s to argument %d of type %s
  • Cannot return %s from %s() expecting type %s

We also have the option to go for the zpp style message rather than the userland one:

  • %s() expects argument %d of type %s, %s given
  • %s() expects argument %d to be of type %s, %s given

@kocsismate
Copy link
Member Author

kocsismate commented Jan 23, 2020

Thanks for the analysis, Nikita!

Do you guys have a strong preference? Do you think a quick poll on the ML or at least on Twitter would be useful?

Speaking about my preference, I think I still lean towards Argument %d passed to %s() must be of type %s, %s given or %s() expects argument %d to be of type %s, %s given , in this order (but I could also imagine putting the types inside apostrophes).

@kocsismate
Copy link
Member Author

kocsismate commented Jan 24, 2020

It's already quite visible that people on twiter prefer the ZPP-style message rather than the user-land one: https://twitter.com/kocsismate90/status/1220430708562448385?s=19

Plus, it turned out that PHPStan also displays the parameter name:

Parameter #1 $i of function foo expects int, string given.

@kocsismate kocsismate force-pushed the consistent-error-messages2 branch 3 times, most recently from ead7f04 to c0f12b6 Compare January 30, 2020 08:50
Copy link
Member Author

@kocsismate kocsismate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the wording according to the result of the vote.

@kocsismate kocsismate force-pushed the consistent-error-messages2 branch 6 times, most recently from 7404acf to c777019 Compare February 3, 2020 22:10
@kocsismate kocsismate force-pushed the consistent-error-messages2 branch from c777019 to aa79a6e Compare February 8, 2020 07:22
@kocsismate
Copy link
Member Author

@nikic Could you please have a look at this whenever you can allocate some time slot?

Copy link
Member

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So... just to make sure we're really sure here (:P) a few more variations on the message:

%s() expects argument %d to be of type %s, %s given
%s() expects argument %d of type %s, %s given
%s() expects argument %d to be %s, %s given

It seems to be like the current "to be of type" is redundant and we can drop either the "to be" or the "of type".

Also, the Twitter thread has a suggestion for including the argument, which in this format would look something like this I guess?

%s() expects argument #%d ($%s) to be of type %s, %s given

@nikic
Copy link
Member

nikic commented Feb 13, 2020

@kocsismate I think the %s() expects argument #%d ($%s) to be of type %s, %s given variant is worth a try. Especially if we want to extend the same format to non-type related stuff:

explode() expects argument 1 to be non-empty
explode() expects argument #1 ($separator) to be non-empty

If we want to adopt this as a more general format, of course...

@kocsismate
Copy link
Member Author

So... just to make sure we're really sure here

Yeah, it's better to reiterate now :)

The "to be of type" version seems a bit verbose indeed, but I still like it because it makes clear that it's a type issue, not something else (in case of objects, it might not be obvious). Moreover, if we want to apply a similar format for other errors as well then IMO it would be advantageous to distinguish the different categories in order to avoid possible confusion.

Although I am not a great fan of including the parameter name, I could live with it. I can imagine that the parameter name is helpful for people. That said, I'll try to update the PR with this variant.

@kocsismate kocsismate force-pushed the consistent-error-messages2 branch from aa79a6e to 982cb28 Compare February 14, 2020 10:29
@kocsismate
Copy link
Member Author

@nikic I implemented the last variant. There will be still dozens of failures though... And there were some places (in the VM I think) from where I couldn't add the param name.

@kocsismate kocsismate force-pushed the consistent-error-messages2 branch 3 times, most recently from 6fde695 to fcba027 Compare February 15, 2020 09:28
@kocsismate kocsismate force-pushed the consistent-error-messages2 branch from fcba027 to 3391a4a Compare February 15, 2020 10:16
@kocsismate
Copy link
Member Author

I've implemented some suggested improvements, and managed to fix all the tests except for Zend/tests/call_user_func_009.phpt where the argument name doesn't match $arg in sort() expects argument #1 ($arg) to be a valid callback, ..."...

Additionally, there are 4 error messages ("%s() expects argument #1 to be a valid callback, %s") in zend_vm_execute.h and zend_vm_execute.h where I don't know how to add the param name in the message.

If it is possible, I'd prefer not to split this PR into a userland and an internal part, unless it's really needed (e.g. for making code review possible).

@nikic
Copy link
Member

nikic commented Feb 15, 2020

@kocsismate Those errors refer to the first arg of call_user_func/call_user_func_array, so you can insert whatever the parameter name is for those.

@kocsismate kocsismate force-pushed the consistent-error-messages2 branch from 3391a4a to fc82c65 Compare February 15, 2020 10:57
Copy link
Member

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have a bunch of places here checking for the arg name not be available. Is that case hit anywhere in practice?

Maybe with variadic arguments? Those are not included in the argument count, so get_func_arg_name() would return NULL for them as-is. I'm not quite sure what we should be printing for variadic args. I think your current userland code will print the name of the variadic arg, while the internal code will skip the arg name.

@kocsismate
Copy link
Member Author

kocsismate commented Feb 15, 2020

@nikic I've just wrote a test for variadic userland function arguments, and it really prints the name well. Unfortunately, I couldn't try it out with internal ones - as far as I saw custom ZPP checks are needed in case of a var arg.

However, the main reason that there's so many arg_name checks is that I didn't realize that the name is always part of the arg info (even though there is no stub defined). But besides this, I wanted to make sure I use the return value correctly even when zend_is_executing() is false or the supplied arg_num is more than num_args. Or I shouldn't bother about these checks?

Copy link
Member

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's land this is-as, we can clean up implementation details afterwards... Note that there is one failing test.

@nikic
Copy link
Member

nikic commented Feb 17, 2020

However, the main reason that there's so many arg_name checks is that I didn't realize that the name is always part of the arg info (even though there is no stub defined). But besides this, I wanted to make sure I use the return value correctly even when zend_is_executing() is false or the supplied arg_num is more than num_args. Or I shouldn't bother about these checks?

Right now it's still allowed to not have arg_info in internal functions, so the argument name isn't guaranteed. I plan to make this required soon though, and we can probably drop the checks at that point.

@kocsismate kocsismate force-pushed the consistent-error-messages2 branch from a33cf89 to 3cad30b Compare February 17, 2020 11:24
@kocsismate kocsismate force-pushed the consistent-error-messages2 branch from 3cad30b to 3077e6e Compare February 17, 2020 12:05
@php-pulls php-pulls closed this in ac0853e Feb 17, 2020
@kocsismate kocsismate deleted the consistent-error-messages2 branch February 17, 2020 13:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants