Skip to content

Differentiate between BINARY/CHAR #723

Closed
@kwoodhouse93

Description

@kwoodhouse93

MySQL provides BINARY and VARBINARY data types. These are treated as strings with the binary character set. As far as MySQL is concerned, the only real difference between binary and text fields is the character set (although this is a bit of a simplification).

This is why, when using ColumnTypes(), the mysql driver returns CHAR or VARCHAR; there's no notion of a BINARY field type internally.

Motivation

The use case I have is for an application that ingests data from multiple sources and writes it to a common format. This requires identifying column types in the input database (which is MySQL, in this case) to ensure they can be supported in the common format.

This common format supports UTF-8 strings, but not arbitrary binary strings. Therefore, I'd like to be able to identify whether a column that the driver says is CHAR is actually a text field, and not a byte slice.

Possible Solutions

I can see two possible approaches that would benefit my use case here:

  1. Provide access to a column's character set
  2. Switch over character set to return BINARY or VARBINARY when appropriate

Option 1: Provide access to a column's character set

The benefit of this option is that it opens the way for users of the driver to do more than just distinguish between BINARY and CHAR. One could potentially handle different character sets in different ways, which I can imagine could be useful with unusually-configured databases.

The major disadvantage is that I'm not aware of an existing interface in the stdlib 'sql' package for returning this sort of information. Doing so would require providing some new MySQL specific interface for getting this information out of the driver. Not impossible, but could be tricky to get right.

Option 2: Switch over character set to return BINARY or VARBINARY when appropriate

This option should be relatively straightforward to implement. I believe the packet received by the driver already contains information about the character set for string types. We could store this information when reading field information from a packet, and later switch over it when we need to know the column type.

The character set for binary fields is 63, as evidenced here: https://dev.mysql.com/doc/dev/mysql-server/latest/page_protocol_basic_character_set.html.

This option would require some modest changes to fields.go and packets.go to provide the extra logic required. I believe this sort of approach was discussed in #667, with little opposition, but the decision was to wait for some feedback from users before implementing.

Conclusion

I suggest option 2 be implemented, as it requires the smallest change and solves the use case I presented above. Option 1 could still be implemented later if it was found to be useful.

Subject to any feedback on what I've proposed above, I'll try implementing option 2 with the aim of producing a suitable PR.

Let me know if you have any questions or comments or if you think I've missed anything.

Note, there is also a similar issue with the TEXT and BLOB data types (and their variants). The difference here is that these fields are read as BLOBs by default (whether they are actually TEXT or BLOB. The fix should be more or less the same in this case.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions