encoding not respected on read_msgpack #10581

Closed
@ruidc

As discussed on https://groups.google.com/forum/#!topic/pydata/ngROaML_hLI,
the encoding does not seem to be respected when reading a msgpack. Below, I expect to get back
what I put in as utf8:

In [17]: s
Out[17]: u'\u2019'

In [18]: s = pd.Series({'a' : u"\u2019" })

In [19]: s.values[0]
Out[19]: u'\u2019'

In [20]: pd.read_msgpack(s.to_msgpack(encoding='utf8')).values[0]
Out[20]: u'\xe2\x80\x99'

Stepping through, part of the problem seems to be that the call to unpack at https://github.com/pydata/pandas/blob/master/pandas/io/packers.py#L134 is not passed an encoding argument, so it defaults to latin1 at https://github.com/pydata/pandas/blob/master/pandas/io/packers.py#L558
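For reference, the output in Out[20] is what you would get if the UTF-8 bytes were decoded as latin1. The following is a minimal sketch using only the standard library (no pandas or msgpack involved); the variable names are made up for illustration:

```python
# U+2019 (right single quotation mark), as in the session above.
original = u"\u2019"

# Encoding as UTF-8 produces three bytes: 0xE2 0x80 0x99.
packed = original.encode("utf8")

# Decoding those bytes as latin1 maps each byte to one code point,
# which yields the three-character string seen in Out[20].
decoded_as_latin1 = packed.decode("latin1")

# Decoding with the encoding that was used to write gives the input back.
decoded_as_utf8 = packed.decode("utf8")

print(repr(decoded_as_latin1))
print(repr(decoded_as_utf8))
```

This only illustrates the suspected mechanism; it does not exercise the pandas code path itself.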

Changing L134 to:

l = list(unpack(fh, **kwargs))

and passing the encoding like:

pandas.read_msgpack(m, encoding='utf8') 

makes it work for me. However, I don't have an environment set up to submit this as a pull request via GH, and we're still using 0.14.1 due to compatibility issues.
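The effect of forwarding the keyword arguments can be sketched in isolation. This is a simplified stand-in, not the pandas source: `unpack`, `read_without_forwarding`, and `read_with_forwarding` are hypothetical names, and `unpack` here just decodes bytes with a latin1 default to mimic the behaviour described above.

```python
def unpack(data, encoding="latin1"):
    """Hypothetical low-level unpacker: decodes raw bytes, latin1 by default."""
    return data.decode(encoding)


def read_without_forwarding(data, **kwargs):
    # kwargs are accepted but never passed on, so the caller's
    # encoding is silently ignored and the latin1 default applies.
    return unpack(data)


def read_with_forwarding(data, **kwargs):
    # kwargs are forwarded, so encoding='utf8' reaches the unpacker.
    return unpack(data, **kwargs)


payload = u"\u2019".encode("utf8")

before = read_without_forwarding(payload, encoding="utf8")
after = read_with_forwarding(payload, encoding="utf8")

print(repr(before))
print(repr(after))
```

With forwarding in place the caller still has to pass the same encoding on read that was used on write, as in the `read_msgpack(m, encoding='utf8')` call above.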
