TST/BUG: Rename html encoding test files. #7903

Merged
merged 1 commit into from
Aug 2, 2014

Conversation

jmorris0x0
Contributor

html test_encode fails on OSX 10.9.4 due to missing dash in utf32 encoding
string.

The test encoding string is derived from a split of the test file name.

Only utf32 is affected, but this commit also renames the files with utf8 and utf16 in their names for the sake of consistency.
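A minimal sketch of how an encoding string can be derived from a test file name, to show why the rename matters. The helper name `lang_and_encoding` and the example paths are hypothetical and are not taken from the pandas test suite; the only assumption is that files are named `<language>_<encoding>.html`.

```python
import os


def lang_and_encoding(filename):
    """Split a name like 'chinese_utf-32.html' into (language, encoding).

    Hypothetical helper: assumes the file stem has the form
    '<language>_<encoding>'.
    """
    stem = os.path.splitext(os.path.basename(filename))[0]
    lang, encoding = stem.split("_", 1)
    return lang, encoding


# Hypothetical paths, before and after the rename.
old = lang_and_encoding("data/html_encoding/chinese_utf32.html")
new = lang_and_encoding("data/html_encoding/chinese_utf-32.html")

print(old)  # the encoding part has no dash
print(new)  # the encoding part carries the dash
```

Because the encoding is taken verbatim from the file name, renaming the file is enough to change the string that ends up being passed as `encoding`.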

ERROR: test_encode (pandas.io.tests.test_html.TestReadHtmlEncodingLxml)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jmorris/Code/pandas/pandas/io/tests/test_html.py", line 624, in test_encode
    from_string = self.read_string(f, encoding).pop()
  File "/Users/jmorris/Code/pandas/pandas/io/tests/test_html.py", line 619, in read_string
    return self.read_html(fobj.read(), encoding=encoding, index_col=0)
  File "/Users/jmorris/Code/pandas/pandas/io/tests/test_html.py", line 607, in read_html
    return read_html(*args, **kwargs)
  File "/Users/jmorris/Code/pandas/pandas/io/html.py", line 851, in read_html
    parse_dates, tupleize_cols, thousands, attrs, encoding)
  File "/Users/jmorris/Code/pandas/pandas/io/html.py", line 714, in _parse
    raise_with_traceback(retained)
  File "/Users/jmorris/Code/pandas/pandas/io/html.py", line 708, in _parse
    tables = p.parse_tables()
  File "/Users/jmorris/Code/pandas/pandas/io/html.py", line 178, in parse_tables
    tables = self._parse_tables(self._build_doc(), self.match, self.attrs)
  File "/Users/jmorris/Code/pandas/pandas/io/html.py", line 527, in _build_doc
    parser = HTMLParser(recover=False, encoding=self.encoding)
  File "/Users/jmorris/anaconda/envs/py27/lib/python2.7/site-packages/lxml/html/__init__.py", line 1664, in __init__
    super(HTMLParser, self).__init__(**kwargs)
  File "parser.pxi", line 1598, in lxml.etree.HTMLParser.__init__ (src/lxml/lxml.etree.c:100669)
  File "parser.pxi", line 792, in lxml.etree._BaseParser.__init__ (src/lxml/lxml.etree.c:93393)
LookupError: unknown encoding: 'utf32'

The test failure was not resolved by building the most recent lxml with static dependencies and the most recent versions of libxml2 and libxslt.

I contacted the lxml mailing list.

http://mailman-mail5.webfaction.com/pipermail/lxml/2014-July/007239.html

It was suggested that the problem may be in the OSX libiconv, though iconv -l doesn't show the bug on my system.
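A quick check, offered as a sketch rather than a diagnosis: Python's own codec registry resolves both spellings to the same codec, which is consistent with the `LookupError` in the traceback coming from the lxml parser's encoding lookup rather than from Python. The snippet uses only the standard library and does not touch lxml.

```python
import codecs

# Both spellings resolve to the same codec in Python's registry.
without_dash = codecs.lookup("utf32")
with_dash = codecs.lookup("utf-32")
same_codec = without_dash.name == with_dash.name

# Encoding with either spelling produces identical bytes.
same_bytes = "abc".encode("utf32") == "abc".encode("utf-32")

print(same_codec, same_bytes)
```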

@jmorris0x0
Contributor Author

Funny, my branch builds fine. Is Friday night a bad time for FRED?

@jreback
Contributor

jreback commented Aug 2, 2014

hah

rebase on master

I had fixed this issue already

html test_encode fails on OSX due to missing dash in ‘utf32’ encoding
string.
@jmorris0x0
Contributor Author

done.

@jreback jreback added this to the 0.15.0 milestone Aug 2, 2014
@jreback
Contributor

jreback commented Aug 2, 2014

@cpcloud looks ok to me

@cpcloud
Member

cpcloud commented Aug 2, 2014

weird issue! @jmorris0x0 thanks for fixing!

cpcloud added a commit that referenced this pull request Aug 2, 2014
TST/BUG: Rename html encoding test files.
@cpcloud cpcloud merged commit 0d3229d into pandas-dev:master Aug 2, 2014
@jmorris0x0 jmorris0x0 deleted the osx-html-encode-bugfix branch August 2, 2014 22:13
Labels
IO HTML read_html, to_html, Styler.apply, Styler.applymap Testing pandas testing functions or related to the test suite
3 participants