Skip to content

Fix pd.read_excel #33

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jul 1, 2022
Merged

Fix pd.read_excel #33

merged 5 commits into from
Jul 1, 2022

Conversation

hauntsaninja
Copy link
Contributor

No description provided.

@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Jun 21, 2022

@hauntsaninja Can you add a test for this? See test_frame.py:test_read_csv() .

@Dr-Irv Dr-Irv mentioned this pull request Jun 29, 2022
@hauntsaninja
Copy link
Contributor Author

hauntsaninja commented Jul 1, 2022

Sorry, was away from laptop. I can add a test, but rather than effectively recreating a large part of pandas test suite, you may wish to consider using a tool like stubtest for basic consistency checks (#11 (comment))

@@ -1012,6 +1016,12 @@ def test_frame_getitem_isin() -> None:
assert_type(df[df.index.isin([1, 3, 5])], "pd.DataFrame")


def test_read_excel() -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a pytest.skip() here, as this will fail with pytest, since foo doesn't exist.

Also add a comment referring to the Github issue, e.g. # GH 33

@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Jul 1, 2022

I've run stubtest and it uncovered a number of issues, so I'm going to look into addressing those, and adding stubtest to our testing suite.

@hauntsaninja
Copy link
Contributor Author

Thanks, fixed! stubtest has the option of using an allowlist, which can let you introduce it in CI today, and fix the backlog of errors over time. Let me know if you need help setting it up!

@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Jul 1, 2022

Thanks, fixed! stubtest has the option of using an allowlist, which can let you introduce it in CI today, and fix the backlog of errors over time. Let me know if you need help setting it up!

Thanks for letting me know and your offer of help. The challenge is that we don't know what to put on the allowlist.

I'm hoping to fix all the issues this weekend.

@Dr-Irv Dr-Irv merged commit 82107f6 into pandas-dev:main Jul 1, 2022
@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Jul 1, 2022

Thanks @hauntsaninja

@hauntsaninja hauntsaninja deleted the excel branch July 1, 2022 19:02
@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Jul 3, 2022

@hauntsaninja I tried stubtest, and can't get it to do what I'd like it to do. mypy requires stub files to be installed, but I want to use this in development. So in development, I have a local copy of pandas-stubs, and want to run stubtest so that it says "here is what is declared/imported in the installed version of pandas, but is missing from pandas-stubs"

If I install pandas-stubs, then I get complaints about third-party libraries that I don't care about.

Any suggestions on how to handle this use case would be welcome.

@hauntsaninja
Copy link
Contributor Author

hauntsaninja commented Jul 6, 2022

Sorry for the slow reply, I'm on vacation, so only checking my laptop intermittently.

@Dr-Irv stubtest requires mypy to run correctly on the stubs before it actually performs its checks. This is what it means when it says "error: not checking stubs due to mypy build errors"

λ python -m mypy.stubtest pandas --concise
error: not checking stubs due to mypy build errors:
pandas-stubs/plotting/_matplotlib/converter.pyi:1: error: Skipping analyzing "matplotlib.dates": module is installed, but missing library stubs or py.typed marker
pandas-stubs/plotting/_matplotlib/converter.pyi:1: error: Skipping analyzing "matplotlib": module is installed, but missing library stubs or py.typed marker
pandas-stubs/plotting/_matplotlib/converter.pyi:2: error: Skipping analyzing "matplotlib.ticker": module is installed, but missing library stubs or py.typed marker
pandas-stubs/plotting/_matplotlib/converter.pyi:6: error: Skipping analyzing "matplotlib.units": module is installed, but missing library stubs or py.typed marker
pandas-stubs/plotting/_matplotlib/boxplot.pyi:3: error: Skipping analyzing "matplotlib.axes": module is installed, but missing library stubs or py.typed marker
pandas-stubs/plotting/_misc.pyi:11: error: Skipping analyzing "matplotlib.axes": module is installed, but missing library stubs or py.typed marker
pandas-stubs/plotting/_misc.pyi:12: error: Skipping analyzing "matplotlib.figure": module is installed, but missing library stubs or py.typed marker
pandas-stubs/plotting/_core.pyi:8: error: Skipping analyzing "matplotlib.axes": module is installed, but missing library stubs or py.typed marker
pandas-stubs/core/series.pyi:23: error: Skipping analyzing "matplotlib.axes": module is installed, but missing library stubs or py.typed marker
pandas-stubs/core/groupby/generic.pyi:18: error: Skipping analyzing "matplotlib.axes": module is installed, but missing library stubs or py.typed marker
pandas-stubs/core/frame.pyi:24: error: Skipping analyzing "matplotlib.axes": module is installed, but missing library stubs or py.typed marker
pandas-stubs/core/frame.pyi:24: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
pandas-stubs/_libs/tslibs/timestamps.pyi:157: error: Return type "Tuple[int, int, int]" of "isocalendar" incompatible with return type "_IsoCalendarDate" in supertype "datetime"
pandas-stubs/_libs/tslibs/timestamps.pyi:157: error: Return type "Tuple[int, int, int]" of "isocalendar" incompatible with return type "_IsoCalendarDate" in supertype "date"

So it looks to me like there are two issues with the stubs that need to be fixed before stubtest can actually run.

First, mypy needs to be able to resolve the reference to matplotlib. This is easiest done by having pandas-stubs declare a dependency on a package that provides stubs for matplotlib. I'm not actually aware of such a package — has the Pylance team published their matplotlib stubs on PyPI?

Second, it seems like there's some minor issue in timestamps.pyi. It's possible there's a real error there, but you could also add a type ignore for it for now.

Once these two issues are resolved, stubtest will be able to run more completely and will likely have a lot more to say.

@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Jul 6, 2022

First, mypy needs to be able to resolve the reference to matplotlib. This is easiest done by having pandas-stubs declare a dependency on a package that provides stubs for matplotlib. I'm not actually aware of such a package — has the Pylance team published their matplotlib stubs on PyPI?

It's "published" on https://github.com/microsoft/python-type-stubs , but I know those stubs are incomplete. And mypy won't read stubs unless they are installed.

With mypy, we use ignore_missing_imports to avoid those errors.

Second, it seems like there's some minor issue in timestamps.pyi. It's possible there's a real error there, but you could also add a type ignore for it for now.

We use python_version = "3.8" to suppress those messages. That's because datetime.isocalendar() changed its return type from a tuple to a named tuple in python 3.9.

Once these two issues are resolved, stubtest will be able to run more completely and will likely have a lot more to say.

Would it be possible for stubtest to observe our mypy settings in its analysis?

@hauntsaninja
Copy link
Contributor Author

Got it, in that case you can use python -m mypy.stubtest pandas --concise --mypy-config-file pyproject.toml to make it work.

However, I don't really recommend reusing your mypy config like this, since it won't be reflective of what your users will be doing. That is, your users won't have matplotlib stubs unless you make them a dependency, so in practice users' type checkers will resolve all of those to Any.

@hauntsaninja
Copy link
Contributor Author

Example output, in case it's useful to you: https://paste.ee/p/KObbQ

@twoertwein
Copy link
Member

@hauntsaninja Does stubtest also work well when there are only very few pyi files? The pandas code itself has a few pyi files for cython code. It would be great to use stubtest to ensure that all public functions declared in pyi match the cython implementation.

@hauntsaninja
Copy link
Contributor Author

Does stubtest also work well when there are only very few pyi files

On mypy master, it should. There was a bug I fixed recently where stubtest would not check submodules that did not have corresponding pyi files. This was fixed in python/mypy#13030 (should be released soon as part of mypy 0.970).

In general, if you wish to avoid errors about things that are missing from the stub and only wish to check consistency with the things that are there, stubtest supports an --ignore-missing-stub argument.

@hauntsaninja
Copy link
Contributor Author

hauntsaninja commented Jul 16, 2022

@twoertwein Looks like running stubtest against the main pandas repo finds a bunch of things, e.g. see pandas-dev/pandas#47756 , pandas-dev/pandas#47758 (there are more errors as well)

@twoertwein
Copy link
Member

@Dr-Irv I opened a PR to use stubtest on the pyi files within pandas pandas-dev/pandas#47817

Obviosuly, stubtest will report many errors when comparing pandas-stubs with pandas. But as @hauntsaninja pointed out, stubtest can generate an allow/ignore list. As a starting point, we could have an unfortunately enormously long ignore list to at least prevent adding new stubs that are not compatible with the implementation (or at least be aware of it and add an exception to the list). Over time we can then work on making this list smaller. I think it will never be empty but we can definitely make it smaller :) This will especially be helpful for the cleaning efforts in #149 and #128.

@twoertwein
Copy link
Member

Sorry for the stupid question @hauntsaninja How does stubtest need to be called on pandas or pandas-stubs:

  • python -m mypy.stubtest pandas-stubs --concise --ignore-missing-stub --mypy-config-file pyproject.toml throws many errors (for example that Series is not generic: it is generic in the stubs but not in the implementation)
  • python -m mypy.stubtest pandas --concise --mypy-config-file pyproject.toml cannot seem to find any stubs

@hauntsaninja
Copy link
Contributor Author

hauntsaninja commented Aug 1, 2022

After running pip install -e . in my copy of pandas-stubs (not sure if strictly necessary), python -m mypy.stubtest pandas --concise --mypy-config-file pyproject.toml works for me.
Note that it complains about "failed to find stubs" for modules for which stubs do not actually exist, like pandas.util.version, but if you scroll up you'll see some more useful errors.

@twoertwein
Copy link
Member

Thanks, I think I forgot to install the stubs!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants