read_fwf with urlopen test GH#26376 #52233

Merged
merged 1 commit into from
Mar 29, 2023

Conversation

liang3zy22
Contributor

To avoid downloading the file in the test, I just compare the columns.


def test_url_urlopen():
    url = "ftp://ftp.ncdc.noaa.gov/pub/data/igra/igra2-station-list.txt"
    f = urlopen(url)
Member

Can you use this as a context manager?
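
For reference, a minimal sketch of what the context-manager form gives: the handle returned by urlopen is closed when the block exits, even if an assertion inside it fails. To stay runnable offline, it opens a hypothetical temporary file through a file:// URL instead of the FTP URL from the test, and the file contents are made up for illustration.

```python
import tempfile
from pathlib import Path
from urllib.request import urlopen

# Hypothetical fixed-width sample; stands in for the remote station list.
sample = "ID     NAME\nA01    ALPHA\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.txt"
    path.write_text(sample)

    # The handle is closed automatically when the with block exits.
    with urlopen(path.as_uri()) as f:
        header = f.readline().decode().rstrip()

    handle_closed = f.closed

print(header)
print(handle_closed)
```

With the plain `f = urlopen(url)` form, the test would need its own try/finally to get the same cleanup.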

@mroeschke added the IO Data and IO Network labels Mar 27, 2023
@liang3zy22 force-pushed the gh26376 branch 2 times, most recently from 6dc5937 to 6e4478e, March 28, 2023 07:53
@liang3zy22
Contributor Author

If I add check_before_test=True to the tm.network decorator and then run the test case, the following error is reported:

test_read_fwf.py::test_url_urlopen FAILED
pandas/tests/io/parser/test_read_fwf.py:1015 (test_url_urlopen)
args = (), kwargs = {}

    @wraps(t)
    def wrapper(*args, **kwargs):
        if (
            check_before_test
            and not raise_on_error
>           and not can_connect(url, error_classes)
        ):

../../../_testing/_io.py:222: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../../_testing/_io.py:274: in can_connect
    if response.status != 200:
/Users/username/mambaforge/envs/pandas-dev/lib/python3.8/tempfile.py:468: in __getattr__
    a = getattr(file, name)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <addclosehook at 5811361200 whose fp = <_io.BufferedReader name=-1>>
name = 'status'

    def __getattr__(self, name):
        # Attribute lookups are delegated to the underlying file
        # and cached for non-numeric results
        # (i.e. methods are cached, closed and friends are not)
        file = self.__dict__['file']
>       a = getattr(file, name)
E       AttributeError: '_io.BufferedReader' object has no attribute 'status'

/Users/username/mambaforge/envs/pandas-dev/lib/python3.8/tempfile.py:468: AttributeError

It seems like a bug in the tm code?

def test_url_urlopen():
    url = "ftp://ftp.ncdc.noaa.gov/pub/data/igra/igra2-station-list.txt"
    with urlopen(url) as f:
        expected = pd.Index(
Member

Can you put this outside the context manager?


@pytest.mark.network
@tm.network(
    url=("ftp://ftp.ncdc.noaa.gov/pub/data/igra/igra2-station-list.txt"),
Member

Suggested change
-     url=("ftp://ftp.ncdc.noaa.gov/pub/data/igra/igra2-station-list.txt"),
+     url="ftp://ftp.ncdc.noaa.gov/pub/data/igra/igra2-station-list.txt",

@mroeschke
Member

> It seems like a bug in the tm code?

Could you diagnose how to check if ftp urls are connectable?

Signed-off-by: Liang Yan <ckgppl_yan@sina.cn>
@liang3zy22
Contributor Author

liang3zy22 commented Mar 29, 2023

I did some diagnosis. Calling urlopen on an ftp URL returns a urllib.response.addinfourl instance. addinfourl has a status attribute, but only from Python 3.9 onward. I also tried Python 3.10, where status is just None for the ftp response.
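
The behavior described above can be reproduced offline with a small sketch. It builds an addinfourl by hand around an in-memory buffer and passes no status code, which is assumed here to mirror what a non-HTTP handler does. The URL and payload are placeholders, and getattr with a default is used so the snippet also runs on Python versions before 3.9, where the attribute is missing.

```python
import io
from email.message import Message
from urllib.response import addinfourl

# Placeholder payload and URL; no network access happens here.
body = io.BytesIO(b"fixed-width data\n")
response = addinfourl(body, Message(), "ftp://example.invalid/data.txt")

# No status code was passed, so there is nothing HTTP-like to compare to 200.
status = getattr(response, "status", None)
print(status)

response.close()
```

A check of the form `response.status != 200` therefore either raises (attribute missing) or compares None to 200, neither of which says anything about whether the ftp URL is reachable.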

It seems that if an ftp URL can be opened, it is also connectable. So I changed can_connect to check the response status only for http/https URLs.
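
A rough sketch of that idea follows. It is not the code merged in this PR: the function name can_connect_sketch, the scheme-based check, and the default error_classes are assumptions made for illustration. The demonstration uses file:// URLs to temporary paths so that it runs without network access.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def can_connect_sketch(url, error_classes=(OSError,), timeout=20):
    """Return True if url can be opened; check the status for http(s) only."""
    try:
        with urlopen(url, timeout=timeout) as response:
            scheme = urlparse(url).scheme
            # Only HTTP responses carry a meaningful status code.
            if scheme in ("http", "https") and response.status != 200:
                return False
    except error_classes:
        return False
    return True


# Offline demonstration with file:// URLs (hypothetical paths).
with tempfile.TemporaryDirectory() as tmp:
    present = Path(tmp) / "present.txt"
    present.write_text("ok\n")
    missing = Path(tmp) / "missing.txt"

    reachable = can_connect_sketch(present.as_uri())
    unreachable = can_connect_sketch(missing.as_uri())

print(reachable, unreachable)
```

For non-HTTP schemes, successfully opening the URL is treated as the connectivity check, which matches the reasoning in the comment above.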

@mroeschke added this to the 2.1 milestone Mar 29, 2023
@mroeschke merged commit 7aab3b4 into pandas-dev:main Mar 29, 2023
@mroeschke
Member

Thanks for the test and nice investigation @liang3zy22

@liang3zy22 deleted the gh26376 branch March 29, 2023 23:30
Labels
IO Data, IO Network
Development

Successfully merging this pull request may close these issues.

pd.read_fwf fails with file pointer to url
2 participants