
ENH: Change Pandas User-Agent and add possibility to set custom http_headers to pd.read_* functions #36688

Closed
@astromatt

Description


Currently Pandas makes HTTP requests with "Python-urllib/3.8" as the User-Agent.
Some sites refuse requests that carry this User-Agent, so certain resources and static files cannot be downloaded.
What if Pandas sent "Pandas/1.1.0" as its User-Agent instead?
It should also be possible to add custom headers (auth, CSRF tokens, API versions and so on).

Use Case:

I am writing a book on Pandas:

I published data in CSV and JSON to use in code listings:

You can access those resources with a browser, curl, or even requests, but not with Pandas.
The only change needed is to set the User-Agent.
This is because readthedocs.io blocks the "Python-urllib/3.8" User-Agent for whatever reason.
The same problem affects many other sites that host data, not only readthedocs.io.

Currently I fetch those resources with requests and then pass response.text to one of:

  • pd.read_csv
  • pd.read_json
  • pd.read_html

Unfortunately this makes even the simplest code listings quite complex, because I have to explain the requests library and why I use it this way.
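To illustrate the workaround described above, here is a minimal sketch. It swaps requests for the standard-library urllib so that it has no third-party dependency; the helper names fetch_text and read_with_user_agent, the default "Pandas/1.1.0" string, and the example.com URL are assumptions made for illustration and are not part of Pandas.

```python
import io
import urllib.request


def fetch_text(url, user_agent="Pandas/1.1.0"):
    """Download a resource as text, sending a custom User-Agent (hypothetical helper)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


def read_with_user_agent(url, reader, user_agent="Pandas/1.1.0"):
    """Fetch the text first, then hand it to a reader as an in-memory buffer."""
    return reader(io.StringIO(fetch_text(url, user_agent)))


# Usage (assumes pandas is installed and the URL is reachable):
#   import pandas as pd
#   df = read_with_user_agent("https://example.com/data.csv", pd.read_csv)
```

Even in this reduced form, every listing that loads remote data would need the extra helper, which is the complexity I would like to avoid.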

Pandas uses urllib.request.urlopen, which has no parameter for setting HTTP headers:
https://github.com/pandas-dev/pandas/blob/master/pandas/io/common.py#L146

However, urllib.request.urlopen also accepts a urllib.request.Request object as its argument,
and a urllib.request.Request object can carry custom HTTP headers:
https://docs.python.org/3/library/urllib.request.html#urllib.request.Request
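As a small sketch of that mechanism, a Request can be built with a headers dict and inspected before it is sent. The URL, the "Pandas/1.1.0" value and the placeholder token are assumptions for illustration only.

```python
import urllib.request

# Hypothetical URL and header values, chosen only for illustration.
request = urllib.request.Request(
    "https://example.com/data.csv",
    headers={
        "User-Agent": "Pandas/1.1.0",
        "Authorization": "Bearer <token>",
    },
)

# Request normalizes header names with str.capitalize(), so the
# stored key is "User-agent" rather than "User-Agent".
user_agent = request.get_header("User-agent")
authorization = request.get_header("Authorization")

# urllib.request.urlopen(request) would send both headers;
# the call is left out here so the sketch needs no network access.
```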

The option to pass custom http_headers should be available in the pd.read_csv, pd.read_json and pd.read_html functions.

From what I see, the read_* call stack is three to four functions deep.
There are only 6 references in 4 files to the urlopen(*args, **kwargs) function.
So the change shouldn't be hard to implement.

The http_headers parameter can be Optional[List], which would be fully backward compatible and would not require any changes to others' code.
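A rough sketch of what such an optional parameter could look like, assuming the list holds (name, value) pairs; that interpretation and the helper name build_request are assumptions, not existing Pandas code.

```python
import urllib.request
from typing import List, Optional, Tuple


def build_request(
    url: str,
    http_headers: Optional[List[Tuple[str, str]]] = None,
) -> urllib.request.Request:
    """Wrap a URL in a Request, adding headers only when the caller supplies them."""
    request = urllib.request.Request(url)
    # With the default of None nothing is added, so urlopen falls back to
    # its usual defaults and existing callers see no change in behavior.
    for name, value in http_headers or []:
        request.add_header(name, value)
    return request
```

Because the default leaves the Request untouched, code that never passes http_headers keeps working as before.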
