This repository was archived by the owner on Nov 23, 2017. It is now read-only.
Add an optional cache for loop.getaddrinfo() #161
Open
Description
Hi,
I tried the crawl.py example, and I noticed that it resolves the host name for
each connection. For example, on my PC the script calls getaddrinfo() 160 times per
second. It looks like each call sends a DNS request (a real UDP packet) to the
DNS server. With DNSSEC enabled, it may even need to open a new TCP connection
for each DNS resolution.
Would it make sense to write an optional cache for DNS resolution in
BaseEventLoop? Or at least in crawl.py?
The usual problem with a cache is how to configure it: how many results should
be cached, and with what timeout? The DNS protocol provides the timeout: the TTL
field of a resource record (RR), which is a number of seconds. But the
getaddrinfo() API doesn't expose this value.
For example, Firefox caches 20 DNS results for 60 seconds by default.
http://kb.mozillazine.org/Network.dnsCacheExpiration
http://kb.mozillazine.org/Network.dnsCacheEntries
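
To make the idea concrete, here is a minimal sketch of what such an optional cache could look like. It is not an existing asyncio API: the class name `CachingResolver` and its parameters (`max_entries`, `ttl`, `resolve`, `clock`) are hypothetical, and the defaults simply mirror the Firefox values above (20 entries, 60 seconds). Because getaddrinfo() does not expose the DNS TTL, the sketch assumes one fixed timeout for every entry.

```python
import asyncio
import time
from collections import OrderedDict


class CachingResolver:
    """Hypothetical wrapper that caches getaddrinfo() results (a sketch)."""

    def __init__(self, max_entries=20, ttl=60.0, resolve=None,
                 clock=time.monotonic):
        self._max_entries = max_entries
        self._ttl = ttl              # fixed timeout; the real DNS TTL is unknown
        self._resolve = resolve      # None means "use the running loop"
        self._clock = clock
        self._cache = OrderedDict()  # key -> (expiry time, result)

    async def getaddrinfo(self, host, port, *, family=0, type=0,
                          proto=0, flags=0):
        key = (host, port, family, type, proto, flags)
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and entry[0] > now:
            # Fresh hit: mark as most recently used and skip the lookup.
            self._cache.move_to_end(key)
            return entry[1]

        # Miss or expired entry: do the real lookup.
        # Note: concurrent misses for the same key each trigger a lookup.
        resolve = self._resolve
        if resolve is None:
            resolve = asyncio.get_running_loop().getaddrinfo
        result = await resolve(host, port, family=family, type=type,
                               proto=proto, flags=flags)

        self._cache[key] = (now + self._ttl, result)
        self._cache.move_to_end(key)
        # Evict the least recently used entries beyond the size limit.
        while len(self._cache) > self._max_entries:
            self._cache.popitem(last=False)
        return result
```

Inside a coroutine, `await CachingResolver().getaddrinfo("example.com", 80)` would then replace a direct `loop.getaddrinfo()` call. Keeping the cache in a separate object rather than in BaseEventLoop keeps it opt-in, which is one way to sidestep the question of default settings.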
Info on DNS resolution in Chromium:
http://www.chromium.org/developers/design-documents/dns-prefetching
An old article (2011) says that Internet Explorer used to use a timeout of
24 hours, and that it now uses a timeout of 30 minutes:
http://support.microsoft.com/kb/263558/en
See also issue #160 (Asynchronous DNS client). It is not directly related
because I don't see any option to cache results in these async DNS clients.
Original issue reported on code.google.com by victor.s...@gmail.com
on 6 Mar 2014 at 5:01