This repository was archived by the owner on Nov 23, 2017. It is now read-only.

Add an optional cache for loop.getaddrinfo() #161

Open
@GoogleCodeExporter

Description

Hi,

I tried the crawl.py example and noticed that it resolves the host name for 
each connection. For example, on my PC the script calls getaddrinfo() 160 times 
per second. Each call seems to send a DNS request (a real UDP packet) to the 
DNS server. With DNSSEC enabled, it may even need to open a new TCP connection 
for each DNS resolution.

Would it make sense to write an optional cache for DNS resolution in 
BaseEventLoop? Or at least in crawl.py?
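A minimal sketch of what an opt-in cache for crawl.py could look like, assuming modern asyncio (Python 3.7+) rather than the 2014 `yield from` style. `CoalescingResolver` is a hypothetical helper, not part of asyncio. It wraps `loop.getaddrinfo()`, keeps results for the lifetime of the process, and lets concurrent lookups of the same host share one in-flight query instead of each sending its own DNS request. Failed lookups are not cached.

```python
import asyncio


class CoalescingResolver:
    """Hypothetical memoizing wrapper around loop.getaddrinfo().

    Entries never expire, so this only suits short-lived processes such as a
    crawler run. Concurrent callers asking for the same key await the same
    task, so only one DNS query is sent per key.
    """

    def __init__(self, resolve=None):
        # `resolve` is any coroutine function with getaddrinfo()'s signature;
        # when omitted, the running loop's getaddrinfo() is used.
        self._resolve = resolve
        self._cache = {}  # key -> asyncio.Task producing the address list

    async def getaddrinfo(self, host, port, *, family=0, type=0, proto=0, flags=0):
        key = (host, port, family, type, proto, flags)
        task = self._cache.get(key)
        if task is None:
            resolve = self._resolve or asyncio.get_running_loop().getaddrinfo
            task = asyncio.ensure_future(
                resolve(host, port, family=family, type=type,
                        proto=proto, flags=flags))
            self._cache[key] = task
        try:
            # shield() keeps one cancelled caller from cancelling the shared lookup.
            return await asyncio.shield(task)
        except Exception:
            # Do not cache failures: drop the entry so the next call retries.
            if self._cache.get(key) is task:
                del self._cache[key]
            raise
```

The `resolve` parameter exists mainly so the cache can be exercised with a fake resolver; in crawl.py one would create a single `CoalescingResolver()` and call its `getaddrinfo()` wherever the loop's method was called.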

The usual problem with a cache is configuring it: how many results should it 
keep, and for how long? The DNS protocol provides the expiration time: the TTL 
field of a resource record (RR), a number of seconds. But the getaddrinfo() API 
doesn't expose this value.

For example, Firefox caches 20 DNS results during 60 seconds by default.
http://kb.mozillazine.org/Network.dnsCacheExpiration
http://kb.mozillazine.org/Network.dnsCacheEntries
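Since getaddrinfo() does not expose the record's TTL, a cache has to pick its own limits. A minimal sketch, assuming one fixed expiration for every entry and least-recently-used eviction; `TTLCache` and `cached_getaddrinfo` are hypothetical names, and the defaults simply reuse the Firefox values above (20 entries, 60 seconds).

```python
import asyncio
import time
from collections import OrderedDict


class TTLCache:
    """Bounded cache whose entries expire a fixed time after insertion.

    getaddrinfo() does not report the DNS record's TTL, so one fixed `ttl`
    (in seconds) is used for every entry. Beyond `max_entries`, the least
    recently used entry is evicted.
    """

    def __init__(self, max_entries=20, ttl=60.0, clock=time.monotonic):
        self.max_entries = max_entries
        self.ttl = ttl
        self._clock = clock
        self._data = OrderedDict()  # key -> (expires_at, value)

    def get(self, key, default=None):
        item = self._data.get(key)
        if item is None:
            return default
        expires_at, value = item
        if self._clock() >= expires_at:
            # Stale entry: forget it so the caller resolves again.
            del self._data[key]
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._data[key] = (self._clock() + self.ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict least recently used

    def __len__(self):
        return len(self._data)


async def cached_getaddrinfo(cache, host, port, **kwargs):
    """Hypothetical wrapper: answer from `cache`, else ask the event loop."""
    key = (host, port, tuple(sorted(kwargs.items())))
    infos = cache.get(key)
    if infos is None:
        loop = asyncio.get_running_loop()
        infos = await loop.getaddrinfo(host, port, **kwargs)
        cache.put(key, infos)
    return infos
```

The `clock` parameter only exists so the expiration logic can be tested without sleeping; a monotonic clock is used by default so that wall-clock adjustments do not expire or extend entries.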

Info on DNS resolution in Chromium:
http://www.chromium.org/developers/design-documents/dns-prefetching

An old article (2011) says that Internet Explorer used a timeout of 24 hours 
but now uses a timeout of 30 minutes:
http://support.microsoft.com/kb/263558/en

See also issue #160 (Asynchronous DNS client). It is not directly related, 
because I don't see any option to cache results in those async DNS clients.

Original issue reported on code.google.com by victor.s...@gmail.com on 6 Mar 2014 at 5:01
