Simple query runs long when DB schema contains thousands of tables #186

Closed
@sergeyspatar

Description

  • asyncpg version: 0.12
  • PostgreSQL version: 9.6.4
  • Python version: 3.5.3
  • Platform: Linux x86_64 (Debian Stretch)
  • Do you use pgbouncer?: no
  • Did you install asyncpg with pip?: no
  • If you built asyncpg locally, which version of Cython did you use?: 0.25.2
  • Can the issue be reproduced under both asyncio and uvloop?: yes

My DB schema contains thousands of tables inherited from one parent table. These tables store logs or historical information and have around 100 columns each. As a result, the PostgreSQL system catalog pg_attribute contains an enormous number of records: with the 5000 tables of 100 columns created below, that is on the order of 500,000 rows.

I connect to PostgreSQL as my own DB user sergey, who is a superuser. When I connect as this user, some simple queries run very long in asyncpg.

Preparation. This code creates 5000 inherited tables with 100 columns each, to prepare an environment similar to mine for the test cases below:

import asyncpg
import asyncio

loop = asyncio.get_event_loop()

async def run():
    conn = await asyncpg.connect(user='sergey')
    # Parent table with 100 varchar columns (c00 .. c99).
    await conn.execute(
        'create table testlog ({})'.format(
            ','.join('c{:02} varchar'.format(n) for n in range(100))
        )
    )
    # 5000 child tables, each inheriting all 100 columns.
    for n in range(5000):
        await conn.execute(
            'create table testlog_{:04} () inherits (testlog)'.format(n)
        )

loop.run_until_complete(run())
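
For reference, the resulting catalog size can be checked directly. A minimal sketch (the count also includes system columns and attributes of other tables, so treat the number as approximate):

import asyncpg
import asyncio

loop = asyncio.get_event_loop()

async def check():
    conn = await asyncpg.connect(user='sergey')
    # Each of the 5000 child tables contributes ~100 rows here,
    # so the count should be on the order of 500,000.
    print(await conn.fetchval('select count(*) from pg_attribute'))

loop.run_until_complete(check())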

Example 1: Connect as sergey and run this simple query:

import asyncpg
import asyncio

#import uvloop
#asyncio.set_event_loop(uvloop.new_event_loop())

loop = asyncio.get_event_loop()

async def run():
    conn = await asyncpg.connect(user='sergey')
    return await conn.fetch('select $1::integer[]', [1, 2, 3])

print(loop.run_until_complete(run()))

While it was running (about 2 minutes), I saw this query in pg_stat_activity:

WITH RECURSIVE typeinfo_tree(
    oid, ns, name, kind, basetype, has_bin_io, elemtype, elemdelim,
    range_subtype, elem_has_bin_io, attrtypoids, attrnames, depth)
AS (
    WITH composite_attrs
    AS (
    SELECT
        c.reltype                    AS comptype_oid,
        array_agg(ia.atttypid ORDER BY ia.attnum)    AS typoids,
        array_agg(ia.attname::text ORDER BY ia.attnum)   AS names
    FROM
        pg_attribute ia
        INNER JOIN pg_class c
        ON (ia.attrelid = c.oid)
    WHERE
        ia.attnum > 0 AND NOT ia.attisdropped
    GROUP BY
        c.reltype
    ),

    typeinfo
    AS (
    SELECT
        t.oid               AS oid,
        ns.nspname              AS ns,
        t.typname               AS name,
        t.typtype               AS kind,
        (CASE WHEN t.typtype = 'd' THEN
        (WITH RECURSIVE typebases(oid, depth) AS (

Unfortunately, pg_stat_activity does not show the whole query, only its first part. I guess the slow execution has to do with the size of pg_attribute.
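
If the full query text is needed, pg_stat_activity truncates it to track_activity_query_size bytes (1024 by default), so raising that setting should reveal the rest. A minimal sketch (the new value only takes effect after a server restart):

import asyncpg
import asyncio

loop = asyncio.get_event_loop()

async def run():
    conn = await asyncpg.connect(user='postgres')
    # track_activity_query_size caps the query text shown in
    # pg_stat_activity; a restart is required for this to apply.
    await conn.execute('alter system set track_activity_query_size = 16384')

loop.run_until_complete(run())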

Example 2: Now connect as user postgres and run the same query. It is still not instant but much faster (~1 second):

import asyncpg
import asyncio

#import uvloop
#asyncio.set_event_loop(uvloop.new_event_loop())

loop = asyncio.get_event_loop()

async def run():
    conn = await asyncpg.connect(user='postgres')
    return await conn.fetch('select $1::integer[]', [1, 2, 3])

print(loop.run_until_complete(run()))
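
To see where the time goes, the connect and the first fetch can be timed separately. A rough sketch (exact timings will vary): if the delay comes from the type introspection query shown above, the connect itself should be quick and the first fetch should absorb the ~2 minutes.

import asyncpg
import asyncio
import time

loop = asyncio.get_event_loop()

async def run():
    t0 = time.monotonic()
    conn = await asyncpg.connect(user='sergey')
    t1 = time.monotonic()
    # The first fetch with an array parameter is what triggers the
    # introspection query observed in pg_stat_activity.
    await conn.fetch('select $1::integer[]', [1, 2, 3])
    t2 = time.monotonic()
    print('connect: {:.2f}s, first fetch: {:.2f}s'.format(t1 - t0, t2 - t1))

loop.run_until_complete(run())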

Example 3: Not all queries run long. This one is fast:

import asyncpg
import asyncio

#import uvloop
#asyncio.set_event_loop(uvloop.new_event_loop())

loop = asyncio.get_event_loop()

async def run():
    conn = await asyncpg.connect(user='sergey')
    return await conn.fetch('select $1::integer', 1)

print(loop.run_until_complete(run()))
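
If the cost is a one-time type introspection per connection, repeating the array query on the same connection should be fast the second time. A sketch to check that hypothesis:

import asyncpg
import asyncio
import time

loop = asyncio.get_event_loop()

async def run():
    conn = await asyncpg.connect(user='sergey')
    for attempt in (1, 2):
        t0 = time.monotonic()
        await conn.fetch('select $1::integer[]', [1, 2, 3])
        # If type info is cached on the connection, the second attempt
        # should not re-run the slow introspection query.
        print('attempt {}: {:.2f}s'.format(attempt, time.monotonic() - t0))

loop.run_until_complete(run())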

P.S. I tried vacuum full analyze pg_attribute and vacuum full analyze pg_class; it did not help.
