Description
Hi there!
Since version 12, PostgreSQL has supported a WHERE
clause within COPY ... FROM
statements:
COPY table_name [ ( column_name [, ...] ) ]
FROM { 'filename' | PROGRAM 'command' | STDIN }
[ [ WITH ] ( option [, ...] ) ]
[ WHERE condition ]
It functions similarly to the WHERE
clause used within SELECT
statements, subject to some restrictions regarding subqueries.
Recently I've found myself in a situation where I am performing a bulk insert of CSV-formatted data from a remote machine, but would like to skip certain rows - such as those containing null values. From what I can tell, the existing Connection.copy_to_table
helper doesn't support this, and I'd need to drop down to private API (the underlying Connection._protocol._copy_in
machinery)
I'd appreciate some publicly exposed way of doing this - the simplest approach that comes to mind would be adding a where
kwarg to Connection.copy_to_table
as well as the Connection._format_copy_opts
method. Usage could then look something along the lines of
await conn.copy_to_table("cool_table", source="important_file.csv", where="important_value = 42")
I'm not sure if this is satisfactory, though. There would also need to be some consideration regarding earlier Postgres versions that don't support COPY ... FROM
using a WHERE
clause.
Please let me know your thoughts - I'm happy to open a PR for implementation if necessary.