This repository was archived by the owner on May 17, 2024. It is now read-only.

Join-diff (in-db) + new query builder #242

Merged
merged 35 commits into from
Oct 11, 2022

Contributor

@erezsh erezsh commented Sep 28, 2022

Replacing #236

A big PR containing:

  • Join-diff (in-db) implementation of data-diff, with collection of extra statistics, working for all major databases
  • Query builder package, for easy construction of SQL queries with Python. While not ready yet for all use-cases, it's complete enough for the data-diff use-case.
  • Existing differ moved to "hashdiff_tables" module, and updated to use the new query builder
  • Updated the CLI and Python API to allow using both algorithms.
  • Schema diff when comparing tables in the same database
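
To illustrate the idea behind a join-based diff, here is a minimal sketch using Python's built-in `sqlite3` module. It is not the implementation from this PR: the function name `join_diff`, the demo tables `t1`/`t2`, and the two-`LEFT JOIN` formulation are assumptions chosen for a short, runnable example, and it collects none of the extra statistics mentioned above.

```python
import sqlite3


def join_diff(conn, table1, table2, key, columns):
    """Diff two tables that live in the same database using joins.

    Returns a list of (sign, key, *columns) tuples: '-' marks a row of
    table1 that is missing from or different in table2, '+' the reverse.
    Table and column names are interpolated directly, so they must be
    trusted; `columns` must be non-empty and `key` must not contain NULLs.
    """
    cols_a = ", ".join(f"a.{c}" for c in columns)
    cols_b = ", ".join(f"b.{c}" for c in columns)
    # SQLite's IS NOT is a NULL-safe inequality; other databases spell it
    # differently (for example IS DISTINCT FROM).
    changed = " OR ".join(f"a.{c} IS NOT b.{c}" for c in columns)
    sql = f"""
        SELECT '-', a.{key}, {cols_a}
        FROM {table1} a LEFT JOIN {table2} b ON a.{key} = b.{key}
        WHERE b.{key} IS NULL OR {changed}
        UNION ALL
        SELECT '+', b.{key}, {cols_b}
        FROM {table2} b LEFT JOIN {table1} a ON a.{key} = b.{key}
        WHERE a.{key} IS NULL OR {changed}
    """
    rows = conn.execute(sql).fetchall()
    # Order by key, with the '-' row before the '+' row for the same key.
    return sorted(rows, key=lambda r: (r[1], r[0] == "+"))


def demo():
    # Hypothetical sample data: id 2 differs, id 3 and id 4 exist on one side only.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t1 (id INTEGER PRIMARY KEY, val TEXT);
        CREATE TABLE t2 (id INTEGER PRIMARY KEY, val TEXT);
        INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
        INSERT INTO t2 VALUES (1, 'a'), (2, 'x'), (4, 'd');
    """)
    return join_diff(conn, "t1", "t2", "id", ["val"])


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Because the whole comparison runs as a single query inside the database, no rows need to be hashed or transferred unless they actually differ, which is the main appeal of this approach when both tables share a database.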

The new CLI interface works as follows:

data-diff db1 table1 db2 table2    # hashdiff (just as before)
data-diff db1 table1 table2        # joindiff
data-diff db1 table1 db2 table2  -a joindiff  # also joindiff, throws error if db1 != db2
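
The selection rules implied by the three invocations above can be sketched as a small Python function. This is only an illustration: `choose_algorithm` and its parameters are hypothetical names, not part of the data-diff API, and the behaviour for cases the examples do not cover (such as forcing hashdiff with a single database) is an assumption.

```python
from typing import Optional


def choose_algorithm(db1: str, db2: Optional[str] = None,
                     algorithm: Optional[str] = None) -> str:
    """Pick a diff algorithm from the CLI arguments (hypothetical helper)."""
    if algorithm not in (None, "hashdiff", "joindiff"):
        raise ValueError(f"unknown algorithm: {algorithm!r}")

    if db2 is None:
        # `data-diff db1 table1 table2`: both tables are in one database.
        # Assumption: an explicit `-a hashdiff` is still honoured here.
        return algorithm or "joindiff"

    if algorithm == "joindiff":
        # `-a joindiff` only makes sense when both sides are the same database.
        if db1 != db2:
            raise ValueError("joindiff requires both tables to be in the same database")
        return "joindiff"

    # `data-diff db1 table1 db2 table2`: hashdiff, just as before.
    return "hashdiff"
```

For example, `choose_algorithm("postgresql:///", None)` selects joindiff, while passing two different database URIs with `algorithm="joindiff"` raises an error, mirroring the third invocation.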

TODO:

  • More tests for the joindiff algorithm
  • Update readme docs

@erezsh erezsh force-pushed the joindiff branch 3 times, most recently from d1b9985 to e998d63 Compare October 5, 2022 16:04
Now works for: postgresql, mysql, bigquery, presto, trino, snowflake, oracle, redshift
@erezsh erezsh merged commit fb1c323 into master Oct 11, 2022