-
-
Notifications
You must be signed in to change notification settings - Fork 354
Add Sink that implements git's diffing improvement heuristics #2011
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks so much for contributing, this will be such a great addition to
As this is plumbing, and as As a matter of fact, it would be very nice if the algorithm could be used and I propose adding it as an option (probably even the default) to Meanwhile I wonder if this PR should be ready for review, or if it should rather be a draft as you plan to make some more changes? |
I think these three should be OK.
For this one I am not sure how to proceed. I found contradicting information online regarding the licensing. The original code was written in C. The Rust code in this PR was inspired by it but I used a different way of doing the optimizations. The weight factors and the algorithm (I don't mean code here but the theory behind the algorithm) are 100% from the xdiff library used by git but the rest is my own work. How does LGPL-2.1 apply here 🤔 ? Maybe you know more than me on this topic :D. |
Great, then I will take a look!
I certainly don't, but I wouldn't fret it. My intuition here is that the work itself is copyrighted, i.e. the source code as is, but not underlying knowledge. Besides, this wouldn't be the first reimplementation of semantics of the |
d0596a7
to
08681f3
Compare
I think it may be preferable for the added 2c31d30 commit message to be conventional with |
That is true! I missed that opportunity but fixed up the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot, this is a great start!
I have reworked it a little to be more flexible, and added unified diffs to the test to see the sliding in action.
Something I'd love to have is actual baseline tests, i.e. tests taken from the diff-slider-tools, that are then run through Git which produces the expectation for the algorithm itself.
Also I have a feeling that the direct slice indexing can panic when the conditions are right, and such a baseline could possibly help triggering such failures. Maybe one could also use a fuzzer to generate a wide variety of inputs to help battle-test this.
I hope you find such prospects intriguing, too :).
Thanks for the improvements!
Definitely, will try to propose some tests when I have time. |
This PR was initially targeted at
image-diff
but the maintainer seems to not have enough time at the moment so I moved it here.What does this PR do?
This PR introduces a new
Sink
that can be used to optimize diffs created withimara-diff
. I tried to implement the same heuristics thatgit
uses. The code is also structured very similar.While trying to add it to
gix-diff
, I was unsure how it should be included. Is it just a normalSink
that can be used withimara_diff::diff
or do we force users to use it by not re-exporting theimara_diff::diff
method but creating a customdiff
method in thegix-diff
crate? What do the maintainers think?Things that still need to be done