Description
It seems https://github.com/kpdecker/jsdiff/blob/3b654c2ed7d5262ed9946de841ad8dae990286c7/src/diff/base.js does not implement "4b. A Linear Space Refinement" mentioned in Myers' paper. So the memory usage is O(D^2). From the code it looks like O(len(bestPath) * len(components in bestPath))
might be that complexity.
This makes it practically too slow to complete diff calculation if there are many changes. For example, comparing the prettified Hearthstone cards.json
in different languages wouldn't complete in 20 minutes using jsdiff
:
Reproduce:
# Prepare test data
wget https://api.hearthstonejson.com/v1/168129/enUS/cards.json -O 1
wget https://api.hearthstonejson.com/v1/168129/zhCN/cards.json -O 2
python3 -m json.tool < 1 > a
python3 -m json.tool < 2 > b
// a.js - Calculate diff
const fs = require('fs')
const diff = require('diff');
const a = fs.readFileSync('a', {encoding: 'utf-8'});
const b = fs.readFileSync('b', {encoding: 'utf-8'});
console.log(`pid: ${process.pid}`);
console.log(diff.diffLines(a, b).length);
Profiler shows that nodejs spent 70% time on GC:
As a comparison, it seems diff-match-patch
implements the linear space refinement and can completes the above test case in ~24s. (For reference, git diff --stat --minimal
completes in ~0.3s).