Skip to content

Suboptimal (O(D^2)) memory usage #396

Open
@quark-zju

Description

@quark-zju

It seems https://github.com/kpdecker/jsdiff/blob/3b654c2ed7d5262ed9946de841ad8dae990286c7/src/diff/base.js does not implement "4b. A Linear Space Refinement" mentioned in Myers' paper. So the memory usage is O(D^2). From the code it looks like O(len(bestPath) * len(components in bestPath)) might be that complexity.

This makes it practically too slow to complete diff calculation if there are many changes. For example, comparing the prettified Hearthstone cards.json in different languages wouldn't complete in 20 minutes using jsdiff:

Reproduce:

# Prepare test data
wget https://api.hearthstonejson.com/v1/168129/enUS/cards.json -O 1
wget https://api.hearthstonejson.com/v1/168129/zhCN/cards.json -O 2
python3 -m json.tool < 1 > a
python3 -m json.tool < 2 > b
// a.js - Calculate diff
const fs = require('fs')
const diff = require('diff');

const a = fs.readFileSync('a', {encoding: 'utf-8'});
const b = fs.readFileSync('b', {encoding: 'utf-8'});

console.log(`pid: ${process.pid}`);
console.log(diff.diffLines(a, b).length);

Profiler shows that nodejs spent 70% time on GC:

image

As a comparison, it seems diff-match-patch implements the linear space refinement and can completes the above test case in ~24s. (For reference, git diff --stat --minimal completes in ~0.3s).

Metadata

Metadata

Assignees

No one assigned

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions