Commit 04e0d80

Optimize levenshtein a bit for memory usage (#13830)
When all costs are equal, the Levenshtein distance fulfills the requirements of a metric. A metric is symmetric, so in that case the two strings can be swapped without changing the result. The rows of the partial matrix have length |string2| + 1, so when |string1| < |string2| we swap the strings and size the rows by the shorter one, which reduces memory usage and CPU time.
1 parent 33a523f commit 04e0d80
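
The idea in the commit message can be sketched outside of PHP. The following is a minimal, standalone illustration, assuming unit costs and plain C strings; `lev_two_rows` is a hypothetical helper, not PHP's `reference_levdist()`, and it uses `malloc` rather than the Zend allocator. It keeps only two rows of the matrix and swaps the inputs so the rows are sized by the shorter string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not PHP's reference_levdist(): unit-cost Levenshtein
 * distance using two rolling rows. Returns -1 if allocation fails. */
long lev_two_rows(const char *s1, const char *s2)
{
    assert(s1 != NULL && s2 != NULL);
    size_t len1 = strlen(s1), len2 = strlen(s2);

    /* With unit costs the distance is symmetric, so move the shorter string
     * into the s2 slot: both rows are sized by len2. */
    if (len1 < len2) {
        const char *tmp_s = s1; s1 = s2; s2 = tmp_s;
        size_t tmp_len = len1; len1 = len2; len2 = tmp_len;
    }

    long *prev = malloc((len2 + 1) * sizeof *prev);
    long *curr = malloc((len2 + 1) * sizeof *curr);
    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return -1;
    }

    /* Row 0: building s2[0..j) from an empty prefix takes j insertions. */
    for (size_t j = 0; j <= len2; j++) {
        prev[j] = (long)j;
    }

    for (size_t i = 0; i < len1; i++) {
        curr[0] = (long)i + 1; /* delete the first i + 1 characters of s1 */
        for (size_t j = 0; j < len2; j++) {
            long best = prev[j] + (s1[i] != s2[j]);             /* match or replace */
            if (prev[j + 1] + 1 < best) best = prev[j + 1] + 1; /* delete */
            if (curr[j] + 1 < best) best = curr[j] + 1;         /* insert */
            curr[j + 1] = best;
        }
        long *swap = prev; prev = curr; curr = swap; /* roll the rows */
    }

    long result = prev[len2];
    free(prev);
    free(curr);
    return result;
}
```

In this sketch, calling `lev_two_rows("kitten", "sitting")` and `lev_two_rows("sitting", "kitten")` both allocate rows of 7 entries (the length of "kitten" plus one) and return the same distance.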

1 file changed: +9 -0 lines changed

ext/standard/levenshtein.c

Lines changed: 9 additions & 0 deletions
@@ -32,6 +32,15 @@ static zend_long reference_levdist(const zend_string *string1, const zend_string
 		return ZSTR_LEN(string1) * cost_del;
 	}
 
+	/* When all costs are equal, levenshtein fulfills the requirements of a metric, which means
+	 * that the distance is symmetric. If string1 is shorter than string2 we can save memory (and CPU time)
+	 * by having shorter rows (p1 & p2). */
+	if (ZSTR_LEN(string1) < ZSTR_LEN(string2) && cost_ins == cost_rep && cost_rep == cost_del) {
+		const zend_string *tmp = string1;
+		string1 = string2;
+		string2 = tmp;
+	}
+
 	p1 = safe_emalloc((ZSTR_LEN(string2) + 1), sizeof(zend_long), 0);
 	p2 = safe_emalloc((ZSTR_LEN(string2) + 1), sizeof(zend_long), 0);
 
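
The guard in the hunk above only swaps when the costs are equal. A small sketch can show why an unconditional swap would be unsafe. It assumes a hypothetical `lev_weighted` function (again standalone C with `malloc`, not the PHP source) that computes the cost of turning `s1` into `s2` with separate insert, replace, and delete costs and never swaps its inputs.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical weighted distance, not PHP's implementation: cost of turning
 * s1 into s2 with separate costs. Returns -1 if allocation fails. */
long lev_weighted(const char *s1, const char *s2,
                  long cost_ins, long cost_rep, long cost_del)
{
    size_t len1 = strlen(s1), len2 = strlen(s2);
    long *prev = malloc((len2 + 1) * sizeof *prev);
    long *curr = malloc((len2 + 1) * sizeof *curr);
    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return -1;
    }

    /* Row 0: an empty prefix of s1 becomes s2[0..j) through j insertions. */
    for (size_t j = 0; j <= len2; j++) {
        prev[j] = (long)j * cost_ins;
    }

    for (size_t i = 0; i < len1; i++) {
        curr[0] = prev[0] + cost_del; /* one more character of s1 deleted */
        for (size_t j = 0; j < len2; j++) {
            long best = prev[j] + (s1[i] == s2[j] ? 0 : cost_rep);
            if (prev[j + 1] + cost_del < best) best = prev[j + 1] + cost_del;
            if (curr[j] + cost_ins < best) best = curr[j] + cost_ins;
            curr[j + 1] = best;
        }
        long *swap = prev; prev = curr; curr = swap; /* roll the rows */
    }

    long result = prev[len2];
    free(prev);
    free(curr);
    return result;
}
```

With an insert cost of 1 and a delete cost of 5, turning "a" into "ab" needs one insertion, while turning "ab" into "a" needs one deletion. The two directions therefore cost different amounts, and swapping the arguments would change the result. With all three costs equal, both directions agree.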
