Skip to content

Commit 3b0f051

Browse files
committed
Allow empty $escape to eschew escaping CSV
Albeit CSV is still a widespread data exchange format, it has never been officially standardized. There exists, however, the “informational” RFC 4180[1] which has no notion of escape characters, but rather defines `escaped` as strings enclosed in double-quotes where contained double-quotes have to be doubled. While this concept is supported by PHP's implementation (`$enclosure`), the `$escape` sometimes interferes, so that `fgetcsv()` is unable to correctly parse externally generated CSV, and `fputcsv()` is sometimes generating non-compliant CSV. Since PHP's `$escape` concept is availble for many years, we cannot drop it for BC reasons (even though many consider it as bug). Instead we allow to pass an empty string as `$escape` parameter to the respective functions, which results in ignoring/omitting any escaping, and as such is more inline with RFC 4180. It is noteworthy that this is almost no userland BC break, since formerly most functions did not accept an empty string, and failed in this case. The only exception was `str_getcsv()` which did accept an empty string, and used a backslash as escape character then (which appears to be unintended behavior, anyway). The changed functions are `fputcsv()`, `fgetcsv()` and `str_getcsv()`, and also the `::setCsvControl()`, `::getCsvControl()`, `::fputcsv()`, and `::fgetcsv()` methods of `SplFileObject`. The implementation also changes the type of the escape parameter of the PHP_APIs `php_fgetcsv()` and `php_fputcsv()` from `char` to `int`, where `PHP_CSV_NO_ESCAPE` means to ignore/omit escaping. The parameter accepts the same values as `isalpha()` and friends, i.e. “the value of which shall be representable as an `unsigned char` or shall equal the value of the macro `EOF`. If the argument has any other value, the behavior is undefined.” This is a subtle BC break, since the character `chr(128)` has the value `-1` if `char` is signed, and so likely would be confused with `EOF` when converted to `int`. We consider this BC break to be acceptable, since it's rather unlikely that anybody uses `chr(128)` as escape character, and it easily can be fixed by casting all `escape` arguments to `unsigned char`. This patch implements the feature requests 38301[2] and 51496[3]. [1] <https://tools.ietf.org/html/rfc4180> [2] <https://bugs.php.net/bug.php?id=38301> [3] <https://bugs.php.net/bug.php?id=51496>
1 parent d206630 commit 3b0f051

16 files changed

+244
-47
lines changed

NEWS

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -57,6 +57,8 @@ PHP NEWS
5757
- Standard:
5858
. Fixed bug #74764 (Bindto IPv6 works with file_get_contents but fails with
5959
stream_socket_client). (Ville Hukkamäki)
60+
. Implemented FR #38301 (field enclosure behavior in fputcsv). (cmb)
61+
. Implemented FR #51496 (fgetcsv should take empty string as an escape). (cmb)
6062

6163
- Reflection:
6264
. Fixed bug #76737 (Unserialized reflection objects are broken, they

UPGRADING

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -89,6 +89,18 @@ PHP 7.4 UPGRADE NOTES
8989
5. Changed Functions
9090
========================================
9191

92+
- SPL:
93+
. SplFileObject::fputcsv(), ::fgetcsv() and ::setCsvControl() now accept an
94+
emptry string as $escape argument, which disables the propriertary PHP
95+
escaping mechanism. SplFileObject::getCsvControl() now may also return an
96+
empty string for the third array element, accordingly.
97+
98+
- Standard:
99+
. fputcsv() and fgetcsv() now accept an emptry string as $escape argument,
100+
which disables the propriertary PHP escaping mechanism. The behavior of
101+
str_getcsv() has been adjusted accordingly (formerly, an empty string was
102+
identical to using the default).
103+
92104
========================================
93105
6. New Functions
94106
========================================

UPGRADING.INTERNALS

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,7 @@ PHP 7.4 INTERNALS UPGRADE NOTES
99
f. get_properties_for() handler / Z_OBJDEBUG_P
1010
g. Required object handlers
1111
h. Immutable classes and op_arrays
12+
i. php_fgetcsv() and php_fputcsv()
1213

1314
2. Build system changes
1415
a. Abstract
@@ -136,6 +137,11 @@ PHP 7.4 INTERNALS UPGRADE NOTES
136137
zend_get_op_array_extension_handle() during MINIT and access its value
137138
using ZEND_OP_ARRAY_EXTENSION(op_array, handle).
138139

140+
i. The type of the escape parameter of php_fgetcsv() and php_fputcsv() has
141+
been changed from char to int. This allows to pass the new constant macro
142+
PHP_CSV_NO_ESCAPE to this parameter, to disable PHP's proprietary escape
143+
mechanism.
144+
139145
========================
140146
2. Build system changes
141147
========================

ext/spl/spl_directory.c

Lines changed: 41 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -323,7 +323,7 @@ static int spl_filesystem_file_open(spl_filesystem_object *intern, int use_inclu
323323

324324
intern->u.file.delimiter = ',';
325325
intern->u.file.enclosure = '"';
326-
intern->u.file.escape = '\\';
326+
intern->u.file.escape = (unsigned char) '\\';
327327

328328
intern->u.file.func_getCurr = zend_hash_str_find_ptr(&intern->std.ce->function_table, "getcurrentline", sizeof("getcurrentline") - 1);
329329

@@ -2109,7 +2109,7 @@ static int spl_filesystem_file_call(spl_filesystem_object *intern, zend_function
21092109
spl_filesystem_file_call(intern, func_ptr, pass_num_args, return_value, arg2); \
21102110
} /* }}} */
21112111

2112-
static int spl_filesystem_file_read_csv(spl_filesystem_object *intern, char delimiter, char enclosure, char escape, zval *return_value) /* {{{ */
2112+
static int spl_filesystem_file_read_csv(spl_filesystem_object *intern, char delimiter, char enclosure, int escape, zval *return_value) /* {{{ */
21132113
{
21142114
int ret = SUCCESS;
21152115
zval *value;
@@ -2562,7 +2562,8 @@ SPL_METHOD(SplFileObject, func_name) \
25622562
SPL_METHOD(SplFileObject, fgetcsv)
25632563
{
25642564
spl_filesystem_object *intern = Z_SPLFILESYSTEM_P(ZEND_THIS);
2565-
char delimiter = intern->u.file.delimiter, enclosure = intern->u.file.enclosure, escape = intern->u.file.escape;
2565+
char delimiter = intern->u.file.delimiter, enclosure = intern->u.file.enclosure;
2566+
int escape = intern->u.file.escape;
25662567
char *delim = NULL, *enclo = NULL, *esc = NULL;
25672568
size_t d_len = 0, e_len = 0, esc_len = 0;
25682569

@@ -2576,11 +2577,15 @@ SPL_METHOD(SplFileObject, fgetcsv)
25762577
switch(ZEND_NUM_ARGS())
25772578
{
25782579
case 3:
2579-
if (esc_len != 1) {
2580-
php_error_docref(NULL, E_WARNING, "escape must be a character");
2580+
if (esc_len > 1) {
2581+
php_error_docref(NULL, E_WARNING, "escape must be empty or a single character");
25812582
RETURN_FALSE;
25822583
}
2583-
escape = esc[0];
2584+
if (esc_len == 0) {
2585+
escape = PHP_CSV_NO_ESCAPE;
2586+
} else {
2587+
escape = (unsigned char) esc[0];
2588+
}
25842589
/* no break */
25852590
case 2:
25862591
if (e_len != 1) {
@@ -2609,7 +2614,8 @@ SPL_METHOD(SplFileObject, fgetcsv)
26092614
SPL_METHOD(SplFileObject, fputcsv)
26102615
{
26112616
spl_filesystem_object *intern = Z_SPLFILESYSTEM_P(ZEND_THIS);
2612-
char delimiter = intern->u.file.delimiter, enclosure = intern->u.file.enclosure, escape = intern->u.file.escape;
2617+
char delimiter = intern->u.file.delimiter, enclosure = intern->u.file.enclosure;
2618+
int escape = intern->u.file.escape;
26132619
char *delim = NULL, *enclo = NULL, *esc = NULL;
26142620
size_t d_len = 0, e_len = 0, esc_len = 0;
26152621
zend_long ret;
@@ -2619,11 +2625,17 @@ SPL_METHOD(SplFileObject, fputcsv)
26192625
switch(ZEND_NUM_ARGS())
26202626
{
26212627
case 4:
2622-
if (esc_len != 1) {
2623-
php_error_docref(NULL, E_WARNING, "escape must be a character");
2624-
RETURN_FALSE;
2628+
switch (esc_len) {
2629+
case 0:
2630+
escape = PHP_CSV_NO_ESCAPE;
2631+
break;
2632+
case 1:
2633+
escape = (unsigned char) esc[0];
2634+
break;
2635+
default:
2636+
php_error_docref(NULL, E_WARNING, "escape must be empty or a single character");
2637+
RETURN_FALSE;
26252638
}
2626-
escape = esc[0];
26272639
/* no break */
26282640
case 3:
26292641
if (e_len != 1) {
@@ -2654,19 +2666,26 @@ SPL_METHOD(SplFileObject, fputcsv)
26542666
SPL_METHOD(SplFileObject, setCsvControl)
26552667
{
26562668
spl_filesystem_object *intern = Z_SPLFILESYSTEM_P(ZEND_THIS);
2657-
char delimiter = ',', enclosure = '"', escape='\\';
2669+
char delimiter = ',', enclosure = '"';
2670+
int escape = (unsigned char) '\\';
26582671
char *delim = NULL, *enclo = NULL, *esc = NULL;
26592672
size_t d_len = 0, e_len = 0, esc_len = 0;
26602673

26612674
if (zend_parse_parameters(ZEND_NUM_ARGS(), "|sss", &delim, &d_len, &enclo, &e_len, &esc, &esc_len) == SUCCESS) {
26622675
switch(ZEND_NUM_ARGS())
26632676
{
26642677
case 3:
2665-
if (esc_len != 1) {
2666-
php_error_docref(NULL, E_WARNING, "escape must be a character");
2667-
RETURN_FALSE;
2678+
switch (esc_len) {
2679+
case 0:
2680+
escape = PHP_CSV_NO_ESCAPE;
2681+
break;
2682+
case 1:
2683+
escape = (unsigned char) esc[0];
2684+
break;
2685+
default:
2686+
php_error_docref(NULL, E_WARNING, "escape must be empty or a single character");
2687+
RETURN_FALSE;
26682688
}
2669-
escape = esc[0];
26702689
/* no break */
26712690
case 2:
26722691
if (e_len != 1) {
@@ -2705,8 +2724,12 @@ SPL_METHOD(SplFileObject, getCsvControl)
27052724
delimiter[1] = '\0';
27062725
enclosure[0] = intern->u.file.enclosure;
27072726
enclosure[1] = '\0';
2708-
escape[0] = intern->u.file.escape;
2709-
escape[1] = '\0';
2727+
if (intern->u.file.escape == PHP_CSV_NO_ESCAPE) {
2728+
escape[0] = '\0';
2729+
} else {
2730+
escape[0] = (unsigned char) intern->u.file.escape;
2731+
escape[1] = '\0';
2732+
}
27102733

27112734
add_next_index_string(return_value, delimiter);
27122735
add_next_index_string(return_value, enclosure);

ext/spl/spl_directory.h

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -96,7 +96,7 @@ struct _spl_filesystem_object {
9696
zend_function *func_getCurr;
9797
char delimiter;
9898
char enclosure;
99-
char escape;
99+
int escape;
100100
} file;
101101
} u;
102102
zend_object std;
Lines changed: 31 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,31 @@
1+
--TEST--
2+
SplFileObject::fgetcsv() with empty $escape
3+
--FILE--
4+
<?php
5+
$contents = <<<EOS
6+
"cell1","cell2\\","cell3","cell4"
7+
"\\\\\\line1
8+
line2\\\\\\"
9+
EOS;
10+
$file = new SplTempFileObject;
11+
$file->fwrite($contents);
12+
$file->rewind();
13+
while (($data = $file->fgetcsv(',', '"', ''))) {
14+
print_r($data);
15+
}
16+
?>
17+
===DONE===
18+
--EXPECT--
19+
Array
20+
(
21+
[0] => cell1
22+
[1] => cell2\
23+
[2] => cell3
24+
[3] => cell4
25+
)
26+
Array
27+
(
28+
[0] => \\\line1
29+
line2\\\
30+
)
31+
===DONE===

ext/spl/tests/SplFileObject_fgetcsv_escape_error.phpt

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -14,5 +14,5 @@ var_dump($fo->fgetcsv(',', '"', 'invalid'));
1414
unlink('SplFileObject__fgetcsv8.csv');
1515
?>
1616
--EXPECTF--
17-
Warning: SplFileObject::fgetcsv(): escape must be a character in %s on line %d
17+
Warning: SplFileObject::fgetcsv(): escape must be empty or a single character in %s on line %d
1818
bool(false)
Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,22 @@
1+
--TEST--
2+
SplFileObject::fputcsv() with empty $escape
3+
--FILE--
4+
<?php
5+
$data = array(
6+
['\\'],
7+
['\\"']
8+
);
9+
$file = new SplTempFileObject;
10+
foreach ($data as $record) {
11+
$file->fputcsv($record, ',', '"', '');
12+
}
13+
$file->rewind();
14+
foreach ($file as $line) {
15+
echo $line;
16+
}
17+
?>
18+
===DONE===
19+
--EXPECT--
20+
\
21+
"\"""
22+
===DONE===

ext/spl/tests/SplFileObject_setCsvControl_error003.phpt

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -22,4 +22,4 @@ $s->setCsvControl('|', '\'', 'three');
2222
unlink('csv_control_data.csv');
2323
?>
2424
--EXPECTF--
25-
Warning: SplFileObject::setCsvControl(): escape must be a character in %s on line %d
25+
Warning: SplFileObject::setCsvControl(): escape must be empty or a single character in %s on line %d
Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,19 @@
1+
--TEST--
2+
SplFileObject::setCsvControl() and ::getCsvControl() with empty $escape
3+
--FILE--
4+
<?php
5+
$file = new SplTempFileObject;
6+
$file->setCsvControl(',', '"', '');
7+
var_dump($file->getCsvControl());
8+
?>
9+
===DONE===
10+
--EXPECT--
11+
array(3) {
12+
[0]=>
13+
string(1) ","
14+
[1]=>
15+
string(1) """
16+
[2]=>
17+
string(0) ""
18+
}
19+
===DONE===

ext/standard/file.c

Lines changed: 27 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -1857,9 +1857,9 @@ static const char *php_fgetcsv_lookup_trailing_spaces(const char *ptr, size_t le
18571857
Format line as CSV and write to file pointer */
18581858
PHP_FUNCTION(fputcsv)
18591859
{
1860-
char delimiter = ','; /* allow this to be set as parameter */
1861-
char enclosure = '"'; /* allow this to be set as parameter */
1862-
char escape_char = '\\'; /* allow this to be set as parameter */
1860+
char delimiter = ','; /* allow this to be set as parameter */
1861+
char enclosure = '"'; /* allow this to be set as parameter */
1862+
int escape_char = (unsigned char) '\\'; /* allow this to be set as parameter */
18631863
php_stream *stream;
18641864
zval *fp = NULL, *fields = NULL;
18651865
size_t ret;
@@ -1900,14 +1900,15 @@ PHP_FUNCTION(fputcsv)
19001900
}
19011901

19021902
if (escape_str != NULL) {
1903+
if (escape_str_len > 1) {
1904+
php_error_docref(NULL, E_NOTICE, "escape must be empty or a single character");
1905+
}
19031906
if (escape_str_len < 1) {
1904-
php_error_docref(NULL, E_WARNING, "escape must be a character");
1905-
RETURN_FALSE;
1906-
} else if (escape_str_len > 1) {
1907-
php_error_docref(NULL, E_NOTICE, "escape must be a single character");
1907+
escape_char = PHP_CSV_NO_ESCAPE;
1908+
} else {
1909+
/* use first character from string */
1910+
escape_char = (unsigned char) *escape_str;
19081911
}
1909-
/* use first character from string */
1910-
escape_char = *escape_str;
19111912
}
19121913

19131914
PHP_STREAM_TO_ZVAL(stream, fp);
@@ -1917,14 +1918,15 @@ PHP_FUNCTION(fputcsv)
19171918
}
19181919
/* }}} */
19191920

1920-
/* {{{ PHPAPI size_t php_fputcsv(php_stream *stream, zval *fields, char delimiter, char enclosure, char escape_char) */
1921-
PHPAPI size_t php_fputcsv(php_stream *stream, zval *fields, char delimiter, char enclosure, char escape_char)
1921+
/* {{{ PHPAPI size_t php_fputcsv(php_stream *stream, zval *fields, char delimiter, char enclosure, int escape_char) */
1922+
PHPAPI size_t php_fputcsv(php_stream *stream, zval *fields, char delimiter, char enclosure, int escape_char)
19221923
{
19231924
int count, i = 0;
19241925
size_t ret;
19251926
zval *field_tmp;
19261927
smart_str csvline = {0};
19271928

1929+
ZEND_ASSERT((escape_char >= 0 && escape_char <= UCHAR_MAX) || escape_char == PHP_CSV_NO_ESCAPE);
19281930
count = zend_hash_num_elements(Z_ARRVAL_P(fields));
19291931
ZEND_HASH_FOREACH_VAL(Z_ARRVAL_P(fields), field_tmp) {
19301932
zend_string *tmp_field_str;
@@ -1933,7 +1935,7 @@ PHPAPI size_t php_fputcsv(php_stream *stream, zval *fields, char delimiter, char
19331935
/* enclose a field that contains a delimiter, an enclosure character, or a newline */
19341936
if (FPUTCSV_FLD_CHK(delimiter) ||
19351937
FPUTCSV_FLD_CHK(enclosure) ||
1936-
FPUTCSV_FLD_CHK(escape_char) ||
1938+
(escape_char != PHP_CSV_NO_ESCAPE && FPUTCSV_FLD_CHK(escape_char)) ||
19371939
FPUTCSV_FLD_CHK('\n') ||
19381940
FPUTCSV_FLD_CHK('\r') ||
19391941
FPUTCSV_FLD_CHK('\t') ||
@@ -1945,7 +1947,7 @@ PHPAPI size_t php_fputcsv(php_stream *stream, zval *fields, char delimiter, char
19451947

19461948
smart_str_appendc(&csvline, enclosure);
19471949
while (ch < end) {
1948-
if (*ch == escape_char) {
1950+
if (escape_char != PHP_CSV_NO_ESCAPE && *ch == escape_char) {
19491951
escaped = 1;
19501952
} else if (!escaped && *ch == enclosure) {
19511953
smart_str_appendc(&csvline, enclosure);
@@ -1983,7 +1985,7 @@ PHP_FUNCTION(fgetcsv)
19831985
{
19841986
char delimiter = ','; /* allow this to be set as parameter */
19851987
char enclosure = '"'; /* allow this to be set as parameter */
1986-
char escape = '\\';
1988+
int escape = (unsigned char) '\\';
19871989

19881990
/* first section exactly as php_fgetss */
19891991

@@ -2036,14 +2038,15 @@ PHP_FUNCTION(fgetcsv)
20362038
}
20372039

20382040
if (escape_str != NULL) {
2039-
if (escape_str_len < 1) {
2040-
php_error_docref(NULL, E_WARNING, "escape must be character");
2041-
RETURN_FALSE;
2042-
} else if (escape_str_len > 1) {
2043-
php_error_docref(NULL, E_NOTICE, "escape must be a single character");
2041+
if (escape_str_len > 1) {
2042+
php_error_docref(NULL, E_NOTICE, "escape must be empty or a single character");
20442043
}
20452044

2046-
escape = escape_str[0];
2045+
if (escape_str_len < 1) {
2046+
escape = PHP_CSV_NO_ESCAPE;
2047+
} else {
2048+
escape = (unsigned char) escape_str[0];
2049+
}
20472050
}
20482051

20492052
if (len_zv != NULL && Z_TYPE_P(len_zv) != IS_NULL) {
@@ -2077,13 +2080,15 @@ PHP_FUNCTION(fgetcsv)
20772080
}
20782081
/* }}} */
20792082

2080-
PHPAPI void php_fgetcsv(php_stream *stream, char delimiter, char enclosure, char escape_char, size_t buf_len, char *buf, zval *return_value) /* {{{ */
2083+
PHPAPI void php_fgetcsv(php_stream *stream, char delimiter, char enclosure, int escape_char, size_t buf_len, char *buf, zval *return_value) /* {{{ */
20812084
{
20822085
char *temp, *tptr, *bptr, *line_end, *limit;
20832086
size_t temp_len, line_end_len;
20842087
int inc_len;
20852088
zend_bool first_field = 1;
20862089

2090+
ZEND_ASSERT((escape_char >= 0 && escape_char <= UCHAR_MAX) || escape_char == PHP_CSV_NO_ESCAPE);
2091+
20872092
/* initialize internal state */
20882093
php_mb_reset();
20892094

@@ -2227,7 +2232,7 @@ PHPAPI void php_fgetcsv(php_stream *stream, char delimiter, char enclosure, char
22272232
default:
22282233
if (*bptr == enclosure) {
22292234
state = 2;
2230-
} else if (*bptr == escape_char) {
2235+
} else if (escape_char != PHP_CSV_NO_ESCAPE && *bptr == escape_char) {
22312236
state = 1;
22322237
}
22332238
bptr++;

0 commit comments

Comments
 (0)