Commit 5958013

Merge pull request #4498 from Komnomnomnom/ujson-datetime
JSON native support for datetime encoding
2 parents 359017f + dbd724c

File tree

11 files changed (+602 -349 lines)

doc/source/io.rst

Lines changed: 50 additions & 10 deletions
@@ -1034,11 +1034,12 @@ with optional parameters:
   ``columns``; dict like {column -> {index -> value}}
   ``values``; just the values array

-- ``date_format`` : type of date conversion (epoch = epoch milliseconds, iso = ISO8601), default is epoch
+- ``date_format`` : string, type of date conversion, 'epoch' for timestamp, 'iso' for ISO8601.
 - ``double_precision`` : The number of decimal places to use when encoding floating point values, default 10.
 - ``force_ascii`` : force encoded string to be ASCII, default True.
+- ``date_unit`` : The time unit to encode to, governs timestamp and ISO8601 precision. One of 's', 'ms', 'us' or 'ns' for seconds, milliseconds, microseconds and nanoseconds respectively. Default 'ms'.

-Note NaN's and None will be converted to null and datetime objects will be converted based on the date_format parameter
+Note NaN's, NaT's and None will be converted to null, and datetime objects will be converted based on the date_format and date_unit parameters.

 .. ipython:: python
@@ -1055,6 +1056,20 @@ Writing in iso date format

    json = dfd.to_json(date_format='iso')
    json

+Writing in iso date format, with microseconds
+
+.. ipython:: python
+
+   json = dfd.to_json(date_format='iso', date_unit='us')
+   json
+
+Actually I prefer epoch timestamps, in seconds
+
+.. ipython:: python
+
+   json = dfd.to_json(date_format='epoch', date_unit='s')
+   json
+
 Writing to a file, with a date index and a date column

 .. ipython:: python
@@ -1063,7 +1078,7 @@ Writing to a file, with a date index and a date column

    dfj2['date'] = Timestamp('20130101')
    dfj2['ints'] = list(range(5))
    dfj2['bools'] = True
-   dfj2.index = date_range('20130101',periods=5)
+   dfj2.index = date_range('20130101', periods=5)
    dfj2.to_json('test.json')
    open('test.json').read()
@@ -1107,16 +1122,22 @@ is ``None``. To explicitly force ``Series`` parsing, pass ``typ=series``
 - ``keep_default_dates`` : boolean, default True. If parsing dates, then parse the default datelike columns
 - ``numpy`` : direct decoding to numpy arrays. default is False;
   Note that the JSON ordering **MUST** be the same for each term if ``numpy=True``
-- ``precise_float`` : boolean, default ``False``. Set to enable usage of higher precision (strtod) function
-  when decoding string to double values. Default (``False``) is to use fast but less precise builtin functionality
+- ``precise_float`` : boolean, default ``False``. Set to enable usage of higher precision (strtod) function when decoding string to double values. Default (``False``) is to use fast but less precise builtin functionality
+- ``date_unit`` : string, the timestamp unit to detect if converting dates. Default
+  None. By default the timestamp precision will be detected; if this is not desired
+  then pass one of 's', 'ms', 'us' or 'ns' to force timestamp precision to
+  seconds, milliseconds, microseconds or nanoseconds respectively.

-The parser will raise one of ``ValueError/TypeError/AssertionError`` if the JSON is
-not parsable.
+The parser will raise one of ``ValueError/TypeError/AssertionError`` if the JSON is not parsable.

 The default of ``convert_axes=True``, ``dtype=True``, and ``convert_dates=True`` will try to parse the axes, and all of the data
 into appropriate types, including dates. If you need to override specific dtypes, pass a dict to ``dtype``. ``convert_axes`` should only
 be set to ``False`` if you need to preserve string-like numbers (e.g. '1', '2') in an axes.

+.. note::
+
+   Large integer values may be converted to dates if ``convert_dates=True`` and the data and / or column labels appear 'date-like'. The exact threshold depends on the ``date_unit`` specified.
+
 .. warning::

   When reading JSON data, automatic coercing into dtypes has some quirks:
@@ -1143,13 +1164,13 @@ Don't convert any data (but still convert axes and dates)

 .. ipython:: python

-   pd.read_json('test.json',dtype=object).dtypes
+   pd.read_json('test.json', dtype=object).dtypes

 Specify how I want to convert data

 .. ipython:: python

-   pd.read_json('test.json',dtype={'A' : 'float32', 'bools' : 'int8'}).dtypes
+   pd.read_json('test.json', dtype={'A' : 'float32', 'bools' : 'int8'}).dtypes

 I like my string indices

@@ -1163,11 +1184,30 @@ I like my string indices

    si.columns
    json = si.to_json()

-   sij = pd.read_json(json,convert_axes=False)
+   sij = pd.read_json(json, convert_axes=False)
    sij
    sij.index
    sij.columns

+My dates have been written in nanoseconds, so they need to be read back in
+nanoseconds
+
+.. ipython:: python
+
+   json = dfj2.to_json(date_unit='ns')
+
+   # Try to parse timestamps as milliseconds -> Won't Work
+   dfju = pd.read_json(json, date_unit='ms')
+   dfju
+
+   # Let pandas detect the correct precision
+   dfju = pd.read_json(json)
+   dfju
+
+   # Or specify that all timestamps are in nanoseconds
+   dfju = pd.read_json(json, date_unit='ns')
+   dfju
+
 .. ipython:: python
    :suppress:

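The ``to_json`` behaviour documented in the io.rst changes above can be exercised directly. A minimal sketch against a recent pandas (exact key ordering and fractional-second formatting may vary by version; the `dfd` name mirrors the docs' example frame):

```python
import pandas as pd

# A small frame with a datetime index, mirroring the dfd example in io.rst.
dfd = pd.DataFrame({'A': [1, 2]},
                   index=pd.to_datetime(['2013-01-01', '2013-01-02']))

# Epoch encoding defaults to milliseconds; date_unit='s' drops to seconds.
js_ms = dfd.to_json(date_format='epoch')                # keys like 1356998400000
js_s = dfd.to_json(date_format='epoch', date_unit='s')  # keys like 1356998400

# ISO 8601 encoding; date_unit governs the fractional-second precision.
js_iso = dfd.to_json(date_format='iso')
```

The same timestamp appears three orders of magnitude apart in the two epoch encodings, which is exactly why the reader needs ``date_unit`` (or unit detection) on the way back in.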
doc/source/release.rst

Lines changed: 8 additions & 0 deletions
@@ -47,6 +47,7 @@ pandas 0.13
 - Added a more informative error message when plot arguments contain
   overlapping color and style arguments (:issue:`4402`)
 - Significant table writing performance improvements in ``HDFStore``
+- JSON date serialisation now performed in low-level C code.
 - ``Index.copy()`` and ``MultiIndex.copy()`` now accept keyword arguments to
   change attributes (i.e., ``names``, ``levels``, ``labels``)
   (:issue:`4039`)
@@ -94,6 +95,10 @@ pandas 0.13
 - removed the ``warn`` argument from ``open``. Instead a ``PossibleDataLossError`` exception will
   be raised if you try to use ``mode='w'`` with an OPEN file handle (:issue:`4367`)
 - allow a passed locations array or mask as a ``where`` condition (:issue:`4467`)
+- ``JSON``
+
+  - added ``date_unit`` parameter to specify resolution of timestamps. Options
+    are seconds, milliseconds, microseconds and nanoseconds (:issue:`4362`, :issue:`4498`).

 - ``Index`` and ``MultiIndex`` changes (:issue:`4039`):

@@ -134,6 +139,9 @@ pandas 0.13
   local variable was undefined (:issue:`4381`)
 - In ``to_json``, raise if a passed ``orient`` would cause loss of data because
   of a duplicate index (:issue:`4359`)
+- In ``to_json``, fix date handling so milliseconds are the default timestamp
+  as the docstring says (:issue:`4362`).
+- JSON NaT handling fixed, NaTs are now serialised to ``null`` (:issue:`4498`)
 - Fixed passing ``keep_default_na=False`` when ``na_values=None`` (:issue:`4318`)
 - Fixed bug with ``values`` raising an error on a DataFrame with duplicate columns and mixed
   dtypes, surfaced in (:issue:`4377`)
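The NaT fix listed in the release notes above is easy to check. A small sketch, assuming a recent pandas where this behaviour still holds:

```python
import pandas as pd

# NaT, like NaN and None, serialises to JSON null rather than a bogus
# epoch value or the string "NaT".
s = pd.Series([pd.Timestamp('2013-01-01'), pd.NaT])
js = s.to_json(date_format='iso')
# js contains an ISO string for the valid timestamp and null for the NaT slot
```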

pandas/core/generic.py

Lines changed: 14 additions & 5 deletions
@@ -535,7 +535,7 @@ def to_clipboard(self):
         clipboard.to_clipboard(self)

     def to_json(self, path_or_buf=None, orient=None, date_format='epoch',
-                double_precision=10, force_ascii=True):
+                double_precision=10, force_ascii=True, date_unit='ms'):
         """
         Convert the object to a JSON string.
@@ -566,11 +566,15 @@ def to_json(self, path_or_buf=None, orient=None, date_format='epoch',
         - columns : dict like {column -> {index -> value}}
         - values : just the values array

-        date_format : type of date conversion (epoch = epoch milliseconds, iso = ISO8601)
-            default is epoch
+        date_format : string, default 'epoch'
+            type of date conversion, 'epoch' for timestamp, 'iso' for ISO8601
         double_precision : The number of decimal places to use when encoding
             floating point values, default 10.
         force_ascii : force encoded string to be ASCII, default True.
+        date_unit : string, default 'ms' (milliseconds)
+            The time unit to encode to, governs timestamp and ISO8601
+            precision. One of 's', 'ms', 'us', 'ns' for second, millisecond,
+            microsecond, and nanosecond respectively.

         Returns
         -------
@@ -580,8 +584,13 @@ def to_json(self, path_or_buf=None, orient=None, date_format='epoch',
         """

         from pandas.io import json
-        return json.to_json(path_or_buf=path_or_buf, obj=self, orient=orient, date_format=date_format,
-                            double_precision=double_precision, force_ascii=force_ascii)
+        return json.to_json(
+            path_or_buf=path_or_buf,
+            obj=self, orient=orient,
+            date_format=date_format,
+            double_precision=double_precision,
+            force_ascii=force_ascii,
+            date_unit=date_unit)

     # install the indexers
     for _name, _indexer in indexing.get_indexers_list():
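The ``date_unit`` plumbing added to ``to_json`` above round-trips through ``read_json``'s unit detection. A minimal sketch against a recent pandas (the ``StringIO`` wrapper is a modern-pandas convention for literal JSON, not part of this diff):

```python
from io import StringIO

import pandas as pd

# Write nanosecond epoch timestamps via the new date_unit parameter.
df = pd.DataFrame({'date': pd.to_datetime(['2013-01-01', '2013-01-02'])})
js = df.to_json(date_unit='ns')

# read_json auto-detects the timestamp unit when date_unit is None (the
# default); passing date_unit='ns' would force the precision explicitly.
out = pd.read_json(StringIO(js), convert_dates=['date'])
```

Detection works here because a value like 1356998400000000000 is only in-range for datetimes when interpreted as nanoseconds, so the reader can rule out the coarser units.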
