JSON native support for datetime encoding #4498

Merged
merged 3 commits into from Aug 15, 2013
60 changes: 50 additions & 10 deletions doc/source/io.rst
@@ -1034,11 +1034,12 @@ with optional parameters:
``columns``; dict like {column -> {index -> value}}
``values``; just the values array

- ``date_format`` : type of date conversion (epoch = epoch milliseconds, iso = ISO8601), default is epoch
- ``date_format`` : string, type of date conversion, 'epoch' for timestamp, 'iso' for ISO8601.
- ``double_precision`` : The number of decimal places to use when encoding floating point values, default 10.
- ``force_ascii`` : force encoded string to be ASCII, default True.
- ``date_unit`` : The time unit to encode to, governs timestamp and ISO8601 precision. One of 's', 'ms', 'us' or 'ns' for seconds, milliseconds, microseconds and nanoseconds respectively. Default 'ms'.

Note NaN's and None will be converted to null and datetime objects will be converted based on the date_format parameter
Note NaN's, NaT's and None will be converted to null and datetime objects will be converted based on the date_format and date_unit parameters.

.. ipython:: python

@@ -1055,6 +1056,20 @@ Writing in iso date format
json = dfd.to_json(date_format='iso')
json

Writing in iso date format, with microseconds

.. ipython:: python

json = dfd.to_json(date_format='iso', date_unit='us')
json

Actually I prefer epoch timestamps, in seconds

.. ipython:: python

json = dfd.to_json(date_format='epoch', date_unit='s')
json

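The two epoch variants above differ only in scale. A minimal runnable sketch (mine, not part of the PR's changes) comparing the default millisecond encoding with ``date_unit='s'``:

```python
# Compare epoch encodings at millisecond vs. second resolution.
# Assumes only pandas and the stdlib json module.
import json

import pandas as pd

s = pd.Series([pd.Timestamp('2013-01-01')])

# Default epoch encoding uses milliseconds since the Unix epoch.
ms = json.loads(s.to_json(date_format='epoch'))
# date_unit='s' drops the same timestamp to whole seconds.
sec = json.loads(s.to_json(date_format='epoch', date_unit='s'))

print(ms['0'], sec['0'])  # 1356998400000 1356998400
```

The second-resolution payload is a thousandth of the magnitude, which can matter for consumers that assume 32-bit timestamps.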
Writing to a file, with a date index and a date column

.. ipython:: python
@@ -1063,7 +1078,7 @@ Writing to a file, with a date index and a date column
dfj2['date'] = Timestamp('20130101')
dfj2['ints'] = list(range(5))
dfj2['bools'] = True
dfj2.index = date_range('20130101',periods=5)
dfj2.index = date_range('20130101', periods=5)
dfj2.to_json('test.json')
open('test.json').read()

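The same round trip can be sketched in memory without touching the filesystem. This is a hedged variant of the example above (the ``StringIO`` buffer replaces ``'test.json'``; column contents are assumptions of mine):

```python
# Write a frame with a datetime index and a date column to JSON,
# then read it back and confirm the dates survive the round trip.
from io import StringIO

import numpy as np
import pandas as pd

dfj2 = pd.DataFrame(np.random.randn(5, 2), columns=list('AB'))
dfj2['date'] = pd.Timestamp('20130101')
dfj2.index = pd.date_range('20130101', periods=5)

buf = StringIO(dfj2.to_json())
roundtrip = pd.read_json(buf)

# Both the datetime index and the 'date' column come back as datetimes.
print(roundtrip.index[0], roundtrip['date'].dtype)
```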
@@ -1107,16 +1122,22 @@ is ``None``. To explicitly force ``Series`` parsing, pass ``typ=series``
- ``keep_default_dates`` : boolean, default True. If parsing dates, then parse the default datelike columns
- ``numpy`` : direct decoding to numpy arrays. default is False;
Note that the JSON ordering **MUST** be the same for each term if ``numpy=True``
- ``precise_float`` : boolean, default ``False``. Set to enable usage of higher precision (strtod) function
when decoding string to double values. Default (``False``) is to use fast but less precise builtin functionality
- ``precise_float`` : boolean, default ``False``. Set to enable usage of higher precision (strtod) function when decoding string to double values. Default (``False``) is to use fast but less precise builtin functionality
- ``date_unit`` : string, the timestamp unit to detect if converting dates. Default
None. By default the timestamp precision will be detected, if this is not desired
then pass one of 's', 'ms', 'us' or 'ns' to force timestamp precision to
seconds, milliseconds, microseconds or nanoseconds respectively.

The parser will raise one of ``ValueError/TypeError/AssertionError`` if the JSON is
not parsable.
The parser will raise one of ``ValueError/TypeError/AssertionError`` if the JSON is not parsable.

The default of ``convert_axes=True``, ``dtype=True``, and ``convert_dates=True`` will try to parse the axes, and all of the data
into appropriate types, including dates. If you need to override specific dtypes, pass a dict to ``dtype``. ``convert_axes`` should only
be set to ``False`` if you need to preserve string-like numbers (e.g. '1', '2') in an axes.

.. note::

Large integer values may be converted to dates if ``convert_dates=True`` and the data and / or column labels appear 'date-like'. The exact threshold depends on the ``date_unit`` specified.

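A short sketch of that note (my illustration, not from the PR): a column whose label matches the default date-like names is coerced, while an identically-valued column with a neutral name is left alone.

```python
# Two columns with the same large-integer value: only the one whose
# label looks date-like ('date') is converted to datetimes by default.
from io import StringIO

import pandas as pd

raw = '{"date":{"0":1356998400000},"other":{"0":1356998400000}}'
df = pd.read_json(StringIO(raw))

print(df['date'].dtype, df['other'].dtype)
```

The value 1356998400000 is in range as epoch milliseconds (2013-01-01), so precision detection settles on 'ms' for the ``date`` column.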
.. warning::

When reading JSON data, automatic coercing into dtypes has some quirks:
@@ -1143,13 +1164,13 @@ Don't convert any data (but still convert axes and dates)

.. ipython:: python

pd.read_json('test.json',dtype=object).dtypes
pd.read_json('test.json', dtype=object).dtypes

Specify how I want to convert data

.. ipython:: python

pd.read_json('test.json',dtype={'A' : 'float32', 'bools' : 'int8'}).dtypes
pd.read_json('test.json', dtype={'A' : 'float32', 'bools' : 'int8'}).dtypes

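A self-contained version of the dtype-override example (inline JSON stands in for ``'test.json'``, so the data here is hypothetical):

```python
# Per-column dtype overrides: floats narrowed to float32, booleans
# coerced to int8 via the dtype dict.
from io import StringIO

import pandas as pd

raw = '{"A":{"0":1.5,"1":2.5},"bools":{"0":true,"1":false}}'
df = pd.read_json(StringIO(raw), dtype={'A': 'float32', 'bools': 'int8'})

print(df.dtypes)
```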
I like my string indices

@@ -1163,11 +1184,30 @@ I like my string indices
si.columns
json = si.to_json()

sij = pd.read_json(json,convert_axes=False)
sij = pd.read_json(json, convert_axes=False)
sij
sij.index
sij.columns

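The effect of ``convert_axes=False`` can be checked directly. A minimal sketch (the two-column payload is my own, chosen so the labels look numeric):

```python
# With convert_axes=True (the default) the string labels '4' and '5'
# are converted to integers; convert_axes=False preserves them.
from io import StringIO

import pandas as pd

raw = '{"4":{"0":1,"1":2},"5":{"0":3,"1":4}}'

converted = pd.read_json(StringIO(raw))
preserved = pd.read_json(StringIO(raw), convert_axes=False)

print(list(converted.columns), list(preserved.columns))
```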
My dates have been written in nanoseconds, so they need to be read back in
nanoseconds

.. ipython:: python

json = dfj2.to_json(date_unit='ns')

# Try to parse timestamps as milliseconds -> Won't Work
dfju = pd.read_json(json, date_unit='ms')
dfju

# Let pandas detect the correct precision
dfju = pd.read_json(json)
dfju

# Or specify that all timestamps are in nanoseconds
dfju = pd.read_json(json, date_unit='ns')
dfju

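The nanosecond round trip above can be condensed into a runnable sketch (a single-element Series of my own stands in for ``dfj2``):

```python
# Serialise at nanosecond resolution, then read back twice: once
# letting pandas detect the precision, once forcing date_unit='ns'.
from io import StringIO

import pandas as pd

s = pd.Series([pd.Timestamp('2013-01-01')])
payload = s.to_json(date_unit='ns')

detected = pd.read_json(StringIO(payload), typ='series')
forced = pd.read_json(StringIO(payload), typ='series', date_unit='ns')

print(detected.iloc[0], forced.iloc[0])
```

Detection works because the raw value is far out of range for seconds, milliseconds, and microseconds, leaving nanoseconds as the only plausible unit.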
.. ipython:: python
:suppress:

8 changes: 8 additions & 0 deletions doc/source/release.rst
@@ -47,6 +47,7 @@ pandas 0.13
- Added a more informative error message when plot arguments contain
overlapping color and style arguments (:issue:`4402`)
- Significant table writing performance improvements in ``HDFStore``
- JSON date serialisation now performed in low-level C code.
- ``Index.copy()`` and ``MultiIndex.copy()`` now accept keyword arguments to
change attributes (i.e., ``names``, ``levels``, ``labels``)
(:issue:`4039`)
@@ -94,6 +95,10 @@ pandas 0.13
- removed the ``warn`` argument from ``open``. Instead a ``PossibleDataLossError`` exception will
be raised if you try to use ``mode='w'`` with an OPEN file handle (:issue:`4367`)
- allow a passed locations array or mask as a ``where`` condition (:issue:`4467`)
- ``JSON``

- added ``date_unit`` parameter to specify resolution of timestamps. Options
are seconds, milliseconds, microseconds and nanoseconds. (:issue:`4362`, :issue:`4498`).

- ``Index`` and ``MultiIndex`` changes (:issue:`4039`):

@@ -134,6 +139,9 @@ pandas 0.13
local variable was undefined (:issue:`4381`)
- In ``to_json``, raise if a passed ``orient`` would cause loss of data because
of a duplicate index (:issue:`4359`)
- In ``to_json``, fix date handling so milliseconds are the default timestamp
as the docstring says (:issue:`4362`).
- JSON NaT handling fixed, NaTs are now serialised to ``null`` (:issue:`4498`)
- Fixed passing ``keep_default_na=False`` when ``na_values=None`` (:issue:`4318`)
- Fixed bug with ``values`` raising an error on a DataFrame with duplicate columns and mixed
dtypes, surfaced in (:issue:`4377`)
19 changes: 14 additions & 5 deletions pandas/core/generic.py
@@ -535,7 +535,7 @@ def to_clipboard(self):
clipboard.to_clipboard(self)

def to_json(self, path_or_buf=None, orient=None, date_format='epoch',
double_precision=10, force_ascii=True):
double_precision=10, force_ascii=True, date_unit='ms'):
"""
Convert the object to a JSON string.

@@ -566,11 +566,15 @@ def to_json(self, path_or_buf=None, orient=None, date_format='epoch',
- columns : dict like {column -> {index -> value}}
- values : just the values array

date_format : type of date conversion (epoch = epoch milliseconds, iso = ISO8601)
default is epoch
date_format : string, default 'epoch'
type of date conversion, 'epoch' for timestamp, 'iso' for ISO8601
double_precision : The number of decimal places to use when encoding
floating point values, default 10.
force_ascii : force encoded string to be ASCII, default True.
date_unit : string, default 'ms' (milliseconds)
The time unit to encode to, governs timestamp and ISO8601
precision. One of 's', 'ms', 'us', 'ns' for second, millisecond,
microsecond, and nanosecond respectively.

Returns
-------
@@ -580,8 +584,13 @@ def to_json(self, path_or_buf=None, orient=None, date_format='epoch',
"""

from pandas.io import json
return json.to_json(path_or_buf=path_or_buf, obj=self, orient=orient, date_format=date_format,
double_precision=double_precision, force_ascii=force_ascii)
return json.to_json(
path_or_buf=path_or_buf,
obj=self, orient=orient,
date_format=date_format,
double_precision=double_precision,
force_ascii=force_ascii,
date_unit=date_unit)

# install the indexers
for _name, _indexer in indexing.get_indexers_list():