
DOC: Added DataFrame in Parameters & Return description in the docstring #44007


Merged: 4 commits, Oct 17, 2021

@abatomunkuev abatomunkuev commented Oct 13, 2021

Hello! I have added the following information to the docstring of the to_datetime method:

  1. When a DataFrame is passed as `arg`, to_datetime expects at minimum the columns "year", "month", and "day".
  2. The method returns a Series when a DataFrame is passed.
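
For reviewers who want to see the two points in action, here is a minimal sketch. It assumes a reasonably recent pandas is installed; the variable names (`df`, `result`, `missing_day_error`) are illustrative only and are not part of the PR.

```python
import pandas as pd

# Minimal case: only the three required columns are present.
df = pd.DataFrame({"year": [2015, 2016], "month": [2, 3], "day": [4, 5]})

# Point 2: a DataFrame goes in, a Series of datetimes comes out.
result = pd.to_datetime(df)
print(type(result).__name__)
print(result)

# Point 1: leaving out a required column ("day" here) raises ValueError.
try:
    pd.to_datetime(df[["year", "month"]])
    missing_day_error = None
except ValueError as exc:
    missing_day_error = str(exc)

print(missing_day_error)
```

The error message names the missing column, which is why documenting the required column names in the docstring seemed worthwhile.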

I created a Docker image and a Python environment, then tested my docstring update with the validate_docstrings.py script.

Before testing my update, I ran the script on the master branch. It found 2 errors there:

2 Errors found:
        No extended summary found
        flake8 error: E999 SyntaxError: invalid syntax

Then I switched to my feature branch and ran the validate_docstrings.py script again.

Output of the script

################################################################################
######################## Docstring (pandas.to_datetime) ########################
################################################################################

Convert argument to datetime.

Parameters
----------
arg : int, float, str, datetime, list, tuple, 1-d array, Series, DataFrame/dict-like
    The object to convert to a datetime. If the DataFrame is provided, the method expects 
    minimally the following columns ( "year", "month", "day") in the DataFrame.
errors : {'ignore', 'raise', 'coerce'}, default 'raise'
    - If 'raise', then invalid parsing will raise an exception.
    - If 'coerce', then invalid parsing will be set as NaT.
    - If 'ignore', then invalid parsing will return the input.
dayfirst : bool, default False
    Specify a date parse order if `arg` is str or its list-likes.
    If True, parses dates with the day first, eg 10/11/12 is parsed as
    2012-11-10.

    .. warning::

        dayfirst=True is not strict, but will prefer to parse
        with day first. If a delimited date string cannot be parsed in
        accordance with the given `dayfirst` option, e.g.
        ``to_datetime(['31-12-2021'])``, then a warning will be shown.

yearfirst : bool, default False
    Specify a date parse order if `arg` is str or its list-likes.

    - If True parses dates with the year first, eg 10/11/12 is parsed as
      2010-11-12.
    - If both dayfirst and yearfirst are True, yearfirst is preceded (same
      as dateutil).

    .. warning::

        yearfirst=True is not strict, but will prefer to parse
        with year first.

utc : bool, default None
    Return UTC DatetimeIndex if True (converting any tz-aware
    datetime.datetime objects as well).
format : str, default None
    The strftime to parse time, eg "%d/%m/%Y", note that "%f" will parse
    all the way up to nanoseconds.
    See strftime documentation for more information on choices:
    https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior.
exact : bool, True by default
    Behaves as:
    - If True, require an exact format match.
    - If False, allow the format to match anywhere in the target string.

unit : str, default 'ns'
    The unit of the arg (D,s,ms,us,ns) denote the unit, which is an
    integer or float number. This will be based off the origin.
    Example, with unit='ms' and origin='unix' (the default), this
    would calculate the number of milliseconds to the unix epoch start.
infer_datetime_format : bool, default False
    If True and no `format` is given, attempt to infer the format of the
    datetime strings based on the first non-NaN element,
    and if it can be inferred, switch to a faster method of parsing them.
    In some cases this can increase the parsing speed by ~5-10x.
origin : scalar, default 'unix'
    Define the reference date. The numeric values would be parsed as number
    of units (defined by `unit`) since this reference date.

    - If 'unix' (or POSIX) time; origin is set to 1970-01-01.
    - If 'julian', unit must be 'D', and origin is set to beginning of
      Julian Calendar. Julian day number 0 is assigned to the day starting
      at noon on January 1, 4713 BC.
    - If Timestamp convertible, origin is set to Timestamp identified by
      origin.
cache : bool, default True
    If True, use a cache of unique, converted dates to apply the datetime
    conversion. May produce significant speed-up when parsing duplicate
    date strings, especially ones with timezone offsets. The cache is only
    used when there are at least 50 values. The presence of out-of-bounds
    values will render the cache unusable and may slow down parsing.

    .. versionchanged:: 0.25.0
        - changed default value from False to True.

Returns
-------
datetime
    If parsing succeeded.
    Return type depends on input:

    - list-like:
        - DatetimeIndex, if timezone naive or aware with the same timezone
        - Index of object dtype, if timezone aware with mixed time offsets
    - Series: Series of datetime64 dtype
    - DataFrame: Series of datetime64 dtype
    - scalar: Timestamp

    In case when it is not possible to return designated types (e.g. when
    any element of input is before Timestamp.min or after Timestamp.max)
    return will have datetime.datetime type (or corresponding
    array/Series).

See Also
--------
DataFrame.astype : Cast argument to a specified dtype.
to_timedelta : Convert argument to timedelta.
convert_dtypes : Convert dtypes.

Examples
--------
Assembling a datetime from multiple columns of a DataFrame. The keys can be
common abbreviations like ['year', 'month', 'day', 'minute', 'second',
'ms', 'us', 'ns']) or plurals of the same

>>> df = pd.DataFrame({'year': [2015, 2016],
...                    'month': [2, 3],
...                    'day': [4, 5]})
>>> pd.to_datetime(df)
0   2015-02-04
1   2016-03-05
dtype: datetime64[ns]

If a date does not meet the `timestamp limitations
<https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
#timeseries-timestamp-limits>`_, passing errors='ignore'
will return the original input instead of raising any exception.

Passing errors='coerce' will force an out-of-bounds date to NaT,
in addition to forcing non-dates (or non-parseable dates) to NaT.

>>> pd.to_datetime('13000101', format='%Y%m%d', errors='ignore')
datetime.datetime(1300, 1, 1, 0, 0)
>>> pd.to_datetime('13000101', format='%Y%m%d', errors='coerce')
NaT

Passing infer_datetime_format=True can often-times speedup a parsing
if its not an ISO8601 format exactly, but in a regular format.

>>> s = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000'] * 1000)
>>> s.head()
0    3/11/2000
1    3/12/2000
2    3/13/2000
3    3/11/2000
4    3/12/2000
dtype: object

>>> %timeit pd.to_datetime(s, infer_datetime_format=True)  # doctest: +SKIP
100 loops, best of 3: 10.4 ms per loop

>>> %timeit pd.to_datetime(s, infer_datetime_format=False)  # doctest: +SKIP
1 loop, best of 3: 471 ms per loop

Using a unix epoch time

>>> pd.to_datetime(1490195805, unit='s')
Timestamp('2017-03-22 15:16:45')
>>> pd.to_datetime(1490195805433502912, unit='ns')
Timestamp('2017-03-22 15:16:45.433502912')

.. warning:: For float arg, precision rounding might happen. To prevent
    unexpected behavior use a fixed-width exact type.

Using a non-unix epoch origin

>>> pd.to_datetime([1, 2, 3], unit='D',
...                origin=pd.Timestamp('1960-01-01'))
DatetimeIndex(['1960-01-02', '1960-01-03', '1960-01-04'],
              dtype='datetime64[ns]', freq=None)

In case input is list-like and the elements of input are of mixed
timezones, return will have object type Index if utc=False.

>>> pd.to_datetime(['2018-10-26 12:00 -0530', '2018-10-26 12:00 -0500'])
Index([2018-10-26 12:00:00-05:30, 2018-10-26 12:00:00-05:00], dtype='object')

>>> pd.to_datetime(['2018-10-26 12:00 -0530', '2018-10-26 12:00 -0500'],
...                utc=True)
DatetimeIndex(['2018-10-26 17:30:00+00:00', '2018-10-26 17:00:00+00:00'],
              dtype='datetime64[ns, UTC]', freq=None)

################################################################################
################################## Validation ##################################
################################################################################

2 Errors found:
        No extended summary found
        flake8 error: E999 SyntaxError: invalid syntax

These are the same two errors that the script already reports on the master branch.
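
The return types listed in the Returns section of the docstring output above can be spot-checked with a short script. This is only a sketch under the assumption of a recent pandas install; the names `scalar`, `from_list`, `from_series`, and `from_frame` are made up for illustration.

```python
import pandas as pd

# One input of each kind listed in the Returns section.
scalar = pd.to_datetime("2021-10-13")
from_list = pd.to_datetime(["2021-10-13", "2021-10-17"])
from_series = pd.to_datetime(pd.Series(["2021-10-13", "2021-10-17"]))
from_frame = pd.to_datetime(
    pd.DataFrame({"year": [2021], "month": [10], "day": [13]})
)

# Print the type returned for each kind of input.
for label, value in [
    ("scalar", scalar),
    ("list", from_list),
    ("Series", from_series),
    ("DataFrame", from_frame),
]:
    print(f"{label:>9}: {type(value).__name__}")
```

The DataFrame case is the one this PR adds to the docstring: it returns a Series rather than a DataFrame.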


pep8speaks commented Oct 13, 2021

Hello @abatomunkuev! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-10-17 02:10:15 UTC

-    The object to convert to a datetime.
+    The object to convert to a datetime. If the DataFrame is provided,the method
+    expects minimally the following columns: "year", "month", "day"
+    in the DataFrame.

Nit: "in the DataFrame" is redundant given "If the DataFrame" in the first line.

@@ -692,7 +692,9 @@ def to_datetime(
     Parameters
     ----------
     arg : int, float, str, datetime, list, tuple, 1-d array, Series, DataFrame/dict-like
-        The object to convert to a datetime.
+        The object to convert to a datetime. If the DataFrame is provided,the method

Nit: "provided,the" -> "provided, the"

@mroeschke mroeschke added the Docs label Oct 16, 2021
@mroeschke mroeschke added this to the 1.4 milestone Oct 17, 2021
@mroeschke mroeschke merged commit 9a29391 into pandas-dev:master Oct 17, 2021
@mroeschke (Member)

Thanks @abatomunkuev

@nielsbom

Thanks @abatomunkuev !

Successfully merging this pull request may close these issues.

DOC: to_datetime works real differently on DataFrames vs other datatypes
4 participants