Skip to content

CLN remove duplicated entries in valid_resos in pandas/core/indexes/datetimes.py #38121

Closed
@boringow

Description

@boringow
  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Before running, it is easy! just check the lines 560-570 of the pandas/code/indexes/datetime.py and you will understand the human error some developer did :)

# Your code here



import pandas as pd
import datetime as dt

Inside Github.csv, we have:

'''
,RxID,msgtype,message,confirmed,SNR,strength,format,icao24,correctedBits,LowConf,freqOffst
2020-08-25 11:00:08.187503455,1,10,4000,,0.0,-80.5,,,,,
2020-08-25 11:00:08.189013753,1,10,4000,,0.0,-78.5,,,,,
2020-08-25 11:00:08.189746310,1,8,,,0.0,-79.0,,,,,
2020-08-25 11:00:08.189986916,1,10,2700,,0.0,-78.0,,,,,
2020-08-25 11:00:08.190476779,1,6,20A93040502C94,1,0.0,-79.5,4.0,A1F014,,,
2020-08-25 11:00:08.190482661,1,9,0000,,0.0,-79.0,,,,,
2020-08-25 11:00:08.190963454,1,6,58EB0000171D17,1,0.0,-76.5,11.0,,,,
2020-08-25 11:00:08.191085134,1,9,7F00,,0.0,-78.5,,,,,
2020-08-25 11:00:08.191087092,1,9,0000,,0.0,-79.5,,,,,
2020-08-25 11:00:08.191123383,1,9,1000,,0.0,-72.0,,,,,
2020-08-25 11:00:08.191139020,1,7,,,0.0,-77.0,,,,,
2020-08-25 11:00:08.191150695,1,7,,,0.0,-76.5,,,,,
2020-08-25 11:00:08.191590978,1,10,0000,,0.0,-70.5,,,,,
2020-08-25 11:00:08.193015479,1,10,0000,,0.0,-83.0,,,,,
2020-08-25 11:00:08.193509041,1,9,2800,,0.0,-78.0,,,,,
2020-08-25 11:00:08.193664650,1,8,,,0.0,-80.5,,,,,
2020-08-25 11:00:08.193992571,1,10,0000,,0.0,-81.5,,,,,
2020-08-25 11:00:08.194459459,1,10,7D00,,0.0,-76.5,,,,,
2020-08-25 11:00:08.194461492,1,10,0001,,0.0,-76.0,,,,,
2020-08-25 11:00:08.195045194,1,10,0000,,0.0,-80.0,,,,,
2020-08-25 11:00:08.195061385,1,8,,,0.0,-80.0,,,,,
2020-08-25 11:00:08.195102628,1,10,0000,,0.0,-75.0,,,,,

'''
dfPKT = pd.read_csv('Github.csv',dtype={'RxId': int, 'msgtype': int, 'message': str, 'confirmed': object,'SNR': float, 'strength': float, 'format': float, 'icao24': str, 'correctedBits': float,'LowConf': float, 'freqOffst': float},index_col=0)
i=pd.DatetimeIndex(dfPKT.index,dayfirst=True)
dfPKT.set_index(i,inplace=True)

#occupancy constant
occupancyMode3Ainterrogation = 0.00015675

#How many messages are inside valid Mode A interrogation.
validModeAint=[]

#We start with the first timestamp as a reference.
Next_timestamp = dfPKT.index[0]
#Then, we get all the minutes of our study/analysis.
allminutes = dfPKT.index.floor('60S').unique()
#and for every minute, we do a data analysis.
for minute in allminutes:
    #Here, since the dataset is very big, I get small datasets of 1 minute each. 
    dfAnalysis = dfPKT.loc[(dfPKT.index > minute) & (dfPKT.index < minute + dt.timedelta(seconds=60))]
    for message, index, format, code, msgtype in zip(dfAnalysis["message"], dfAnalysis.index, dfAnalysis["format"],dfAnalysis["icao24"], dfAnalysis["msgtype"]):
    #Depending on the type of the message, we process it in one way or another
        if msgtype == 10:
            bits = bin(int(message, 16))[2:].zfill(16) #this is the 16 bits inside the messagetype 10. 
            if bits[:2] == '00': #if first two bits are zero
                '''
                when these two specific bits on the message are 0, we have to check on the same dataset if there is a message linked to it, from 8 to 13 microseconds after the actual message. HERE is were we get the error. It is trying to slice the datetime strings, and internally, it does not have a "milisecond" resolution. Altough my resolution in here is microseconds. But internally, it tries to do something with miliseconds and It breaks the code.
                '''
                if (2 in dfAnalysis[str(index + dt.timedelta(seconds=0.000008)):str(index + dt.timedelta(seconds=0.000013))]["msgtype"].to_numpy()):
                    validModeAint.append(index)
                    Next_timestamp = index + dt.timedelta(seconds=occupancyMode3Ainterrogation)

Problem description

Traceback (most recent call last):
  File "C:\Users\bkaguila\dev\anaconda\envs\Occupancies\lib\site-packages\pandas\core\indexes\datetimes.py", line 718, in slice_indexer
    return Index.slice_indexer(self, start, end, step, kind=kind)
  File "C:\Users\bkaguila\dev\anaconda\envs\Occupancies\lib\site-packages\pandas\core\indexes\base.py", line 4966, in slice_indexer
    start_slice, end_slice = self.slice_locs(start, end, step=step, kind=kind)
  File "C:\Users\bkaguila\dev\anaconda\envs\Occupancies\lib\site-packages\pandas\core\indexes\base.py", line 5169, in slice_locs
    start_slice = self.get_slice_bound(start, "left", kind)
  File "C:\Users\bkaguila\dev\anaconda\envs\Occupancies\lib\site-packages\pandas\core\indexes\base.py", line 5079, in get_slice_bound
    label = self._maybe_cast_slice_bound(label, side, kind)
  File "C:\Users\bkaguila\dev\anaconda\envs\Occupancies\lib\site-packages\pandas\core\indexes\datetimes.py", line 665, in _maybe_cast_slice_bound
    lower, upper = self._parsed_string_to_bounds(reso, parsed)
  File "C:\Users\bkaguila\dev\anaconda\envs\Occupancies\lib\site-packages\pandas\core\indexes\datetimes.py", line 536, in _parsed_string_to_bounds
    raise KeyError
KeyError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\bkaguila\dev\anaconda\envs\Occupancies\OccupancyScripts\OccupancyToKML\Github.py", line 31, in <module>
    if (2 in dfAnalysis[str(index + dt.timedelta(seconds=0.000008)):str(index + dt.timedelta(seconds=0.000013))]["msgtype"].to_numpy()):
  File "C:\Users\bkaguila\dev\anaconda\envs\Occupancies\lib\site-packages\pandas\core\frame.py", line 2881, in __getitem__
    indexer = convert_to_index_sliceable(self, key)
  File "C:\Users\bkaguila\dev\anaconda\envs\Occupancies\lib\site-packages\pandas\core\indexing.py", line 2132, in convert_to_index_sliceable
    return idx._convert_slice_indexer(key, kind="getitem")
  File "C:\Users\bkaguila\dev\anaconda\envs\Occupancies\lib\site-packages\pandas\core\indexes\base.py", line 3190, in _convert_slice_indexer
    indexer = self.slice_indexer(start, stop, step, kind=kind)
  File "C:\Users\bkaguila\dev\anaconda\envs\Occupancies\lib\site-packages\pandas\core\indexes\datetimes.py", line 728, in slice_indexer
    start_casted = self._maybe_cast_slice_bound(start, "left", kind)
  File "C:\Users\bkaguila\dev\anaconda\envs\Occupancies\lib\site-packages\pandas\core\indexes\datetimes.py", line 665, in _maybe_cast_slice_bound
    lower, upper = self._parsed_string_to_bounds(reso, parsed)
  File "C:\Users\bkaguila\dev\anaconda\envs\Occupancies\lib\site-packages\pandas\core\indexes\datetimes.py", line 536, in _parsed_string_to_bounds
    raise KeyError
KeyError

Process finished with exit code 1

The current behavior is that the indexes are partitioned, as they are strings, and the reso or Resolution class, which has nanosecond and minisecond resolution, then is passed to the datetime.py file on the pandas/core/indexes, which has the valid_resos (on line 520) wrong.

Expected Output

Process finished with exit code 0

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 67a3d42
python : 3.9.0.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.18362
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United Kingdom.1252
pandas : 1.1.4
numpy : 1.19.4
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.4
setuptools : 50.3.1.post20201107
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.3
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

[paste the output of pd.show_versions() here leaving a blank line after the details tag]

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions