Description
Code Sample
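import time

import pandas as pd

# code 1: memory_map left at its default
def get_pid():
    info = pd.read_csv('s3://vision.algo.data/cyz/tmp/test/11.csv', header=None, names=['pid'])
    print(f'pid num : {len(info)}')

# code 2: memory_map=False passed explicitly (use instead of code 1, not together with it)
def get_pid():
    info = pd.read_csv('s3://vision.algo.data/cyz/tmp/test/11.csv', header=None, names=['pid'], memory_map=False)
    print(f'pid num : {len(info)}')

# Re-read the file every 4 seconds
while True:
    get_pid()
    time.sleep(4)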
Problem description
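When I use code 1 and the S3 file changes, info does not change: every read returns the old contents. Code 2 does pick up the change.
The only difference between the two snippets is the memory_map parameter. The default value of memory_map is already False, so I am confused about why passing it explicitly changes the behavior.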
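To check whether the bytes returned on each poll really are stale, independent of CSV parsing, something like the sketch below could help. It is not part of the original report: `snapshot`, `watch`, and the `read_bytes` callable are hypothetical names introduced here, and the row count only approximates what `len(info)` would give by counting non-empty lines.

```python
import hashlib
import time
from typing import Callable, List, Tuple


def snapshot(read_bytes: Callable[[], bytes]) -> Tuple[str, int]:
    """Return (sha256 hex digest, number of non-empty lines) for one read."""
    data = read_bytes()
    digest = hashlib.sha256(data).hexdigest()
    rows = sum(1 for line in data.splitlines() if line.strip())
    return digest, rows


def watch(read_bytes: Callable[[], bytes], polls: int, interval: float = 4.0) -> List[bool]:
    """Read `polls` times; for each read after the first, record whether the bytes changed."""
    changed: List[bool] = []
    previous = None
    for i in range(polls):
        digest, rows = snapshot(read_bytes)
        if previous is not None:
            changed.append(digest != previous)
        print(f'pid num : {rows}  sha256 : {digest[:12]}')
        previous = digest
        if i < polls - 1:
            time.sleep(interval)  # no sleep after the final read
    return changed
```

Here `read_bytes` would be any zero-argument function that fetches the object's raw bytes, for example one that opens the S3 path and returns its contents; that wiring is an assumption and is left out. If the digest stays the same after the file has been overwritten, the stale data is coming from the layer that fetches the file rather than from the CSV parser.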
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.5.final.0
python-bits : 64
OS : Linux
OS-release : 4.4.0-1098-aws
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 0.25.3
numpy : 1.17.4
pytz : 2019.3
dateutil : 2.8.0
pip : 19.3.1
setuptools : 42.0.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.10.3
IPython : 7.10.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : 0.4.0
scipy : 1.3.3
sqlalchemy : 1.3.11
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None