Description
Pandas version checks
-
I have checked that this issue has not already been reported.
-
I have confirmed this bug exists on the latest version of pandas.
-
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd # version 1.5.2
import io
data = """a;b;c\n0000.7995;16.000;0\n3.03.001.00514;0;4.000\n4923.600.041;23.000;131"""
df1 = pd.read_csv(io.StringIO(data), sep=';', dtype={'a': str}, thousands='.', engine='c')
df2 = pd.read_csv(io.StringIO(data), sep=';', dtype={'a': str}, thousands='.', engine='python')
Issue Description
Dots are stripped from strings that consist of numbers and dots, when engine='python' ('c' works fine), even when dtype is set explicitly.
The unexpected behaviour is experienced when processing a csv file that has strings that solely consist of numbers and single dots spread throughout the string the read_csv parameters are set: engine='python' and thousands='.'
The issue was initially filed on stackoverflow.
Expected Behavior
Dots are not stripped from the columns if str type is set up.
Installed Versions
INSTALLED VERSIONS
commit : 8dab54d
python : 3.9.0.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : AMD64 Family 25 Model 80 Stepping 0, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : None
LOCALE : Russian_Russia.1252
pandas : 1.5.2
numpy : 1.22.1
pytz : 2022.1
dateutil : 2.8.2
setuptools : 65.5.0
pip : 22.3.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : None
pymysql : None
psycopg2 : 2.9.3
jinja2 : 3.1.0
IPython : 7.29.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli : 1.0.9
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.4.3
numba : None
numexpr : None
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : 8.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.7.3
snappy : None
sqlalchemy : 1.4.44
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : 2022.4