Pandas.read_html missing converted data #15366

Closed
@nooperpudd

Description

pandas version:
'0.19.2'

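import pandas
import requests
from bs4 import BeautifulSoup

# Wrapped in a function (name chosen here) so that the return statement is valid.
def fetch_board_data():
    url = "http://www.hkex.com.hk/chi/market/sec_tradinfo/stockcode/eisdeqty_c.htm"
    response = requests.get(url)
    if response.status_code == 200:
        soup = BeautifulSoup(response.content, "lxml")
        table = soup.find("table", class_="table_grey_border")
        # read_html returns a list of DataFrames; header=0 takes the column names from the first row
        board_data = pandas.read_html(table.prettify(), header=0, flavor="bs4")
        return board_data[0]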

Problem description

       股份代號             股份名稱   買賣單位   附註 Unnamed: 4 Unnamed: 5 Unnamed: 6
0         1               長和    500    #          H          O          F
1         2             中電控股    500    #          H          O          F
2         3           香港中華煤氣   1000    #          H          O          F
3         4            九龍倉集團   1000    #          H          O          F
4         5             匯豐控股    400    #          H          O          F
5         6             電能實業    500    #          H          O          F
6         7             凱富能源   2000    #        NaN        NaN        NaN

The 股份代號 (stock code) column should keep its leading zeros: 1 should be 00001, 2 should be 00002.
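A minimal sketch of why the zeros disappear, using only the standard library: once a cell's text is parsed as an integer, "00001" and "1" are the same value. The sample strings are illustrative stand-ins for the first rows of the table, and the five-digit width is an assumption taken from the 1 -> 00001 example.

```python
# Illustrative cell text for the stock-code column (stand-ins, not scraped data).
raw_codes = ["00001", "00002", "00007"]

# Integer parsing discards the leading zeros.
parsed = [int(text) for text in raw_codes]

# Zero-padding back to five digits (assumed width) restores the original text.
restored = [str(number).zfill(5) for number in parsed]

print(parsed)
print(restored)
```

Padding after the fact only works if every code has the same fixed width; keeping the column as text while parsing avoids that assumption.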

datatype:
股份代號 int64
股份名稱 object
買賣單位 int64
附註 object
Unnamed: 4 object
Unnamed: 5 object
Unnamed: 6 object
dtype: object
<class 'pandas.core.frame.DataFrame'>

Why are the leading zeros missing from this column?

The dtype of 股份代號 should be object (string) rather than int64, so that the zeros are preserved.
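One possible way to keep the column as text is the `converters` argument of `pandas.read_html`, which applies a function to each raw cell of a named column. The sketch below uses a small inline table with hypothetical English column names (`code`, `lot`) in place of the real page, and assumes an HTML parser such as lxml is installed.

```python
from io import StringIO

import pandas as pd

# Tiny stand-in for the real page; the column names are hypothetical.
html = """
<table>
  <tr><th>code</th><th>lot</th></tr>
  <tr><td>00001</td><td>500</td></tr>
  <tr><td>00002</td><td>500</td></tr>
</table>
"""

# converters maps a column label to a function applied to each raw cell,
# so str keeps the text exactly as it appears in the HTML.
tables = pd.read_html(StringIO(html), header=0, converters={"code": str})
codes = tables[0]["code"].tolist()
print(codes)
```

For the page in this issue the key would be the actual header text, e.g. `converters={"股份代號": str}`, passed alongside the existing `header=0` and `flavor="bs4"` arguments.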

    Labels

    IO HTML (read_html, to_html, Styler.apply, Styler.applymap), Usage Question