Description
When reading Australian electricity market (NEM) data files, read_csv reads past the nrows limit for certain nrows values and consequently fails.
These market data files are four CSV tables combined into a single file, so the file has multiple header rows and a variable field count across rows.
The first set of data spans rows 1-1442.
The intent was to extract that first set with nrows=1442.
Testing several arbitrary CSV files from this data source shows well-formed CSV: 120 fields in each of rows 1 to 1442 (with a 10-field row at row 0).
# count the fields in each of the first 1442 lines of the open csvFile
lines = [len(line.strip().split(',')) for i, line in enumerate(csvFile) if i < 1442]
s = pd.Series(lines)
print(s.value_counts())
which returns:
120 1441
10 1
dtype: int64
Other Python examples that read the market data with the csv module work fine.
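As a minimal illustration (using a small synthetic stand-in for the real NEM file, since only the layout matters here), the csv module handles the variable field counts without complaint:

```python
import csv
import io

# Synthetic stand-in for the combined NEM file: a short row at the top
# followed by wider, uniform data rows (the real file has a 10-field
# row 0 and 120-field data rows). Column names are made up.
sample = io.StringIO(
    "C,HEADER,1\n"
    "I,PRICE,A,B,SETTLEMENTDATE,RRP\n"
    "D,PRICE,1,2,2014-06-29 00:30:00,35.5\n"
    "D,PRICE,3,4,2014-06-29 01:00:00,36.0\n"
)

# csv.reader simply yields each row with however many fields it has
field_counts = [len(row) for row in csv.reader(sample)]
print(field_counts)  # [3, 6, 6, 6]
```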
In the reproducible example below, the code works for nrows <= 823 but fails for any larger value.
Testing on other files suggests the 824 limit varies from file to file; sometimes it is a few rows more, sometimes a few rows less.
import requests, io, zipfile
import pandas as pd

url = 'http://www.nemweb.com.au/Reports/CURRENT/Public_Prices/PUBLIC_PRICES_201406290000_20140630040528.zip'

# get the zip archive
request = requests.get(url)

# make the archive available as a byte stream
zipdata = io.BytesIO(request.content)
thezipfile = zipfile.ZipFile(zipdata, mode='r')

# there is only one csv file per archive - read it into a pandas DataFrame
fname = thezipfile.namelist()[0]

# works for nrows <= 823
with thezipfile.open(fname) as csvFile:
    df1 = pd.read_csv(csvFile, header=1, index_col=4, parse_dates=True, nrows=823)
    print(df1.head())

# reopen the file so the second read starts from the top;
# fails for nrows > 823
with thezipfile.open(fname) as csvFile:
    df1 = pd.read_csv(csvFile, header=1, index_col=4, parse_dates=True, nrows=824)
    print(df1.head())
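A possible workaround, sketched on synthetic data shaped like the real file (the column names and line count are assumptions, not the real layout), is to slice off the wanted block of lines with itertools.islice before handing it to pandas, so read_csv never sees the later embedded tables:

```python
import io
import itertools
import pandas as pd

# Synthetic stand-in shaped like the combined NEM file: a short top row,
# a header row, two data rows, then the start of the next embedded table.
raw = io.StringIO(
    "C,HEADER,1\n"
    "I,PRICE,A,B,SETTLEMENTDATE,RRP\n"
    "D,PRICE,1,2,2014-06-29 00:30:00,35.5\n"
    "D,PRICE,3,4,2014-06-29 01:00:00,36.0\n"
    "I,OTHER,X,Y\n"
)

# take only the lines belonging to the first table (here 4), then parse
first_block = "".join(itertools.islice(raw, 4))
df = pd.read_csv(io.StringIO(first_block), header=1, index_col=4,
                 parse_dates=True)
print(df.shape)  # (2, 5)
```

Because pandas only ever sees a uniform fragment, no nrows limit is needed at all; the same slicing would apply to the real file with the block boundary at line 1442.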