
read_excel optimize nrows #32727

@Zoynels

Description


Code Sample

pd.read_excel(fname, nrows=10)

Problem description

Pandas has an option (`nrows`) to read only the first several rows of an Excel file.
But currently it always reads all rows and only afterwards cuts off the unwanted part.
For example, if a file has 100 columns and 50k rows but a test only needs the first 10 rows,
pandas will still read all 50k rows into a list, which wastes memory and takes far too long.
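The gap can be illustrated without pandas or Excel at all. Below is a minimal sketch (the `fake_sheet_rows` generator is made up for illustration, standing in for a worksheet's lazy row iterator) contrasting "read everything, then cut" with an iterator that is stopped after `nrows` rows via `itertools.islice`:

```python
# "Read everything, then cut" vs. stopping the row iterator early.
# fake_sheet_rows is a stand-in for a lazy worksheet row iterator.
from itertools import islice

def fake_sheet_rows(n_rows):
    """Yield rows lazily, like an Excel reader's row iterator."""
    for i in range(n_rows):
        yield [i, f"value_{i}"]

# Current behaviour (conceptually): materialise all 50k rows, then slice.
head_slow = list(fake_sheet_rows(50_000))[:10]

# Requested behaviour: islice stops pulling rows after 10,
# so the remaining ~49,990 rows are never generated at all.
head_fast = list(islice(fake_sheet_rows(50_000), 10))

assert head_fast == head_slow
```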


Expected Output

A better solution would be to read only the rows that are actually needed for the operation.

As I understand it, changes would be needed in

pandas/io/excel/_base.py

    @abc.abstractmethod
    def get_sheet_data(self, sheet, convert_float, header, skiprows, nrows):
        pass

pandas/io/excel/_base.py

    data = self.get_sheet_data(sheet, convert_float, header, skiprows, nrows)

and in files _openpyxl.py, _odfreader.py, _xlrd.py
there should be something like

    def get_sheet_data(
        self,
        sheet,
        convert_float: bool,
        header: Optional[int],
        skiprows: Optional[int],
        nrows: Optional[int],
    ) -> List[List[Scalar]]:
        data: List[List[Scalar]] = []
        skiprows = 0 if skiprows is None else skiprows
        header = 0 if header is None else header

        for row in sheet.rows:
            if nrows is not None:
                if header > 0:
                    # skip rows before the header; the placeholders keep
                    # the row numbering intact for the later parsing step
                    header -= 1
                    data.append(["", ""])
                    continue

                if skiprows > 0:
                    # skip `skiprows` rows after the header
                    skiprows -= 1
                    data.append(["", ""])
                    continue

                if nrows >= 0:
                    # the `>=` reads one extra physical row,
                    # which becomes the header row itself
                    nrows -= 1
                else:
                    break
            data.append([self._convert_cell(cell, convert_float) for cell in row])

        return data
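For reference, here is the same loop as a standalone function that can be exercised without an Excel file. A plain list of rows stands in for `sheet.rows`, and the `_convert_cell` step is dropped; both are stand-ins for illustration, not the real pandas API:

```python
# Standalone version of the proposed loop, to show what the
# header/skiprows placeholders and the nrows cut-off actually return.
from typing import List, Optional

def get_sheet_data(rows, header: Optional[int], skiprows: Optional[int],
                   nrows: Optional[int]) -> List[list]:
    data: List[list] = []
    skiprows = 0 if skiprows is None else skiprows
    header = 0 if header is None else header
    for row in rows:
        if nrows is not None:
            if header > 0:          # placeholder keeps row numbering intact
                header -= 1
                data.append(["", ""])
                continue
            if skiprows > 0:        # same for skiprows
                skiprows -= 1
                data.append(["", ""])
                continue
            if nrows >= 0:          # reads nrows rows plus the header row
                nrows -= 1
            else:
                break
        data.append(list(row))
    return data

sheet = [[i, f"v{i}"] for i in range(100)]
# header on physical row 1: one placeholder, then header + 5 data rows
out = get_sheet_data(sheet, header=1, skiprows=0, nrows=5)
```

With `header=1` and `nrows=5`, the function returns 7 rows: one `["", ""]` placeholder, the header row, and 5 data rows; the remaining 93 rows are never touched.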

With this change, read_excel with engine='openpyxl' takes only 5 seconds instead of the current 50 seconds. And if a file contains 1 million rows, it will still take around 5 seconds, whereas the current version would take tens of minutes.
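The shape of the speed-up can be sanity-checked in pure Python with a toy benchmark. No Excel is involved; the row generator and the 500k-row size are illustrative only, and absolute timings depend on the machine:

```python
# Toy benchmark: "materialise all rows, then slice" vs. "stop after
# nrows rows". Only the ratio between the two timings matters.
import time
from itertools import islice

def rows(n):
    # lazily yield n rows of 10 cells each
    for i in range(n):
        yield [i] * 10

def read_all_then_cut(n_total, nrows):
    return list(rows(n_total))[:nrows]         # models current behaviour

def read_nrows(n_total, nrows):
    return list(islice(rows(n_total), nrows))  # models proposed behaviour

t0 = time.perf_counter()
full = read_all_then_cut(500_000, 10)
t_all = time.perf_counter() - t0

t0 = time.perf_counter()
head = read_nrows(500_000, 10)
t_few = time.perf_counter() - t0

print(f"read all then cut: {t_all:.4f}s, early exit: {t_few:.4f}s")
```

The early-exit version does constant work regardless of the sheet size, which is why the reported time stays flat as the file grows.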

Labels: IO Excel (read_excel, to_excel), Performance (memory or execution speed)
