BUG: read_excel(engine='openpyxl') with "unsized" XLSX issue #34821

- [x] I have checked that this issue has not already been reported.

- [x] I have confirmed this bug exists on the latest version of pandas.

- [ ] (optional) I have confirmed this bug exists on the master branch of pandas.

---

#### Problem description

Consider the following code:
```python
import pandas as pd
pd.read_excel('test.xlsx', index_col=[0, 1], engine='openpyxl')
```
with a particular `test.xlsx` (see below).

According to [openpyxl issue #1483](https://bitbucket.org/openpyxl/openpyxl/issues/1483) and the [openpyxl documentation](https://openpyxl.readthedocs.io/en/stable/optimized.html#worksheet-dimensions), **the caller (i.e. pandas) [has to](https://bitbucket.org/openpyxl/openpyxl/issues/1483#comment-57780692) call [`sheet.calculate_dimension(force=True)`](https://bitbucket.org/openpyxl/openpyxl/src/3.0.3/openpyxl/worksheet/_read_only.py#lines-133) if any worksheet in the workbook is "unsized"**.

(What does it mean for a worksheet to be "unsized"? It means the worksheet has no `DIMENSION_TAG`; see [WorkSheetParser.parse_dimensions()](https://bitbucket.org/openpyxl/openpyxl/src/3.0.3/openpyxl/worksheet/_reader.py#lines-158). When is this checked? When the `ReadOnlyWorksheet` object is constructed.)

**But pandas doesn't do this.**

So, if the worksheet is "unsized", reading it with `read_excel(engine='openpyxl')` goes as follows. We try to [get sheet data](https://github.com/pandas-dev/pandas/blob/v1.0.3/pandas/io/excel/_base.py#L443) and [iterate over the rows](https://github.com/pandas-dev/pandas/blob/v1.0.3/pandas/io/excel/_openpyxl.py#L539), descending into openpyxl: [1](https://bitbucket.org/openpyxl/openpyxl/src/3.0.3/openpyxl/worksheet/_read_only.py#lines-29) -> [2](https://bitbucket.org/openpyxl/openpyxl/src/3.0.3/openpyxl/worksheet/worksheet.py#lines-454) -> [3](https://bitbucket.org/openpyxl/openpyxl/src/3.0.3/openpyxl/worksheet/_read_only.py#lines-27) -> [4](https://bitbucket.org/openpyxl/openpyxl/src/3.0.3/openpyxl/worksheet/worksheet.py#lines-401) (with `max_col == None`) -> [5](https://bitbucket.org/openpyxl/openpyxl/src/3.0.3/openpyxl/worksheet/worksheet.py#lines-433) -> [6](https://bitbucket.org/openpyxl/openpyxl/src/3.0.3/openpyxl/worksheet/_read_only.py#lines-185) (it returns `self._max_column == None`) -> [7](https://bitbucket.org/openpyxl/openpyxl/src/3.0.3/openpyxl/worksheet/worksheet.py#lines-436) -> [8](https://bitbucket.org/openpyxl/openpyxl/src/3.0.3/openpyxl/worksheet/_read_only.py#lines-57) (with `max_col == None`) -> [9](https://bitbucket.org/openpyxl/openpyxl/src/3.0.3/openpyxl/worksheet/_read_only.py#lines-89) -> [10](https://bitbucket.org/openpyxl/openpyxl/src/3.0.3/openpyxl/worksheet/_read_only.py#lines-10) (with `max_col == None`). Then, in [these lines](https://bitbucket.org/openpyxl/openpyxl/src/ca7b1baf75f2fc6b270320ea91d82404f5039e1e/openpyxl/worksheet/_read_only.py#lines-107:108):
```python
max_col = max_col or row[-1]['column']
row_width = max_col + 1 - min_col
```
the result is that **the number of cells in a row (`row_width`) is computed separately for each row (`row`)**, so different rows can end up with different lengths.
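
To make the effect concrete, here is a simplified pure-Python imitation of the two quoted lines. It is not openpyxl's actual code: `pad_row`, the `(column, value)` tuples and the `""` placeholder are assumptions made for illustration only.

```python
def pad_row(cells, min_col=1, max_col=None):
    """Build a dense row from sparse (column, value) pairs sorted by column.

    Mirrors the quoted logic: when max_col is None, the width falls back
    to the last populated column of *this* row.
    """
    if not cells:
        return []
    max_col = max_col or cells[-1][0]
    row_width = max_col + 1 - min_col
    values = [""] * row_width  # placeholder for cells missing from the row
    for column, value in cells:
        if min_col <= column <= max_col:
            values[column - min_col] = value
    return values


# Row 1 has only column 1 populated; row 2 has columns 1 and 2.
sparse_rows = [[(1, "A1")], [(1, "B1"), (2, "B2")]]

unsized = [pad_row(row) for row in sparse_rows]            # max_col unknown
sized = [pad_row(row, max_col=2) for row in sparse_rows]   # max_col known

print(unsized)  # ragged: [['A1'], ['B1', 'B2']]
print(sized)    # rectangular: [['A1', ''], ['B1', 'B2']]
```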

Now consider an _unsized_ Excel table (I have a real file, but it's private; after saving it in LibreOffice it becomes sized):
![image](https://user-images.githubusercontent.com/1063219/84760507-ce742d00-afd0-11ea-8caf-45031ba57a3b.png)
We get [sheet data](https://github.com/pandas-dev/pandas/blob/v1.0.3/pandas/io/excel/_base.py#L443):
```python
[ [ 'A1' ],
  [ 'B1', 'B2' ] ]
```

But `index_col=[0, 1]` **requires at least 2 columns in each row**:
```
Traceback (most recent call last):
  File "test.py", line 6, in <module>
    pd.read_excel('test.xlsx', index_col=[0, 1], engine='openpyxl')
  File "/home/sasha/miniconda3/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 334, in read_excel
    **kwds,
  File "/home/sasha/miniconda3/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 888, in parse
    **kwds,
  File "/home/sasha/miniconda3/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 480, in parse
    last = data[offset][col]
IndexError: list index out of range
```
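
The failure can be reproduced without pandas or an Excel file. The sketch below is loosely modeled on the `data[offset][col]` access from the traceback; `forward_fill_index`, the sample rows and `offset=1` are assumptions for illustration, not the actual pandas implementation.

```python
def forward_fill_index(data, index_col, offset):
    """Forward-fill the index columns in place, indexing as data[row][col]."""
    for col in index_col:
        last = data[offset][col]  # the expression from the traceback
        for row in range(offset + 1, len(data)):
            if data[row][col] in ("", None):
                data[row][col] = last
            else:
                last = data[row][col]
    return data


# Ragged rows, as produced for an unsized sheet: the second row is too short.
ragged = [["h1", "h2"], ["A1"], ["B1", "B2"]]
try:
    forward_fill_index(ragged, index_col=[0, 1], offset=1)
    failed = False
except IndexError:
    failed = True

# The same rows padded to a common width do not fail.
padded = [["h1", "h2"], ["A1", ""], ["B1", "B2"]]
filled = forward_fill_index(padded, index_col=[0, 1], offset=1)

print(failed)
print(filled)
```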

**IMHO, a patch could be:**
```diff
diff --git a/pandas/io/excel/_openpyxl.py b/pandas/io/excel/_openpyxl.py
index c4327316d..3efbf4abc 100644
--- a/pandas/io/excel/_openpyxl.py
+++ b/pandas/io/excel/_openpyxl.py
@@ -536,6 +536,9 @@ class _OpenpyxlReader(_BaseExcelReader):
 
     def get_sheet_data(self, sheet, convert_float: bool) -> List[List[Scalar]]:
         data: List[List[Scalar]] = []
+
+        sheet.calculate_dimension(force=True)
+
         for row in sheet.rows:
             data.append([self._convert_cell(cell, convert_float) for cell in row])
```
(But it may have [some drawbacks](https://bitbucket.org/openpyxl/openpyxl/issues/1483#comment-57780692).)
With this patch, the sheet data would be:
```python
[ [ 'A1', '' ],
  [ 'B1', 'B2' ] ]
```

#### Output of ``pd.show_versions()``

<details>

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.7.7.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.4.42-calculate
machine          : x86_64
processor        : Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz
byteorder        : little
LC_ALL           : None
LANG             : ru_RU.utf8
LOCALE           : ru_RU.UTF-8

pandas           : 1.0.3
numpy            : 1.18.1
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 20.0.2
setuptools       : 46.4.0.post20200518
Cython           : 0.29.17
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : 1.2.8
lxml.etree       : 4.5.0
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.11.2
IPython          : 7.13.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : 4.5.0
matplotlib       : 3.1.3
numexpr          : None
odfpy            : None
openpyxl         : 3.0.3
pandas_gbq       : None
pyarrow          : None
pytables         : None
pytest           : None
pyxlsb           : None
s3fs             : None
scipy            : 1.4.1
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : 1.2.0
xlwt             : None
xlsxwriter       : 1.2.8
numba            : None
</details>

Thanks!
