DOC: HDF5 External Compatibility section examples don't work

#### Location of the documentation

https://pandas.pydata.org/docs/user_guide/io.html#external-compatibility.

#### Documentation problem

This section has at least one apparent typo and, more generally, does not seem to document current behaviour.

I'll quickly run through the example here, lightly cleaned up so that it runs without the rest of the page.

```python
import pandas as pd
import numpy as np

df_for_r = pd.DataFrame({"first": np.random.rand(100),
                         "second": np.random.rand(100),
                         "class": np.random.randint(0, 2, (100, ))},
                        index=range(100))


store_export = pd.HDFStore('export.h5')

# In the documentation, this is written with 'data_columns=df_dc.columns', which I'm assuming is a mistake
store_export.append('df_for_r', df_for_r, data_columns=df_for_r.columns)

store_export
```

We can take a look at what's in this file:

```python
store_export.close()
!h5ls -r export.h5
```

<details>
<summary> Output </summary>

```
/                        Group
/df_for_r                Group
/df_for_r/_i_table       Group
/df_for_r/_i_table/class Group
/df_for_r/_i_table/class/abounds Dataset {0/Inf}
/df_for_r/_i_table/class/bounds Dataset {0/Inf, 127}
/df_for_r/_i_table/class/indices Dataset {0/Inf, 131072}
/df_for_r/_i_table/class/indicesLR Dataset {131072}
/df_for_r/_i_table/class/mbounds Dataset {0/Inf}
/df_for_r/_i_table/class/mranges Dataset {0/Inf}
/df_for_r/_i_table/class/ranges Dataset {0/Inf, 2}
/df_for_r/_i_table/class/sorted Dataset {0/Inf, 131072}
/df_for_r/_i_table/class/sortedLR Dataset {131201}
/df_for_r/_i_table/class/zbounds Dataset {0/Inf}
/df_for_r/_i_table/first Group
/df_for_r/_i_table/first/abounds Dataset {0/Inf}
/df_for_r/_i_table/first/bounds Dataset {0/Inf, 127}
/df_for_r/_i_table/first/indices Dataset {0/Inf, 131072}
/df_for_r/_i_table/first/indicesLR Dataset {131072}
/df_for_r/_i_table/first/mbounds Dataset {0/Inf}
/df_for_r/_i_table/first/mranges Dataset {0/Inf}
/df_for_r/_i_table/first/ranges Dataset {0/Inf, 2}
/df_for_r/_i_table/first/sorted Dataset {0/Inf, 131072}
/df_for_r/_i_table/first/sortedLR Dataset {131201}
/df_for_r/_i_table/first/zbounds Dataset {0/Inf}
/df_for_r/_i_table/index Group
/df_for_r/_i_table/index/abounds Dataset {0/Inf}
/df_for_r/_i_table/index/bounds Dataset {0/Inf, 127}
/df_for_r/_i_table/index/indices Dataset {0/Inf, 131072}
/df_for_r/_i_table/index/indicesLR Dataset {131072}
/df_for_r/_i_table/index/mbounds Dataset {0/Inf}
/df_for_r/_i_table/index/mranges Dataset {0/Inf}
/df_for_r/_i_table/index/ranges Dataset {0/Inf, 2}
/df_for_r/_i_table/index/sorted Dataset {0/Inf, 131072}
/df_for_r/_i_table/index/sortedLR Dataset {131201}
/df_for_r/_i_table/index/zbounds Dataset {0/Inf}
/df_for_r/_i_table/second Group
/df_for_r/_i_table/second/abounds Dataset {0/Inf}
/df_for_r/_i_table/second/bounds Dataset {0/Inf, 127}
/df_for_r/_i_table/second/indices Dataset {0/Inf, 131072}
/df_for_r/_i_table/second/indicesLR Dataset {131072}
/df_for_r/_i_table/second/mbounds Dataset {0/Inf}
/df_for_r/_i_table/second/mranges Dataset {0/Inf}
/df_for_r/_i_table/second/ranges Dataset {0/Inf, 2}
/df_for_r/_i_table/second/sorted Dataset {0/Inf, 131072}
/df_for_r/_i_table/second/sortedLR Dataset {131201}
/df_for_r/_i_table/second/zbounds Dataset {0/Inf}
/df_for_r/table          Dataset {200/Inf}
```

</details>

Next, the documentation gives an R function for reading this data. Just comparing that function with the file written above, I think we can already see a mismatch:

```R
library(rhdf5)

loadhdf5data <- function(h5File) {

  listing <- h5ls(h5File)
  # Find all data nodes, values are stored in *_values and corresponding column
  # titles in *_items
  data_nodes <- grep("_values", listing$name)
  name_nodes <- grep("_items", listing$name)
  data_paths = paste(listing$group[data_nodes], listing$name[data_nodes], sep = "/")
  name_paths = paste(listing$group[name_nodes], listing$name[name_nodes], sep = "/")
  columns = list()
  for (idx in seq(data_paths)) {
    # NOTE: matrices returned by h5read have to be transposed to obtain
    # required Fortran order!
    data <- data.frame(t(h5read(h5File, data_paths[idx])))
    names <- t(h5read(h5File, name_paths[idx]))
    entry <- data.frame(data)
    colnames(entry) <- names
    columns <- append(columns, entry)
  }

  data <- data.frame(columns)

  return(data)
}
```

For example, no node in `export.h5` has `_values` or `_items` in its name.
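To make the mismatch concrete, here is a minimal Python sketch that applies the same name matching as the R function's `grep` calls to the node paths from the `h5ls -r export.h5` listing above. The helper `matching_nodes` is a hypothetical name introduced for illustration. The `fixed_format_paths` list is an assumption about what a "fixed" format file for this frame typically contains (one `blockN_items`/`blockN_values` pair per dtype block); it was not taken from an actual listing in this issue.

```python
import re

# Rebuild the node paths shown in the `h5ls -r export.h5` listing above.
index_datasets = [
    "abounds", "bounds", "indices", "indicesLR", "mbounds",
    "mranges", "ranges", "sorted", "sortedLR", "zbounds",
]
indexed_columns = ["class", "first", "index", "second"]

table_format_paths = ["/", "/df_for_r", "/df_for_r/_i_table"]
for col in indexed_columns:
    table_format_paths.append(f"/df_for_r/_i_table/{col}")
    table_format_paths.extend(
        f"/df_for_r/_i_table/{col}/{name}" for name in index_datasets
    )
table_format_paths.append("/df_for_r/table")

# ASSUMPTION: illustrative node names for a "fixed" format file, not copied
# from a real listing in this issue.
fixed_format_paths = [
    "/df_for_r",
    "/df_for_r/axis0",
    "/df_for_r/axis1",
    "/df_for_r/block0_items",
    "/df_for_r/block0_values",
    "/df_for_r/block1_items",
    "/df_for_r/block1_values",
]


def matching_nodes(paths, pattern):
    """Return the paths whose leaf name matches `pattern`.

    Mirrors `grep(pattern, listing$name)` in the R function, which only
    looks at the node name and not at the enclosing group.
    """
    return [p for p in paths if re.search(pattern, p.rsplit("/", 1)[-1])]


for label, paths in [("table", table_format_paths), ("fixed", fixed_format_paths)]:
    values = matching_nodes(paths, "_values")
    items = matching_nodes(paths, "_items")
    print(f"{label}: {len(values)} *_values nodes, {len(items)} *_items nodes")
```

With no `*_values` or `*_items` nodes to iterate over, the loop in `loadhdf5data` never runs for the table-format file, so `columns` stays an empty list.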

If we actually call this function, we get an empty dataframe back:

```R
> loadhdf5data("export.h5")
data frame with 0 columns and 0 rows
```

This function does seem to work if the file is written using the "fixed" format:

```python
df_for_r.to_hdf("export2.h5", key="df_for_r", format="fixed")
```

```R
> loadhdf5data("export2.h5")
          first      second class
1   0.675013759 0.787289926     0
2   0.936797348 0.349671699     1
3   0.951930811 0.275965069     0
4   0.203085530 0.380154180     0
5   0.627195223 0.462702969     1
6   0.129148756 0.385663581     1
...
```

This is somewhat at odds with the prose for this section, which reads:

> HDFStore writes table format objects in specific formats suitable for producing loss-less round trips to pandas objects. For external compatibility, HDFStore can read native PyTables format tables.
>
> It is possible to write an HDFStore object that can easily be imported into R using the rhdf5 library (Package website). Create a table format store like this:


#### Suggested fix for documentation

The section should probably state that the "table" format doesn't work with the R function given here. In addition, since external compatibility relies on users writing their own code to read the file, maybe a specification of the on-disk format should be documented here?

DOC: HDF5 External Compatibility section examples don't work #35419