Conversion of DataFrame to R's data.frame #350

Closed
@lbeltrame

Although I have already written code for this (see below), I'm posting it as an issue rather than a pull request so that the design can be discussed first, because several points in my code are still open:

  • Series of dtype object need an explicit cast, or rpy2's numpy conversion will handle them incorrectly
  • The performance has not been profiled
  • There is probably room for optimization
  • The function still needs a proper name
  • Generating an intermediate OrdDict object may cause problems with very large datasets
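To make the first point concrete, here is a minimal sketch of the explicit cast on its own, without rpy2. It assumes the columns are available as NumPy arrays in a plain dict; `cast_object_columns` is a hypothetical helper name and is not part of pandas or rpy2.

```python
import numpy as np


def cast_object_columns(columns):
    """Return a new dict in which object-dtype arrays are cast to str arrays."""
    converted = {}
    for name, values in columns.items():
        values = np.asarray(values)
        if values.dtype == np.object_:
            # Explicit cast: every element becomes its str() representation.
            values = values.astype(str)
        converted[name] = values
    return converted


# Hypothetical example data: one object column, one float column.
columns = {
    "gene": np.array(["BRCA1", "TP53"], dtype=object),
    "score": np.array([0.5, 1.25]),
}
result = cast_object_columns(columns)
```

One caveat with this approach: missing values stored as None become the literal string "None", so a real implementation would probably need to map them to R's NA separately.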

The code in its current form is posted below. If there is interest, I will work towards integrating it into pandas.rpy.common and adding unit tests.

import numpy as np
import rpy2.robjects as robjects
import rpy2.robjects.numpy2ri as numpy2ri
from rpy2.robjects.packages import importr
import rpy2.rlike.container as rlc

def dataset_to_data_frame(dataset, strings_as_factors=True):

    # Activate conversion for numpy objects
    robjects.conversion.py2ri = numpy2ri.numpy2ri
    robjects.numpy2ri.activate()

    base = importr("base")
    columns = rlc.OrdDict()

    for column in dataset:
        value = dataset[column]

        # object type requires explicit cast
        if value.dtype == np.object_:
            value = robjects.StrVector(value)
            # FIXME: how to generalize it?
            if not strings_as_factors:
                value = base.I(value)

        columns[column] = value

    dataframe = robjects.DataFrame(columns)
    dataframe.rownames = robjects.StrVector(dataset.index)

    # To prevent side-effects in other code
    robjects.conversion.py2ri = robjects.default_py2ri

    return dataframe
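The function above restores the conversion hook only when it runs to completion; if the conversion raises, the numpy hook stays active. One possible way to guard against that is a small context manager, sketched here with a stand-in object rather than rpy2 itself. `temporarily_set` and the `conversion` namespace are hypothetical names used only for illustration.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporarily_set(obj, attr, value):
    """Set obj.attr to value inside the with-block, then restore the old value."""
    previous = getattr(obj, attr)
    setattr(obj, attr, value)
    try:
        yield
    finally:
        # Runs on normal exit and also when the block raises.
        setattr(obj, attr, previous)


# Stand-in for a module-level conversion hook (hypothetical, not rpy2).
conversion = SimpleNamespace(py2ri="default")

with temporarily_set(conversion, "py2ri", "numpy"):
    inside = conversion.py2ri

try:
    with temporarily_set(conversion, "py2ri", "numpy"):
        raise ValueError("conversion failed")
except ValueError:
    pass

after = conversion.py2ri
```

In the function above, the body between activating the numpy conversion and restoring the default could be wrapped in such a block, so that other code never sees the modified hook after a failure.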
