Skip to content

PERF: to_json very slow with lines=True #14408

Closed
@joshowen

Description

@joshowen

A small, complete example of the issue

N = 100000
C = 5

In [6]: df = DataFrame(dict([('float{0}'.format(i), np.random.randn(N)) for i in range(C)]))

In [7]: df.to_json('foo.json',orient='records',lines=True)

In [8]: %timeit df.to_json('foo.json',orient='records',lines=True)
1 loop, best of 3: 3.66 s per loop

In [9]: %timeit df.to_json('foo.json',orient='records')
10 loops, best of 3: 98.8 ms per loop

As discussed in #14391

Metadata

Metadata

Assignees

No one assigned

    Labels

    IO JSONread_json, to_json, json_normalizePerformanceMemory or execution speed performance

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions