Skip to content

Jsonlines append mode #35832

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from
Closed

Conversation

nullhack
Copy link

@nullhack nullhack commented Aug 20, 2020

Description: Adding support to append mode for to_json when 'orient' is records and 'lines' is True

Motivation: jsonlines is a format that is gaining some space (e.g. Cloud Storage). In pandas It's achieved using to_json with orient='records' and lines=True. But to_json doesn't have the option to use mode='a' to append to a file, this PR aims to provide a simple append mode for jsonlines using pandas.


Copy link
Member

@arw2019 arw2019 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like an API change. It might be a good idea to open an issue for this and have the discussion there about desirability of the change you're proposing

if people do decide that it should go in we'll need tests added here for the new behaviour

@gfyoung gfyoung added Enhancement IO JSON read_json, to_json, json_normalize labels Aug 23, 2020
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add tests for this?

@@ -2063,6 +2063,7 @@ def to_json(
index: bool_t = True,
indent: Optional[int] = None,
storage_options: StorageOptions = None,
mode: str = "w",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm not sure about mode as a keyword - maybe append_lines is better?

I think mode is just confusing since it only works on a very small subset of user cases

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can make this change. My choice of using mode was to keep same naming as previously established (e.g. CSVFormatter(...,mode=mode...) ).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mode is appropriate and consistent with other IO methods., but possibly restrict to w|a only, would put before storage_options.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I'm just wary of what users expect from this since "mode" is not going to be applicable to the majority of orients. For example, it wouldn't make sense to have mode="a" for the table orient because it would produce an invalid document structure that way

In any case can start with this and add tests for the behavior we want to catch

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll keep mode for now, If in future bigger consensus tell me to change I still can change. From my side, I tend to favor consistency, because of this mode="a" sounds better to me.

@@ -2063,6 +2063,7 @@ def to_json(
index: bool_t = True,
indent: Optional[int] = None,
storage_options: StorageOptions = None,
mode: str = "w",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mode is appropriate and consistent with other IO methods., but possibly restrict to w|a only, would put before storage_options.

@@ -2156,6 +2157,13 @@ def to_json(

.. versionadded:: 1.2.0

mode : str, default 'w'
If 'orient' is 'records' and 'lines' is 'True' enable option to append
mode ('mode' is 'a') to a json file instead of overwriting.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests if anything else?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, will do It

@@ -95,9 +101,19 @@ def to_json(

if lines:
s = convert_to_line_delimits(s)
try:
add_new_line = (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

umm, what is all this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ensure that the file exists and is not empty before adding a new line at the end, so if the file is empty the code skips It. The try part is because errors can raise during the comparison (I copied It from pandas repo Itself, later I try to find the line)

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Oct 28, 2020
@nullhack
Copy link
Author

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

I'm moving to another country and It's been hard to find time to work on the proposed changes but please don't close this PR, I'll update as soon as I find time to work on It again.

@arw2019 arw2019 removed the Stale label Nov 4, 2020
@arw2019
Copy link
Member

arw2019 commented Nov 4, 2020

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

I'm moving to another country and It's been hard to find time to work on the proposed changes but please don't close this PR, I'll update as soon as I find time to work on It again.

No problem! We'll keep this open

@nullhack
Copy link
Author

@arw2019 is there a reference on proper way of adding tests on pandas?

@arw2019
Copy link
Member

arw2019 commented Nov 30, 2020

@arw2019 is there a reference on proper way of adding tests on pandas?

Not really. Take a look at similar tests in the pandas/tests/io/json for a general idea of the format. Typically the pattern we use is:

  • construct result
  • construct expected result
  • assert the two are equal

If you push an attempt we'll give you detailed pointers

@github-actions
Copy link
Contributor

github-actions bot commented Jan 1, 2021

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Jan 1, 2021
@nullhack
Copy link
Author

nullhack commented Jan 1, 2021

Sorry, I'm not having time to work on this and probably will not be able to do for a while. I'll close the PR to give the chance of other people to work on this. Anyone feel free to fork my repo and continue

@nullhack nullhack closed this Jan 1, 2021
@arw2019 arw2019 added Mothballed Temporarily-closed PR the author plans to return to and removed Stale labels Jan 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement IO JSON read_json, to_json, json_normalize Mothballed Temporarily-closed PR the author plans to return to
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: Enable append mode to JSON lines
5 participants