-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
Jsonlines append mode #35832
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Jsonlines append mode #35832
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks like an API change. It might be a good idea to open an issue for this and have the discussion there about desirability of the change you're proposing
if people do decide that it should go in we'll need tests added here for the new behaviour
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add tests for this?
@@ -2063,6 +2063,7 @@ def to_json( | |||
index: bool_t = True, | |||
indent: Optional[int] = None, | |||
storage_options: StorageOptions = None, | |||
mode: str = "w", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm not sure about mode
as a keyword - maybe append_lines
is better?
I think mode is just confusing since it only works on a very small subset of user cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I can make this change. My choice of using mode was to keep same naming as previously established (e.g. CSVFormatter(...,mode=mode...) ).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mode is appropriate and consistent with other IO methods., but possibly restrict to w|a only, would put before storage_options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I'm just wary of what users expect from this since "mode" is not going to be applicable to the majority of orients. For example, it wouldn't make sense to have mode="a"
for the table orient because it would produce an invalid document structure that way
In any case can start with this and add tests for the behavior we want to catch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'll keep mode for now, If in future bigger consensus tell me to change I still can change. From my side, I tend to favor consistency, because of this mode="a"
sounds better to me.
@@ -2063,6 +2063,7 @@ def to_json( | |||
index: bool_t = True, | |||
indent: Optional[int] = None, | |||
storage_options: StorageOptions = None, | |||
mode: str = "w", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mode is appropriate and consistent with other IO methods., but possibly restrict to w|a only, would put before storage_options.
@@ -2156,6 +2157,13 @@ def to_json( | |||
|
|||
.. versionadded:: 1.2.0 | |||
|
|||
mode : str, default 'w' | |||
If 'orient' is 'records' and 'lines' is 'True' enable option to append | |||
mode ('mode' is 'a') to a json file instead of overwriting. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests if anything else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, will do It
@@ -95,9 +101,19 @@ def to_json( | |||
|
|||
if lines: | |||
s = convert_to_line_delimits(s) | |||
try: | |||
add_new_line = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
umm, what is all this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure that the file exists and is not empty before adding a new line at the end, so if the file is empty the code skips It. The try part is because errors can raise during the comparison (I copied It from pandas repo Itself, later I try to find the line)
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
I'm moving to another country and It's been hard to find time to work on the proposed changes but please don't close this PR, I'll update as soon as I find time to work on It again. |
No problem! We'll keep this open |
@arw2019 is there a reference on proper way of adding tests on pandas? |
Not really. Take a look at similar tests in the
If you push an attempt we'll give you detailed pointers |
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
Sorry, I'm not having time to work on this and probably will not be able to do for a while. I'll close the PR to give the chance of other people to work on this. Anyone feel free to fork my repo and continue |
Description: Adding support to append mode for
to_json
when 'orient' isrecords
and 'lines' isTrue
Motivation: jsonlines is a format that is gaining some space (e.g. Cloud Storage). In pandas It's achieved using
to_json
withorient='records'
andlines=True
. Butto_json
doesn't have the option to usemode='a'
to append to a file, this PR aims to provide a simple append mode for jsonlines using pandas.append mode
to JSON lines #35849black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff