Skip to content

Jsonlines append mode #35832

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions pandas/core/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -2063,6 +2063,7 @@ def to_json(
index: bool_t = True,
indent: Optional[int] = None,
storage_options: StorageOptions = None,
mode: str = "w",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm not sure about mode as a keyword - maybe append_lines is better?

I think mode is just confusing since it only works on a very small subset of user cases

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can make this change. My choice of using mode was to keep same naming as previously established (e.g. CSVFormatter(...,mode=mode...) ).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mode is appropriate and consistent with other IO methods., but possibly restrict to w|a only, would put before storage_options.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I'm just wary of what users expect from this since "mode" is not going to be applicable to the majority of orients. For example, it wouldn't make sense to have mode="a" for the table orient because it would produce an invalid document structure that way

In any case can start with this and add tests for the behavior we want to catch

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll keep mode for now, If in future bigger consensus tell me to change I still can change. From my side, I tend to favor consistency, because of this mode="a" sounds better to me.

) -> Optional[str]:
"""
Convert the object to a JSON string.
Expand Down Expand Up @@ -2156,6 +2157,13 @@ def to_json(

.. versionadded:: 1.2.0

mode : str, default 'w'
If 'orient' is 'records' and 'lines' is 'True' enable option to append
mode ('mode' is 'a') to a json file instead of overwriting.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests if anything else?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, will do It

Will throw ValueError if incorrect 'orient' and 'lines'.

.. versionadded:: 1.2.0

Returns
-------
None or str
Expand Down Expand Up @@ -2335,6 +2343,7 @@ def to_json(
index=index,
indent=indent,
storage_options=storage_options,
mode=mode,
)

def to_hdf(
Expand Down
18 changes: 17 additions & 1 deletion pandas/io/json/_json.py
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,7 @@ def to_json(
index: bool = True,
indent: int = 0,
storage_options: StorageOptions = None,
mode: str = "w",
):

if not index and orient not in ["split", "table"]:
Expand All @@ -68,6 +69,11 @@ def to_json(
if lines and orient != "records":
raise ValueError("'lines' keyword only valid when 'orient' is records")

if mode == "a" and (not lines or orient != "records"):
raise ValueError(
"'append mode' only valid when 'line' is True and 'orient' is records"
)

if orient == "table" and isinstance(obj, Series):
obj = obj.to_frame(name=obj.name or "values")

Expand Down Expand Up @@ -95,9 +101,19 @@ def to_json(

if lines:
s = convert_to_line_delimits(s)
try:
add_new_line = (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

umm, what is all this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ensure that the file exists and is not empty before adding a new line at the end, so if the file is empty the code skips It. The try part is because errors can raise during the comparison (I copied It from pandas repo Itself, later I try to find the line)

mode == "a"
and os.path.exists(path_or_buf)
and os.path.isfile(path_or_buf)
and os.path.getsize(path_or_buf)
)
s = "\n" + s if add_new_line else s
except (TypeError, ValueError):
pass

if isinstance(path_or_buf, str):
fh, handles = get_handle(path_or_buf, "w", compression=compression)
fh, handles = get_handle(path_or_buf, mode, compression=compression)
try:
fh.write(s)
finally:
Expand Down