Skip to content

[git-object] Encode empty tags like git does #604

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 18, 2022

Conversation

pierrechevalier83
Copy link
Contributor

@pierrechevalier83 pierrechevalier83 commented Nov 17, 2022

A tag with an empty commit message is now displayed with an extra newline, consitently with the output of git cat-file.

The test fixture for empty.txt was updated, which means that the round_trip test now expects a correct round trip with the new format.

Note that the changes to tests in immutable reflect the new state of the code:

  • TagRefIter now parses this body with an empty message.
  • Where the length of the data was divided by 2, the point was to create an inconsistent dataset leading to an error. Unfortunately, with the extra character, it lines up so that the message can be parsed, so I had to change the magic number to keep the previous property.

Fixes #603


Ok for Byron review the PR on video?

  • I give my permission to record review and upload on YouTube publicly

If I think the review will be helpful for the community, then I might record and publish a video.

A tag with an empty commit message is now displayed with an extra
newline, consitently with the output of `git cat-file`.

The test fixture for `empty.txt` was updated, which means that the
`round_trip` test now expects a correct round trip with the new format.

Note that the changes to tests in `immutable` reflect the new state of
the code:
* `TagRefIter` now parses this body with an empty message.
* Where the length of the data was divided by 2, the point was to create
  an inconsistent dataset leading to an error. Unfortunately, with the
  extra character, it lines up so that the message can be parsed, so I
  had to change the magic number to keep the previous property.

Fixes GitoxideLabs#603
@Byron Byron merged commit f0dfa4c into GitoxideLabs:main Nov 18, 2022
@Byron
Copy link
Member

Byron commented Nov 18, 2022

Thanks a lot for your contribution, it's much appreciated.

A review-video can be found here: https://youtu.be/86OA4MrYaL4

pierrechevalier83 added a commit to pierrechevalier83/git-object that referenced this pull request Nov 21, 2022
facebook-github-bot pushed a commit to facebook/sapling that referenced this pull request Nov 22, 2022
Summary:
Vendor a patch to `git-object` that fixes encoding of an empty git commit tag.

I [reported](GitoxideLabs/gitoxide#603) and [fixed](GitoxideLabs/gitoxide#604) this issue upstream, but the fix is only in the `main` branch for now and not yet in a release.

Reviewed By: mitrandir77

Differential Revision: D41401957

fbshipit-source-id: 27933dd6fe852f93824c4d3f4ca9c9a96d6d2faa
pierrechevalier83 pushed a commit to pierrechevalier83/gitoxide that referenced this pull request Mar 21, 2025
When representing an annotated tag with an empty commit message, we used
to only support a tag ending with two newlines (one after the tagger
line + one after the empty commit message).

This was due to a misconception that annotated tags ending with a single
NL shouldn't exist in the wild since there isn't an obvious way to
create them from with `git`.
This misconception shows up in the discussion thread for
[issue 603](GitoxideLabs#603)

It turns out that both encodings of empty annotated tags appear in the
wild.

We must be able to parse either and roundtrip for either.

Before [PR 604](GitoxideLabs#604), we
used to special case the empty tag msg case and not add a NL.
To be able to represent `b""`, `b"\n"`, `b"\n\n"`..., we special case
any `message` that is a pure sequence of `b'\n'` and actually parse the
`b'\n'` into the tag's message.

This allows us to calculate the correct size that matches the number of
bytes that git would produce, as well as round-trip to and from the
commit encoding.

The existing tests (in particular `round_trip` for `empty.txt` and
`empty_missing_nl.txt`) convince me that the logic is sound.
Also, the size being `139` bytes for `empty_missing_nl.txt` and `140`
bytes for `empty.txt` matches the output of `cat file | wc -l` for each
file.
Byron pushed a commit to pierrechevalier83/gitoxide that referenced this pull request Mar 22, 2025
When representing an annotated tag with an empty commit message, we used
to only support a tag ending with two newlines (one after the tagger
line + one after the empty commit message).

This was due to a misconception that annotated tags ending with a single
NL shouldn't exist in the wild since there isn't an obvious way to
create them from with `git`.
This misconception shows up in the discussion thread for
[issue 603](GitoxideLabs#603)

It turns out that both encodings of empty annotated tags appear in the
wild.

We must be able to parse either and roundtrip for either.

Before [PR 604](GitoxideLabs#604), we
used to special case the empty tag msg case and not add a NL.
To be able to represent `b""`, `b"\n"`, `b"\n\n"`..., we special case
any `message` that is a pure sequence of `b'\n'` and actually parse the
`b'\n'` into the tag's message.

This allows us to calculate the correct size that matches the number of
bytes that git would produce, as well as round-trip to and from the
commit encoding.

The existing tests (in particular `round_trip` for `empty.txt` and
`empty_missing_nl.txt`) convince me that the logic is sound.
Also, the size being `139` bytes for `empty_missing_nl.txt` and `140`
bytes for `empty.txt` matches the output of `cat file | wc -l` for each
file.
pierrechevalier83 added a commit to pierrechevalier83/gitoxide that referenced this pull request Mar 22, 2025
When representing an annotated tag with an empty commit message, we used
to only support a tag ending with two newlines (one after the tagger
line + one after the empty commit message).

This was due to a misconception that annotated tags ending with a single
NL shouldn't exist in the wild since there isn't an obvious way to
create them from with `git`.
This misconception shows up in the discussion thread for
[issue 603](GitoxideLabs#603)

It turns out that both encodings of empty annotated tags appear in the
wild.

We must be able to parse either and roundtrip for either.

Before [PR 604](GitoxideLabs#604), we
used to special case the empty tag msg case and not add a NL.
To be able to represent `b""`, `b"\n"`, `b"\n\n"`..., we special case
any `message` that is a pure sequence of `b'\n'` and actually parse the
`b'\n'` into the tag's message.

This allows us to calculate the correct size that matches the number of
bytes that git would produce, as well as round-trip to and from the
commit encoding.

The existing tests (in particular `round_trip` for `empty.txt` and
`empty_missing_nl.txt`) convince me that the logic is sound.
Also, the size being `139` bytes for `empty_missing_nl.txt` and `140`
bytes for `empty.txt` matches the output of `cat file | wc -l` for each
file.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[git-object] Encoding of an empty tag seems inconsistent with git's
2 participants