Skip to content

Writing multiple pack files for the same repository #1216

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

AMSadek
Copy link
Contributor

@AMSadek AMSadek commented Oct 26, 2015

This pull request to have some initial thoughts on writing multiple pack files for the same repository.
Changes so far:
-Adding a test case to PackBuilderFixture for creating multiple pack files for the same repository.

@AMSadek AMSadek mentioned this pull request Oct 26, 2015
@AMSadek
Copy link
Contributor Author

AMSadek commented Oct 26, 2015

The InternalPack method in ObjectDatabase by nature can write multiple pack and index files with no problems. The reason is the PackDirectoryPath in PackBuilderOptions is just the directory path that will have all the pack and index files written to it.
Multiple calls to Pack method with different Action<PackBuilder> packDelegate will write different pack files. Also note that the pack file name is the SHA1 hash of it's contents, so no name conflicts will happen unless you write the same pack file twice.

cc/ @nulltoken @dahlbyk I hope this clarifies your concerns in #1183 (comment) and #1183 (comment)

In this simple test case, I'm writing all commits to a file, all trees to another file, and all blobs yet to a third file. And it works!, I tested it using git verify-pack -v command.

To do something like what's in #1183 (comment) maybe kinda tricky to have the same number of objects into each pack however it's doable.
After all, it's the not the API task, it will be mainly the user task to write smarter delegates to build his packfiles. However, I will think about it and add it to the PR for the sake of completeness :)

Please guys give me your suggestions :)

DirectoryHelper.DeleteDirectory(mrrRepoPackDirPath);
Directory.CreateDirectory(mrrRepoPackDirPath + "/pack");

PackBuilderOptions packBuilderOptions = new PackBuilderOptions(mrrRepoPackDirPath + "/pack");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you need a new packBuilderOptions for each of the three files?

Func<int, PackBuilderOptions> packBuilderOptions = i => new PackBuilderOptions(mrrRepoPackDirPath + "/pack"+ i);
...
results = results = orgRepo.ObjectDatabase.Pack(packBuilderOptions(i), b =>

To that end, it's probably wise to add an assert that the expected pack files were created. It also occurs to me that just about every use of this API will require some sort of .git/objects manipulation - does libgit2 (or could we)
support a relative file path that default to living under .git/objects?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you need a new packBuilderOptions for each of the three files?

Nope, not necessarily ... cause all the 6 files (3 .pack & 3 .idx) will be written to the same directory path. They will have different names cause they will have different content. but all the 6 files will be written to the pack directory. ... and that's the default format written and understood by git.exe

So indeed there's no reason to do such line new PackBuilderOptions(mrrRepoPackDirPath + "/pack"+ i); ... it's not even the default format the git.exe will expect when reading the newly written repository ... we will have to move all the .pack and .idx files to the .git/objects/pack at the end if we want to read it again by any git client.
However, it's still valid to write to different directories if you want.

it's probably wise to add an assert that the expected pack files were created.

Yup, I'll add a property in the PackBuilder to return the hash of the written file, and in the tests I will assert if it's written after the call.

It also occurs to me that just about every use of this API will require some sort of .git/objects manipulation - does libgit2 (or could we) support a relative file path that default to living under .git/objects?

Didn't fully understand this. Are you talking about the PackBuilderOptions property PackDirectoryPath to be defaulted to .git/objects/pack ? or more general issue in the entire API ?
If first, well, we thought of that. It's just I don't want to throw DirectoryNotFoundException if the default directory didn't exist ... so I will leave it to the API user to explicitly state the directory to write to ... also that will give the flexibility to write to multiple directories for the same repo if he wants to do so for some reasons.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I hadn't connected that the path was a directory rather than individual files. I should know this, but I have young children.

It does seem like a potentially better API if we were to ensure the requested pack directory exists, and default to objects/pack within the current repo's .git directory. Just my $.02.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current repo's packfile dir is just one possible location (git itself dumps it on the current working directory). This is what we do on fetch, but if you're using this API you're working at a level that makes it more likely that you'd want to store them elsewhere to do your own thing. It's not some every-day code you're writing.

@AMSadek
Copy link
Contributor Author

AMSadek commented Oct 27, 2015

I've added one more test case on writing multiple pack files. This one is to write each 100 objects to a different file.
I've also added more testing and enhancement to PackBuilderFixture.
Exposed libgit2 method to get the pack file hash.
Minor fixes for PackBuilder and PackBuilderResults.

@AMSadek
Copy link
Contributor Author

AMSadek commented Oct 27, 2015

@nulltoken we can do more fun stuff on building multiple packs by exposing an Add method to the PackBuilder type that takes a CommitFilter ... some thing like public void AddAll(CommitFilter filter) ... that could by really handy and quite easy to implement if we can internally use its underlying RevWalkerSafeHandle ... like you suggested in #1183 (comment) :)

When we get this merged, I will work on CommitFilter support on a separate PR :)

@dahlbyk
Copy link
Member

dahlbyk commented Nov 12, 2015

Some multi-pack discussion started at #1213 (comment)

Exposing libgit2 method to get the pack file hash.
Minor fixes for PackBuilder, PackBuilderResults, and PackBuilderFixture.
@AMSadek AMSadek force-pushed the AMSadek/testCreatingMultiplePackFiles branch from e639edc to 6035a81 Compare November 21, 2015 10:22
@AMSadek AMSadek force-pushed the AMSadek/testCreatingMultiplePackFiles branch from 6035a81 to 629e2de Compare November 21, 2015 10:39
@AMSadek
Copy link
Contributor Author

AMSadek commented Nov 21, 2015

Hey guys @dahlbyk @carlosmn @nulltoken @kevin-david @whoisj as a followup on our discussion on the PR #1213
I've added a new commit modifying the design a bit and moving PackBuilder toLibGit2Sharp.Advanced namespace

I give up the idea of splitting the implementation between PackBuilder and PackDefinition because the handle disposal will be even more tricky, as they have to share the underlying packbuilder handle, also the idea of releasing the handle at the end of the write method might lead to another issue, as @whoisj mention, if the API user didn't write the Pack after building it.

This brings us to the question either to allow explicit writes or not, In this commit I allowed it. so the API user has to dispose the PackBuilder himself. I added a Reset() method to allow the user to reuse the same PackBuilder for multiple pack files, making the implementation of it slightly simpler.

The PackAll method in ODB is much simpler now, and only does packing all objects to a single file or a single stream. If the user wants to do more fancy stuff, he has to include the LibGit2Sharp.Advanced namespace and start taking the control by creating a PackBuilder.

I also give up using the PackBuilderOptions type, as I think it will add more complications when it comes to choosing between dir path or a stream or even writing to multiple streams.

I left WritePackTo(Stream) to @kevin-david to add his code and tests, so he takes the credit for it :)

PS. I'm aware that the methods' documentation is outdated, I will fix that once we agree on the code.

Thanks guys I appreciate your feedback, please let me know what you think of this :)

@bording bording closed this Apr 21, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants