Skip to content

Bring back some pointers #1247

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 57 commits into from
Closed

Bring back some pointers #1247

wants to merge 57 commits into from

Conversation

carlosmn
Copy link
Member

This serves as a PoC for how we could be using pointers to access the libgit2 data instead of marshaling structures into managed land and using SafeHandles which are meant for system handles, not the usual CRT-allocated structures.

This starts with a few small things, but we can keep bringing this in bit by bit.

/cc @nulltoken @whoisj

chescock and others added 5 commits November 20, 2015 13:58
As the first step to moving away from marshalling data into managed
memory when we can avoid it, we start with a simple one which we only
read from.
This avoids a copy of the struct when we only want to grab the strings
to convert them to managed strings.
We wrap the owned handle in a IDisposable to keep the using() pattern.
The public methods use the disposable wrapper so we don't have to mark
them as unsafe.
@whoisj
Copy link

whoisj commented Dec 11, 2015

Senior Carlos, you're on fire this morning.

This looks great.

I have sometime similar cooking as a pure proof of concept (not likely to go anywhere) to help me test how thin a layer can be created between a project like LG2 and .NET managed software.

@carlosmn carlosmn force-pushed the pointers branch 3 times, most recently from 41072f0 to bb3d218 Compare December 13, 2015 23:24
@nulltoken
Copy link
Member

@carlosmn Neat job! However, I'm not seeing a net reduction in the number of lines of code. Although this is quite easy to read for something at ease with C, I'm not sure this would be that easy for a .NET contributor. What's the real benefit of this? Do we get a neat perf bump? Something else? In other words, what would make the conversion to unsafe worth the porting effort?

@carlosmn carlosmn force-pushed the pointers branch 2 times, most recently from a6fc4b1 to 95c5176 Compare December 14, 2015 02:21
@carlosmn
Copy link
Member Author

There won't be a big reduction in lines of code even after we remove the redundant Git... classes, but that's not the primary motivator. The code which performs all the copies is hidden behind MarshalAs<T>() and the special handling of SafeHandle by the runtime; if you count those calls as the number of copies it performs, we would count that as a significant reduction.

The goals are twofold:

  1. Get rid of the SafeHandle usage. These are meant for OS resources and are handled especially by the runtime. We should not be using them for libgit2 objects, which are all malloc'ed memory (with the possible exception of writable streams, but even then the resource is held by libgit2 not by us).
  2. Get rid of copying/marshaling to managed memory just to copy again. We use SafeHandle or IntPtr and MarshalAs<T>() to bring in "unsafe" memory into "safe" managed memory. But then we're just copying the data again from the Git... objects into the public structs. Making all that copying useless, using up CPU IO time and causing managed allocations we don't need. We should get a perf bump from avoiding these copies, though with everything else going on it might be hard to measure definitively.

Getting rid of the SafeHandles means we have to either move to pointers in the type system or do everything with IntPtr. While the latter might look more familiar for a "normal" C# developer, the end result is the same. We are dealing with pointers, and doing the wrong operation is going to segfault regardless of whether the method has to be marked unsafe or not.

Regardless of whether we spell something git_index_entry* or IntPtr, the actual knowledge of how unmanaged memory works which you need to have to do anything at the interop layer is the same. And with the actual pointer type we let the type system help us figure out what can be dereferenced, rather than having the user perform managed-looking casts.

If you look at the code for retrieving the output of git_remote_ls(), I'd argue that the "unsafe" code is easier to read as we use an array the way the C code expected you to, and can do

for (int i = 0; i < intCount; i++)
{
    git_remote_head* currentHead = heads[i];
}

instead of

for (int i = 0; i < intCount; i++)
{
    currentHead = IntPtr.Add(currentHead, IntPtr.Size);
}

What we're doing in the "safe" code is manually stepping through the array by manually adding memory addresses instead of the accessing it like a list.

Maybe not everything will be immediately better with "unsafe" pointers, but let's not kid ourselves that coding the interop layer is actually safer when we have untyped pointers. There's a reason we don't just have opaque types be void* in libgit2 (type checking in the compiler). But that is what we're forgoing if we use IntPtr everywhere.

@@ -75,6 +75,11 @@ internal string Path

#region IEnumerable<TreeEntry> Members

unsafe TreeEntry byIndex(ObjectSafeWrapper obj, uint i, ObjectId parentTreeId, Repository repo, FilePath parentPath)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: Use PascalCase for function names.

@whoisj
Copy link

whoisj commented Dec 15, 2015

I'm not sure this would be that easy for a .NET contributor

I'm not sure that should be our first concern. The majority of contributions should be in the layer above the proxy. Ideally the proxy would be a very thin shim between the managed world of C# and native world of Libgit2, and (nearly) nothing more.

If that is the case, C# developers should have no problems adding new features to the pure C# layer at the top of the library's stack.

While these both work roughly in the same manner (as they both represent
iterators) the actual code duplication between these two is rather small
as each `_next()` method is different and how we return the managed values
is different. The net reduction in lines of code indicates that this is
indeed the case.
We cannot create libgit2 wrapper objects which are generic over the
pointer types, so let's use a template instead. We map the C name to the
C# name and generate the same code for each of them.
We add an implicit conversion from the handle to the pointer as there
are a lot of places which rely on the equivalent functionality for the
SafeHandle.
carlosmn and others added 18 commits March 7, 2016 10:43
These are actions, so they need the handle which we previously made the
code keep around.
If the user never asks for the refspecs, we should not spend the time and memory
to load them into managed memory.
We now isolate all tests by setting the config search paths globally at
the start of the fixture.
We set a dummy user name and email in the global configuration through
the options. Move away from this obsoleted method and set the
configuration at the repo level, which is where a test-specific
configuration should live.
These tests use the option to set a global configuration file with the
filemode we want. But the filemode setting should be set per-repository
as it comes down to the workdir for each repository.

Switch to setting the configuration in the local configuration, which is
also more in line with how it would exist in the wild.
In order to check for equality we just need to compare the ID of the
object. We do not need to look up the object but can compare their
identifiers directly.

This also requires us to modify Branch's check for the current branch,
but the new code more accurately reflects the check we do want to
perform. Namely whether HEAD points to our reference name.
Don't load an object just to check its ID
These (except for the one) are there to keep the pointers alive, so we
do want them there even if we never read from them.
…-paths

Obsolete the config paths in RepositoryOptions
It went away when removing the base safe handle, but we need it for the
debug/CI builds.
ethomson pushed a commit that referenced this pull request Mar 21, 2016
Bring back some pointers
@ethomson
Copy link
Member

Manually merged...!

@ethomson ethomson closed this Mar 21, 2016
This lets us have the platform-specific path separators.
@carlosmn carlosmn mentioned this pull request Mar 22, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants