Bring back some pointers #1247

carlosmn · 2015-12-11T08:44:30Z

This serves as a PoC for how we could use pointers to access the libgit2 data instead of marshaling structures into managed land and using SafeHandles, which are meant for system handles, not the usual CRT-allocated structures.

This starts with a few small things, but we can keep bringing this in bit by bit.

/cc @nulltoken @whoisj

whoisj · 2015-12-11T16:29:47Z

Senior Carlos, you're on fire this morning.

This looks great.

I have something similar cooking as a pure proof of concept (not likely to go anywhere) to help me test how thin a layer can be created between a project like LG2 and .NET managed software.

nulltoken · 2015-12-13T23:59:22Z

@carlosmn Neat job! However, I'm not seeing a net reduction in the number of lines of code. Although this is quite easy to read for someone at ease with C, I'm not sure this would be that easy for a .NET contributor. What's the real benefit of this? Do we get a neat perf bump? Something else? In other words, what would make the conversion to unsafe worth the porting effort?

carlosmn · 2015-12-14T15:38:49Z

There won't be a big reduction in lines of code even after we remove the redundant Git... classes, but that's not the primary motivator. The code which performs all the copies is hidden behind MarshalAs<T>() and the special handling of SafeHandle by the runtime; if you count those calls as the number of copies it performs, we would count that as a significant reduction.

The goals are twofold:

Get rid of the SafeHandle usage. These are meant for OS resources and are handled specially by the runtime. We should not be using them for libgit2 objects, which are all malloc'ed memory (with the possible exception of writable streams, but even then the resource is held by libgit2, not by us).
Get rid of copying/marshaling to managed memory just to copy again. We use SafeHandle or IntPtr and MarshalAs<T>() to bring "unsafe" memory into "safe" managed memory. But then we just copy the data again from the Git... objects into the public structs, which makes all that copying useless, uses up CPU and IO time, and causes managed allocations we don't need. We should get a perf bump from avoiding these copies, though with everything else going on it might be hard to measure definitively.

Getting rid of the SafeHandles means we have to either move to pointers in the type system or do everything with IntPtr. While the latter might look more familiar for a "normal" C# developer, the end result is the same. We are dealing with pointers, and doing the wrong operation is going to segfault regardless of whether the method has to be marked unsafe or not.

Regardless of whether we spell something git_index_entry* or IntPtr, the actual knowledge of how unmanaged memory works which you need to have to do anything at the interop layer is the same. And with the actual pointer type we let the type system help us figure out what can be dereferenced, rather than having the user perform managed-looking casts.

If you look at the code for retrieving the output of git_remote_ls(), I'd argue that the "unsafe" code is easier to read as we use an array the way the C code expected you to, and can do

for (int i = 0; i < intCount; i++)
{
    git_remote_head* currentHead = heads[i];
}

instead of

for (int i = 0; i < intCount; i++)
{
    currentHead = IntPtr.Add(currentHead, IntPtr.Size);
}

What we're doing in the "safe" code is stepping through the array by manually adding memory addresses instead of accessing it like a list.
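To make the comparison above concrete, here is a minimal sketch of both styles walking the same unmanaged array of pointers. `FakeHead` and `HeadArrayDemo` are hypothetical names invented for this illustration; `FakeHead` is not the real `git_remote_head` layout, and the memory is allocated with `Marshal.AllocHGlobal` rather than coming from libgit2. It assumes the project is compiled with unsafe blocks enabled.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for a native struct; NOT the real git_remote_head layout.
[StructLayout(LayoutKind.Sequential)]
public struct FakeHead
{
    public int Value;
}

public static class HeadArrayDemo
{
    // Pointer-typed walk: index the array of pointers the way C would.
    public static unsafe int SumWithPointers(FakeHead** heads, int count)
    {
        int sum = 0;
        for (int i = 0; i < count; i++)
        {
            FakeHead* currentHead = heads[i];
            sum += currentHead->Value;
        }
        return sum;
    }

    // IntPtr walk: read each slot, then advance the address by one pointer width.
    public static int SumWithIntPtr(IntPtr heads, int count)
    {
        int sum = 0;
        IntPtr slot = heads;
        for (int i = 0; i < count; i++)
        {
            IntPtr currentHead = Marshal.ReadIntPtr(slot);
            sum += Marshal.ReadInt32(currentHead); // Value is the first (only) field
            slot = IntPtr.Add(slot, IntPtr.Size);
        }
        return sum;
    }

    // Builds a small unmanaged array of pointers, runs both walks, frees the memory.
    public static unsafe (int viaPointers, int viaIntPtr) Run()
    {
        const int count = 3;
        IntPtr array = Marshal.AllocHGlobal(IntPtr.Size * count);
        IntPtr[] items = new IntPtr[count];
        try
        {
            for (int i = 0; i < count; i++)
            {
                items[i] = Marshal.AllocHGlobal(sizeof(FakeHead));
                ((FakeHead*)items[i].ToPointer())->Value = i + 1;
                Marshal.WriteIntPtr(array, i * IntPtr.Size, items[i]);
            }
            return (SumWithPointers((FakeHead**)array.ToPointer(), count),
                    SumWithIntPtr(array, count));
        }
        finally
        {
            foreach (IntPtr item in items) Marshal.FreeHGlobal(item);
            Marshal.FreeHGlobal(array);
        }
    }
}
```

Both methods touch exactly the same memory. The difference is that the compiler knows what `heads[i]` points to in the first one, while the second relies on the caller getting every offset and read width right by hand.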

Maybe not everything will be immediately better with "unsafe" pointers, but let's not kid ourselves that coding the interop layer is actually safer when we have untyped pointers. There's a reason we don't just have opaque types be void* in libgit2 (type checking in the compiler). But that is what we're forgoing if we use IntPtr everywhere.

Therzok · 2015-12-14T16:14:17Z

LibGit2Sharp/Tree.cs

@@ -75,6 +75,11 @@ internal string Path

        #region IEnumerable<TreeEntry> Members

+        unsafe TreeEntry byIndex(ObjectSafeWrapper obj, uint i, ObjectId parentTreeId, Repository repo, FilePath parentPath)


Style: Use PascalCase for function names.

whoisj · 2015-12-15T14:37:05Z

> I'm not sure this would be that easy for a .NET contributor

I'm not sure that should be our first concern. The majority of contributions should be in the layer above the proxy. Ideally the proxy would be a very thin shim between the managed world of C# and the native world of libgit2, and (nearly) nothing more.

If that is the case, C# developers should have no problems adding new features to the pure C# layer at the top of the library's stack.

While these both work roughly in the same manner (as they both represent iterators) the actual code duplication between these two is rather small as each `_next()` method is different and how we return the managed values is different. The net reduction in lines of code indicates that this is indeed the case.

These tests use the option to set a global configuration file with the filemode we want. But the filemode setting should be set per-repository as it comes down to the workdir for each repository. Switch to setting the configuration in the local configuration, which is also more in line with how it would exist in the wild.

In order to check for equality we just need to compare the ID of the object. We do not need to look up the object but can compare their identifiers directly. This also requires us to modify Branch's check for the current branch, but the new code more accurately reflects the check we do want to perform, namely whether HEAD points to our reference name.
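As a rough illustration of comparing by identifier, the sketch below uses hypothetical types (`FakeGitObject`, `FakeBranch`); they are not LibGit2Sharp's classes, and the load counter exists only to show that equality never triggers a lookup.

```csharp
using System;

// Hypothetical object wrapper: knows its ID up front, loads content only on demand.
public sealed class FakeGitObject : IEquatable<FakeGitObject>
{
    private readonly Func<string> loader;

    public string Id { get; }
    public int LoadCount { get; private set; } // number of simulated object lookups

    public FakeGitObject(string id, Func<string> loader)
    {
        Id = id ?? throw new ArgumentNullException(nameof(id));
        this.loader = loader;
    }

    // Simulates the expensive lookup of the object's content.
    public string Load()
    {
        LoadCount++;
        return loader();
    }

    // Equality only compares identifiers; it never calls Load().
    public bool Equals(FakeGitObject other) =>
        other != null && string.Equals(Id, other.Id, StringComparison.Ordinal);

    public override bool Equals(object obj) => Equals(obj as FakeGitObject);

    public override int GetHashCode() => StringComparer.Ordinal.GetHashCode(Id);
}

public static class FakeBranch
{
    // "Current branch" check expressed as a name comparison:
    // does HEAD point at this reference name?
    public static bool IsCurrent(string headTargetName, string branchCanonicalName) =>
        string.Equals(headTargetName, branchCanonicalName, StringComparison.Ordinal);
}
```

With this shape, two wrappers for the same ID compare equal even if neither has ever been loaded.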

ethomson · 2016-03-21T14:59:01Z

Manually merged...!

chescock and others added 5 commits November 20, 2015 13:58

Handle exceptions and null returns from CredentialsProvider.

4f7e628

Use a pointer to retrieve the library's error

b1f1f47

As the first step to moving away from marshalling data into managed memory when we can avoid it, we start with a simple one which we only read from.

Use pointers for retrieving a config entry

c2b73df

This avoids a copy of the struct when we only want to grab the strings to convert them to managed strings.

Use pointers for tree entries

e5aa4fb

We wrap the owned handle in an IDisposable to keep the using() pattern.

Use pointers for git references

7445d4c

The public methods use the disposable wrapper so we don't have to mark them as unsafe.

carlosmn force-pushed the pointers branch from c1e2a41 to 7445d4c on December 11, 2015 13:47

carlosmn force-pushed the pointers branch 3 times, most recently from 41072f0 to bb3d218 on December 13, 2015 23:24

carlosmn force-pushed the pointers branch 2 times, most recently from a6fc4b1 to 95c5176 on December 14, 2015 02:21

Therzok reviewed Dec 14, 2015
carlosmn added 4 commits December 16, 2015 13:35

Use pointers for getting the list of advertised refs

ed25e69

Use pointers to read the index entries

92fc8f5

Get rid of the index entry SafeHandle

966ec5a

carlosmn force-pushed the pointers branch from ea8c815 to 43521a6 on December 16, 2015 12:35

carlosmn added 2 commits January 12, 2016 05:52

Get rid of the branch iterator SafeHandle

690af70

Create handles via templating

c05ee27

We cannot create libgit2 wrapper objects which are generic over the pointer types, so let's use a template instead. We map the C name to the C# name and generate the same code for each of them.

carlosmn force-pushed the pointers branch from fb24584 to c05ee27 on January 13, 2016 04:39

carlosmn added 6 commits January 13, 2016 10:57

Move git_reference to the template

5762938

Get rid of OidSafeHandle

c0cf249

Get rid of GitRefSpecHandle

70074f9

Remove unused TreeEntrySafeHandle

8be1e38

Get rid of RepositorySafeHandle

86498c7

We add an implicit conversion from the handle to the pointer as there are a lot of places which rely on the equivalent functionality for the SafeHandle.
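A minimal sketch of what such a handle might look like follows: it owns a raw pointer, implements IDisposable so using() keeps working, nulls the pointer when disposed, and converts implicitly to the pointer type. `NativeThing`, `ThingHandle` and `ThingApi` are hypothetical names for this illustration, and `Marshal.FreeHGlobal` stands in for whatever native free function the real object would need. It assumes unsafe blocks are enabled for the project.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical native type; only ever used behind a pointer.
public struct NativeThing
{
    public int Value;
}

// Hypothetical owning wrapper: frees the pointer once and nulls it afterwards.
public sealed unsafe class ThingHandle : IDisposable
{
    private NativeThing* ptr;

    public ThingHandle(NativeThing* ptr)
    {
        this.ptr = ptr;
    }

    public bool IsNull => ptr == null;

    // Lets a handle be passed wherever the raw pointer is expected.
    public static implicit operator NativeThing*(ThingHandle handle) => handle.ptr;

    public void Dispose()
    {
        if (ptr != null)
        {
            Marshal.FreeHGlobal((IntPtr)ptr); // stands in for a native free call
            ptr = null;                       // a second Dispose() is then a no-op
        }
    }
}

public static class ThingApi
{
    // Public methods only expose the wrapper, so callers need no unsafe context.
    public static ThingHandle Create(int value)
    {
        unsafe
        {
            var raw = (NativeThing*)Marshal.AllocHGlobal(sizeof(NativeThing)).ToPointer();
            raw->Value = value;
            return new ThingHandle(raw);
        }
    }

    public static int Read(ThingHandle handle)
    {
        unsafe
        {
            NativeThing* raw = handle; // implicit conversion from handle to pointer
            return raw->Value;
        }
    }
}
```

Callers can then write `using (var handle = ThingApi.Create(42)) { ... }` without marking their own code unsafe; reading through a handle after it has been disposed is still a bug, exactly as it would be with any other pointer.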

Set the pointer to null upon disposing

b2d9e44

carlosmn and others added 18 commits March 7, 2016 10:43

Bind the missing refspec methods

53b66be

These are actions, so they need the handle which we previously made the code keep around.

Properly implement disposable for Remote

d6282d3

Don't load refspecs eagerly

24c4a8e

If the user never asks for the refspecs, we should not spend the time and memory to load them into managed memory.
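One common way to get this behaviour in C# is `Lazy<T>`; the sketch below shows the idea with a hypothetical `FakeRemote` class and a caller-supplied loader delegate. It is not how the commit itself is implemented, only an illustration of deferring the load until first access.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical remote: refspecs are fetched only on first access.
public sealed class FakeRemote
{
    private readonly Lazy<IReadOnlyList<string>> refSpecs;

    // Counts how many times the (simulated) native load actually ran.
    public int LoadCount { get; private set; }

    public FakeRemote(Func<IReadOnlyList<string>> loader)
    {
        refSpecs = new Lazy<IReadOnlyList<string>>(() =>
        {
            LoadCount++;
            return loader();
        });
    }

    // First access runs the loader; later accesses reuse the cached list.
    public IReadOnlyList<string> RefSpecs => refSpecs.Value;
}
```

A remote that is created and dropped without anyone reading `RefSpecs` never runs the loader at all.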

Merge branch 'cmn/refspec-transform'

78a0f4b

Merge commit 'refs/pull/1239/head' of github.com:libgit2/libgit2sharp

4f68818

Add the template for changes after v0.22

4938414

Mark per-repo config locations obsolete

3c3674a

Remove the explicit repository isolation in tests

1bd1cd5

We now isolate all tests by setting the config search paths globally at the start of the fixture.

Set the dummy user in the local configuration

f67f385

We set a dummy user name and email in the global configuration through the options. Move away from this obsoleted method and set the configuration at the repo level, which is where a test-specific configuration should live.

Add CHANGES entry for config paths in RepositoryOptions

4924146

Merge pull request libgit2#1275 from libgit2/cmn/equality-over-id

e715299

Don't load an object just to check its ID

Make tests aware of the ProgramData config level

06d2866

Teach the custom config builder about ProgramData

ebc43b4

Suppress or fix some unused variable warnings

3615df9

These (except for the one) are there to keep the pointers alive, so we do want them there even if we never read from them.

Merge pull request libgit2#1274 from libgit2/cmn/obsolete-repo-config…

45846e2

…-paths Obsolete the config paths in RepositoryOptions

Bring back LeaksContainer

d45523c

It went away when removing the base safe handle, but we need it for the debug/CI builds.

carlosmn force-pushed the pointers branch from 841dad8 to d45523c on March 21, 2016 07:45

carlosmn added 2 commits March 21, 2016 09:37

Merge remote-tracking branch 'upstream/master' into pointers

db76f6e

Dispose of a few Remotes in the tests

9225134

carlosmn force-pushed the pointers branch from 8e61d0f to 9225134 on March 21, 2016 09:29

carlosmn added 2 commits March 21, 2016 14:42

Register Remotes for cleanup with the repo

395616b

fixup! Merge remote-tracking branch 'upstream/master' into pointers

04fe088

carlosmn force-pushed the pointers branch from aa37bc7 to 04fe088 on March 21, 2016 14:00

ethomson pushed a commit that referenced this pull request Mar 21, 2016

Merge pull request #1247

5e683b9

Bring back some pointers

ethomson closed this Mar 21, 2016

Do convert unsafe paths into native

93e1d46

This lets us have the platform-specific path separators.

carlosmn mentioned this pull request Mar 22, 2016

Drop Mono workaround #1108

Closed
