Skip to content

Datahandle test improvements and Endianness fixes #337

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Oct 13, 2018

Conversation

gab1one
Copy link
Contributor

@gab1one gab1one commented Sep 28, 2018

The coverage of DataHandle by DataHandleTest was not sufficient, a lot of methods where untested. This PR greatly extends the test coverage and fixes some discovered bugs.

Copy link
Member

@ctrueden ctrueden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! I think it's great. There is just the performance concern I mentioned. Can you make a byte[8] upfront and just reuse it in all the places you need a small byte array?

}
if ((ch0 | ch1) < 0) throw new EOFException();
return (short) ((ch0 << 8) + (ch1 << 0));
final byte[] buf = new byte[2];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will destroy performance. We cannot allocate a new byte array with every short read. You could use an instance field buffer of sufficient size to avoid the issue.

Copy link
Contributor Author

@gab1one gab1one Oct 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can we cache this in the interface though? Should we require all DataHandle implementations to provide a buffer array which is used for conversions? Should be easy to put into the AbstractDataHandle

}
if ((ch0 | ch1 | ch2 | ch3) < 0) throw new EOFException();
return ((ch0 << 24) + (ch1 << 16) + (ch2 << 8) + (ch3 << 0));
final byte[] buf = new byte[4];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same: performance will tank. Need a small reusable byte[] as instance field. This should be OK, since these methods are already not thread-safe.

ch0 = read();
}
if ((ch0 | ch1 | ch2 | ch3 | ch4 | ch5 | ch6 | ch7) < 0) {
byte[] buf = new byte[8];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to use a reusable buffer.

@ctrueden ctrueden added the to do label Oct 3, 2018
@gab1one
Copy link
Contributor Author

gab1one commented Oct 4, 2018

@ctrueden, I addressed your performance concern in 7b7e03b
I am not sure I like this solution though, it might be better to just push the affected methods into AbstractDataHandle and not expose the conversionBuffer in the interface.

@ctrueden ctrueden added in progress and removed to do labels Oct 9, 2018
This is both more consistent and correct
- StringBuilder is faster and recommended for most cases
RandomAccessFile is always BigEndian, so we need to use the interface
methods for conversion.
The was not sufficient before which resulted in various problems
not being discovered.
- this avoids allocating a new byte array for each read
@gab1one gab1one force-pushed the datahandle-endianness branch from 7b7e03b to e9f820a Compare October 12, 2018 15:45
@ctrueden
Copy link
Member

ctrueden commented Oct 13, 2018

@gab1one I considered different solutions to address the issue of conversionBuffer() as public API in the interface.

One idea I had was to use a static ThreadLocal<byte[]> in a utility class, to give back a unique byte[] per thread. I benchmarked it, and the performance is equivalent to your direct field storing the byte[] per instance of DataHandle.

Then I thought: where will this utility class live? What will it be called? It is pretty specialized, always returning a byte[8]. It could be generalized, but maybe YAGNI applies here. I figured we want this to be an internal implementation detail, so the utility class itself should maybe be package-protected.

But then I wondered: is it ever useful to be able to override the conversion buffer access? I am torn. There might be cases where it is helpful downstream; I don't know. (But this also might be a case of YAGNI.)

I am definitely not eager to move so many default method implementations in the abstract layer, since it makes implementing the interface directly much more complicated, even though as you point out it would certainly have better encapsulation regarding this issue.

In the absence of a clearly superior alternative, I'm going to merge this as-is with the conversionBuffer() method exposed in the public API.

By the way, in my investigations, I was curious whether interfaces can have inner classes, and indeed they can! But they are always public static. (I was thinking this is a way to embed static fields in an interface: nest them inside an inner class as static fields of that class. But then the interface has a weird inner class exposed to the world, which is still not very nice.)

@ctrueden ctrueden merged commit 2ea19d2 into master Oct 13, 2018
@ctrueden ctrueden deleted the datahandle-endianness branch October 13, 2018 15:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants