Skip to content

huffman implementation in C++ #75

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 20, 2018
Merged

huffman implementation in C++ #75

merged 1 commit into from
Apr 20, 2018

Conversation

strega-nil
Copy link

@strega-nil strega-nil commented Apr 15, 2018

It doesn't do any encoding/decoding yet; just thought I'd open a PR so that people could comment on it. There's a lot of work here!

EDIT: I'm now finished, and it works, afaict. If people would like to test it, that'd be great; comments are welcome!

@strega-nil strega-nil changed the title [unfinished] huffman implementation in C++ huffman implementation in C++ Apr 15, 2018
@leios
Copy link
Member

leios commented Apr 17, 2018

I saw a bunch of work on this and it seems to have calmed down now. I'll try to check the code tonight when I get back from work.

Thanks again for all the help!

@strega-nil
Copy link
Author

@leios yeah, totally! I believe it's entirely correct now; I got code review from some other C++ programmers :)

Copy link
Member

@leios leios left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this is syntactically impressive (as it always is -- you really know your stuff), but it is rather hard to parse for even somewhat experienced users of the language.

Would it be possible for you to walk me through the code? I know it sounds a bit pedantic, but if I cannot figure out what's going on here, I cannot merge it.

Basically, I am having trouble figuring out where the tree is and how you are DFS'ing it. My understanding is that you are rolling your own priority queue?

I figure the more questions I ask, the more C++ I'll learn and (hopefully) be able to merge into the AAA.

Modern C++ is a beast of a language. It's got a bunch of neat tricks, but the trickier the code, the harder it is to understand, which is why most of my C++ is "C with additional features." I am definitely not saying modern c++ syntax is bad, I am just trying to make sure we are not unnecessarily over-complicating things.


namespace huffman {

[[noreturn]] inline void unreachable() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar with the [[noreturn]] syntax here? What does it mean, exactly?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"this function does not return"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, looked it up. I just don't often see attribute specifiers. My bad.

virtual ~node() = 0;
};

// NOTE(nicole): never null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your name was left in the comment here, I suppose that was not intended?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it was; I can remove it if you wish, it's just my personal commenting style so that, on large projects, you can figure out who to ask about notes and todos.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, fair enough.

using std::begin;
using std::end;

namespace huffman {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we maybe add a few spaces in subsequent lines so we know it's within the huffman namespace? I've seen both spacing and no spacing, so I am not pushing one way or another here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The style for C++ is without spacing for a single namespace; we'd have to change all the other C++. This is one of the issues with single-file solutions :P

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. I just didn't remember the style. Thanks

And then is heard no more. It is a tale
Told by an idiot, full of sound and fury,
Signifying nothing.)";

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note here that the main function is quite clean and that the Sound and the Fury is one of my favorite books (I know the quote's from Macbeth)

auto ret = encoder_t();

struct rec {
static void f(node const* cur, std::vector<bool> bits, encoder_t& out) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having some trouble understanding exactly what this entire function does. Would you be able to give me a quick run-down?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it generates the encoder and decoder.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I think get it now. This is creating elements for the flat_map

auto decode_single = [top](Iter it, Iter const last) {
node const* current_node = top;

for (; it != last; ++it) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just use a while loop?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would I use a while loop?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it's fine as-is. Just nitpicking.

// --- implementation ---
inline codebook::node::~node() {}

inline std::vector<bool> with_new_bit(std::vector<bool>&& bits, bool b) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does && mean in this case?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"rvalue reference"; I should switch these to just taking by-value.

std::shared_ptr<data const> underlying_;

public:
template <class Iter>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a template here (or on most of the functions)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because std::vector<bool>::[const_]iterator is a lot more work to write out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, alright. That actually cleared it up a bit, thanks!

auto encoded = huffman::encoded_string(to_be_encoded);

std::cout << "Encoded, the string looks like: ";
for (auto b : encoded.string) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely a boolean, right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah

@leios
Copy link
Member

leios commented Apr 18, 2018

Hey Ubsan, thanks again for the code! Sorry it took me so long to review. I had some trouble understanding where everything was (huffman encoding is complicated, c++ is complicated, so when you put them together, you get super complicated).

As always with me, you might need to be patient. I am not that old, but I code like I'm 70, so I might need a little help understanding certain bits.

@strega-nil
Copy link
Author

@leios I feel like this switching to local functions makes it easier to understand what that function is doing.

@strega-nil
Copy link
Author

strega-nil commented Apr 18, 2018

Hmm, I imagine some of the confusion might be alleviated if we switched from a custom flat_map to a std::map.

EDIT: actually, honestly the code is not that much simpler. I think I'd rather keep the flat_map

@leios
Copy link
Member

leios commented Apr 18, 2018

Thanks! going through it again now!

@leios
Copy link
Member

leios commented Apr 19, 2018

Ok, so the comments actually help a good bit (for me). I am keen to merge soon.

@strega-nil
Copy link
Author

@leios tell me when, and I'll rebase and squash :)

@leios
Copy link
Member

leios commented Apr 20, 2018

I'll probably try to merge this when I get home tonight (planning on going through the PR's again). If you want, I can wait until tomorrow morning.

@strega-nil
Copy link
Author

@leios rebased and squashed!

@leios
Copy link
Member

leios commented Apr 20, 2018

Great! I'll merge it in a bit.

@leios leios merged commit 23edc86 into algorithm-archivists:master Apr 20, 2018
@leios
Copy link
Member

leios commented Apr 20, 2018

Alright, this one is done for now. Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants