Add Python Implementation of Huffman Encoding #98
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,116 @@ | ||
# Huffman Encoding | ||
# Python 2.7+ | ||
# Submitted by Matthew Giallourakis | ||
|
||
from collections import Counter | ||
|
||
# constructs the tree | ||
def build_tree(message): | ||
|
||
# get sorted list of character,frequency pairs | ||
frequencies = Counter(message) | ||
trees = frequencies.most_common() | ||
|
||
# while there is more than one tree | ||
while len(trees) > 1: | ||
|
||
# pop off the two trees of least weight from the trees list | ||
tree_left,weight_left = trees.pop() | ||
tree_right,weight_right = trees.pop() | ||
|
||
# combine the nodes and add back to the nodes list | ||
new_tree = [tree_left,tree_right] | ||
new_weight = weight_left+weight_right | ||
trees.append((new_tree,new_weight)) | ||
|
||
# sort the trees list by weight | ||
trees = sorted(trees, key=lambda n: n[1], reverse=True) | ||
Comment: You don't have to sort the entire trees list after each iteration. I know that it will always be pretty small, but I think it would be nicer here to find the right place in the list and insert the new tree there:
# Find the first tree that has a weight smaller than new_weight and return its index in the list
# If no such tree can be found, use len(trees) instead to append
index = next((i for i, tree in enumerate(trees) if tree[1] < new_weight), len(trees))
# Insert the new tree there
trees.insert(index, (new_tree, new_weight))

Reply: I was thinking of doing an insert, but I thought it would be a little harder to explain and detract from the point of the code. I'll replace it with your code (thanks!) and do the more efficient option from now on.
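The suggestion above can be sketched next to the approach from the diff. This is a minimal sketch, not the code that was merged: the function names `build_tree_sorted` and `build_tree_insert` are hypothetical and exist only for this comparison. Because `sorted` is stable, the appended tree lands after any trees of equal weight, which is the same position the generator expression finds, so both variants should build the same tree.

```python
from collections import Counter


def build_tree_sorted(message):
    """Approach from the diff: append the merged tree, then re-sort everything."""
    trees = Counter(message).most_common()
    while len(trees) > 1:
        # the two lightest trees sit at the end of the descending list
        tree_left, weight_left = trees.pop()
        tree_right, weight_right = trees.pop()
        new_tree = [tree_left, tree_right]
        new_weight = weight_left + weight_right
        trees.append((new_tree, new_weight))
        trees = sorted(trees, key=lambda n: n[1], reverse=True)
    return trees[0][0]


def build_tree_insert(message):
    """Reviewer's variant: insert the merged tree at its sorted position."""
    trees = Counter(message).most_common()
    while len(trees) > 1:
        tree_left, weight_left = trees.pop()
        tree_right, weight_right = trees.pop()
        new_tree = [tree_left, tree_right]
        new_weight = weight_left + weight_right
        # index of the first tree lighter than the new one, or the end of the list
        index = next(
            (i for i, tree in enumerate(trees) if tree[1] < new_weight),
            len(trees),
        )
        trees.insert(index, (new_tree, new_weight))
    return trees[0][0]


message = 'bibbity_bobbity'
print(build_tree_sorted(message) == build_tree_insert(message))
```

Both variants are linear or worse per iteration. For larger alphabets a priority queue such as the standard `heapq` module is the usual choice, though its tie-breaking may produce a different (equally valid) tree.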
||
|
||
tree = trees[0][0] | ||
return tree | ||
|
||
# constructs the mapping with recursion | ||
def build_mapping(tree,code=''): | ||
Comment: You seem to not like spaces between comma-separated identifiers. To stay consistent with other code examples and code outside the AAA, you should probably put spaces between function parameters, list items, etc.

Reply: This one is personal preference, because I usually use white space to indicate order of operations.

Comment: I forgot to mention that this goes for pretty much all operators, too. I can see how it makes sense in your example, and maybe it's okay to omit the space in some cases if it really improves readability. But we generally like spaces here. :D

Reply: Sure, I've never programmed in an environment where other people needed to look at my code, so I appreciate the pointers!

Comment: Very good. Code review is weird because I'm always afraid of sounding like "YOU'RE DOING IT WRONG! YOU SHOULD DO IT LIKE ME AND YOUR CODE IS BAD!" but we seem to be on the same page!
||
|
||
results = [] | ||
|
||
# split the tree | ||
left_tree,right_tree = tree | ||
|
||
# if the left node has children, find the mapping of those children | ||
# else pair the character with the current code + 0 | ||
if type(left_tree) is list: | ||
results += build_mapping(left_tree,code+'0') | ||
else: | ||
results.append((left_tree,code+'0')) | ||
|
||
# if the right node has children, find the mapping of those children | ||
# else pair the character with the current code + 1 | ||
if type(right_tree) is list: | ||
results += build_mapping(right_tree,code+'1') | ||
else: | ||
results.append((right_tree,code+'1')) | ||
|
||
return results | ||
|
||
# encodes the message | ||
def encode(mapping,message): | ||
|
||
encoding = "" | ||
Comment: You use double quotes here and in a few other places as well, while you used single quotes in others. You should stick to one or the other, and since single quotes are more common in Python and other code examples in the AAA already use them, I recommend you change all your double quotes to single quotes.

Reply: Whoops, I didn't even notice that I did that! Fixing that up now.

Comment: The variable name of this confused me for a second.

Reply: True, I'll make the variables more descriptive.
||
|
||
# build a char -> code dictionary | ||
forward_dict = dict(mapping) | ||
|
||
# replace each character with its code | ||
for char in message: | ||
encoding += forward_dict[char] | ||
|
||
return encoding | ||
|
||
# decodes a message | ||
def decode(mapping,encoding): | ||
|
||
message = "" | ||
key = "" | ||
|
||
# build a code -> char dictionary | ||
inverse_dict = dict([(v,k) for k,v in mapping]) | ||
|
||
# for each bit in the encoding | ||
# if the bit is in the dictionary, replace the bit with the paired character | ||
# else look at the bit and the following bits together until a match occurs | ||
# move to the next bit not yet looked at | ||
for index,bit in enumerate(encoding): | ||
key += bit | ||
if key in inverse_dict: | ||
message += inverse_dict[key] | ||
key = "" | ||
|
||
return message | ||
|
||
def main(): | ||
|
||
# test example | ||
message = "bibbity_bobbity" | ||
tree = build_tree(message) | ||
mapping = build_mapping(tree) | ||
encoding = encode(mapping,message) | ||
decoding = decode(mapping,encoding) | ||
|
||
print('message: '+message) | ||
print('tree: '+str(tree)) | ||
print('mapping: '+str(mapping)) | ||
print('encoding: '+encoding) | ||
print('decoding: '+decoding) | ||
|
||
# prints the following: | ||
# | ||
# message: bibbity_bobbity | ||
# tree: ['b', [[['_', 'o'], 'y'], ['t', 'i']]] | ||
# mapping: [('b', '0'), ('_', '1000'), ('o', '1001'), | ||
# ('y', '101'), ('t', '110'), ('i', '111')] | ||
# encoding: 01110011111010110000100100111110101 | ||
# decoding: bibbity_bobbity | ||
|
||
if __name__ == '__main__': | ||
main() |
Comment: Sorry, I'm super nitpicky here because this is a comment, but commas are usually followed by spaces. I mentioned the spaces after commas in another comment already. Oops!
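The `decode` function in the diff matches bits greedily, which only works because no Huffman code is a prefix of another. The following is a rough sketch of that property, using the mapping and encoding copied from the expected output in the diff's comments; `is_prefix_free` and `greedy_decode` are hypothetical helper names, not part of the submitted file.

```python
# Mapping and encoding copied from the expected output listed in the diff.
mapping = [('b', '0'), ('_', '1000'), ('o', '1001'),
           ('y', '101'), ('t', '110'), ('i', '111')]
encoding = '01110011111010110000100100111110101'


def is_prefix_free(mapping):
    """Return True if no code is a proper prefix of a different code."""
    codes = [code for _, code in mapping]
    return not any(a != b and b.startswith(a) for a in codes for b in codes)


def greedy_decode(mapping, encoding):
    """Same idea as decode() in the diff: grow a key until it matches a code."""
    inverse_dict = {code: char for char, code in mapping}
    message = ''
    key = ''
    for bit in encoding:
        key += bit
        if key in inverse_dict:
            message += inverse_dict[key]
            key = ''
    return message


print(is_prefix_free(mapping))
print(greedy_decode(mapping, encoding))
```

If the mapping were not prefix-free, for example `[('a', '0'), ('b', '01')]`, the greedy loop would emit `a` as soon as it saw `0` and could never produce `b`.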