Skip to content

Commit db5f246

Browse files
committed
Added Huffman with Haskell
1 parent 3f108a8 commit db5f246

File tree

2 files changed

+68
-10
lines changed

2 files changed

+68
-10
lines changed
Lines changed: 57 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,57 @@
1+
import qualified Data.Map as M
2+
import Data.List (insert, sort)
3+
4+
data Tree a = Nil
5+
| Leaf Int a
6+
| Node Int (Tree a) (Tree a)
7+
deriving (Show, Eq)
8+
9+
freq :: Tree a -> Int
10+
freq Nil = 0
11+
freq (Leaf i _) = i
12+
freq (Node i _ _) = i
13+
14+
instance (Eq a) => Ord (Tree a) where
15+
compare t1 t2 = compare (freq t1) (freq t2)
16+
17+
getFrequencies :: Ord a => [a] -> [(Int, a)]
18+
getFrequencies = toSortedList . M.fromListWith (+) . flip zip (repeat 1)
19+
where toSortedList = sort . map swap . M.toList
20+
swap (a, i) = (i, a)
21+
22+
buildTree :: (Ord a) => [a] -> Tree a
23+
buildTree = build . map (uncurry Leaf) . getFrequencies
24+
where build [] = Nil
25+
build [t] = t
26+
build (t1:t2:ts) = build $ insert (Node (freq t1 + freq t2) t1 t2) ts
27+
28+
data Bit = Zero | One
29+
30+
instance Show Bit where
31+
show Zero = "0"
32+
show One = "1"
33+
34+
encode :: (Ord a) => [a] -> (Tree a, [Bit])
35+
encode s = (tree, msg)
36+
where
37+
tree = buildTree s
38+
msg = concatMap (table M.!) s
39+
table = M.fromList $ mkTable (tree, [])
40+
mkTable (Nil, _) = []
41+
mkTable (Leaf _ a, p) = [(a, reverse p)]
42+
mkTable (Node _ t1 t2, p) = concatMap mkTable [(t1, Zero:p), (t2, One:p)]
43+
44+
decode :: (Ord a) => Tree a -> [Bit] -> [a]
45+
decode t = path t
46+
where path (Leaf _ a) m = a : path t m
47+
path (Node _ t1 _) (Zero: m) = path t1 m
48+
path (Node _ _ t2) (One: m) = path t2 m
49+
path _ _ = []
50+
51+
main = do
52+
let msg = "bibbity bobbity"
53+
(tree, encoded) = encode msg
54+
decoded = decode tree encoded
55+
putStrLn $ "Endoding \"" ++ msg ++ "\": " ++ concatMap show encoded
56+
putStrLn $ "Length: " ++ (show $ length encoded)
57+
putStrLn $ "Decoding: " ++ decoded

chapters/data_compression/huffman/huffman.md

Lines changed: 11 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
<script>
22
MathJax.Hub.Queue(["Typeset",MathJax.Hub]);
33
</script>
4-
$$
4+
$$
55
\newcommand{\d}{\mathrm{d}}
66
\newcommand{\bff}{\boldsymbol{f}}
77
\newcommand{\bfg}{\boldsymbol{g}}
@@ -23,7 +23,7 @@ $$
2323

2424
# Huffman Encoding
2525

26-
If there were ever a data compression method to take the world by storm, it would be Huffman encoding.
26+
If there were ever a data compression method to take the world by storm, it would be Huffman encoding.
2727
In fact, this was the method that got me into computational methods to begin with.
2828
I distinctly remember sitting in my data compression class and talking about the great information theorist Claude Shannon and Robert Fano, when suddenly my professor introduced a new kid to the mix: David Huffman.
2929
He managed to rip the heart out of the methods described by leaders of the field and create a data compression method that was easier to understand and implement, while also providing more robust results, and apparently this was all done for a school project!
@@ -34,10 +34,10 @@ I have since accepted that fact and moved on.
3434
Huffman encoding follows from the problem described in the [Data Compression](../data_compression.md) section.
3535
We have a string that we want to encode into bits.
3636
Huffman encoding ensures that our encoded bitstring is as small as possible without losing any information.
37-
Because it is both lossless and gaurantees the smallest possible bitlength, it outright replaces both Shannon and Shannon-Fano encoding in most cases, which is a little weird because the method was devised while Huffman was taking a course from Fano, himself!
37+
Because it is both lossless and guarantees the smallest possible bit length, it outright replaces both Shannon and Shannon-Fano encoding in most cases, which is a little weird because the method was devised while Huffman was taking a course from Fano, himself!
3838

39-
The idea is somewhat straightforward in principle, but a little difficult to code in practice.
40-
By creating a binary tree of the input alphabet, every branch can be provided a unique bit representation simply by assigning a binary value to each child and reading to a character in a leaf node if starting from the root node.
39+
The idea is somewhat straightforward in principle, but a little difficult to code in practice.
40+
By creating a binary tree of the input alphabet, every branch can be provided a unique bit representation simply by assigning a binary value to each child and reading to a character in a leaf node if starting from the root node.
4141

4242
So now the question is: how do we create a binary tree?
4343
Well, here we build it from the bottom up like so:
@@ -48,7 +48,7 @@ Well, here we build it from the bottom up like so:
4848
4. Read the tree backwards from the root node and concatenate the final bitstring codeword. Keep all codewords and put them into your final set of codewords (sometimes called a codebook)
4949
5. Encode your phrase with the codebook.
5050

51-
And that's it.
51+
And that's it.
5252
Here's an image of what this might look like for the phrase `bibbity_bobbity`:
5353

5454
<p align="center">
@@ -67,13 +67,13 @@ This will create a codebook that looks like this:
6767
| ___ | 1111 |
6868

6969
and `bibbity bobbity` becomes `01000010010111011110111000100101110`.
70-
As mentioned this uses the minimum number of bits possible for encoding.
70+
As mentioned this uses the minimum number of bits possible for encoding.
7171
The fact that this algorithm is both conceptually simple and provably useful is rather extraordinary to me and is why Huffman encoding will always hold a special place in my heart.
7272

7373
# Example Code
7474
In code, this can be a little tricky. It requires a method to continually sort the nodes as you add more and more nodes to the system.
7575
The most straightforward way to do this in some languages is with a priority queue, but depending on the language, this might be more or less appropriate.
76-
In addition, to read the tree backwards, some sort of [Depth First Search](../../tree_traversal/tree_traversal.md) needs to be implemented.
76+
In addition, to read the tree backwards, some sort of [Depth First Search](../../tree_traversal/tree_traversal.md) needs to be implemented.
7777
Whether you use a stack or straight-up recursion also depends on the language, but the recursive method is a little easier to understand in most cases.
7878

7979
{% method %}
@@ -83,6 +83,7 @@ Whether you use a stack or straight-up recursion also depends on the language, b
8383
{% sample lang="rs" %}
8484
### Rust
8585
[import, lang:"rust"](code/rust/huffman.rs)
86+
{% sample lang="hs" %}
87+
### Haskell
88+
[import, lang:"haskell"](code/haskell/huffman.hs)
8689
{% endmethod %}
87-
88-

0 commit comments

Comments
 (0)