
Added Huffman with Haskell #77


Merged
merged 3 commits into from
Apr 18, 2018
57 changes: 57 additions & 0 deletions chapters/data_compression/huffman/code/haskell/huffman.hs
@@ -0,0 +1,57 @@
import qualified Data.Map as M
import Data.List (insert, sort)

-- A Huffman tree: leaves hold a frequency and a symbol; internal nodes
-- hold the combined frequency of their children.
data Tree a = Leaf Int a
            | Node Int (Tree a) (Tree a)
            deriving (Show, Eq)

freq :: Tree a -> Int
freq (Leaf i _)   = i
freq (Node i _ _) = i

-- Order trees by frequency so `insert` keeps the work list sorted.
instance (Eq a) => Ord (Tree a) where
  compare t1 t2 = compare (freq t1) (freq t2)

getFrequencies :: Ord a => [a] -> [(Int, a)]
getFrequencies = toSortedList . M.fromListWith (+) . flip zip (repeat 1)
  where toSortedList = sort . map swap . M.toList
        swap (a, i) = (i, a)

-- Repeatedly merge the two lowest-frequency trees until one remains.
buildTree :: (Ord a) => [a] -> Maybe (Tree a)
buildTree = build . map (uncurry Leaf) . getFrequencies
  where build []         = Nothing
        build [t]        = Just t
        build (t1:t2:ts) = build $ insert (Node (freq t1 + freq t2) t1 t2) ts

data Bit = Zero | One

instance Show Bit where
  show Zero = "0"
  show One  = "1"

encode :: (Ord a) => [a] -> (Maybe (Tree a), [Bit])
encode s = (tree, msg)
  where
    tree = buildTree s
    msg = concatMap (table M.!) s
    table = case tree of
      Nothing -> M.empty
      Just t  -> M.fromList $ mkTable (t, [])
    mkTable (Leaf _ a, p)     = [(a, reverse p)]
    mkTable (Node _ t1 t2, p) = concatMap mkTable [(t1, Zero : p), (t2, One : p)]

decode :: (Ord a) => Maybe (Tree a) -> [Bit] -> [a]
decode Nothing _  = []
decode (Just t) m = path t m
  where path (Leaf _ a) bits             = a : path t bits
        path (Node _ t1 _) (Zero : bits) = path t1 bits
        path (Node _ _ t2) (One : bits)  = path t2 bits
        path _ _ = []

main :: IO ()
main = do
  let msg = "bibbity bobbity"
      (tree, encoded) = encode msg
      decoded = decode tree encoded
  putStrLn $ "Encoding \"" ++ msg ++ "\": " ++ concatMap show encoded
  putStrLn $ "Length: " ++ show (length encoded)
  putStrLn $ "Decoding: " ++ decoded
23 changes: 12 additions & 11 deletions chapters/data_compression/huffman/huffman.md
@@ -1,7 +1,7 @@
<script>
MathJax.Hub.Queue(["Typeset",MathJax.Hub]);
</script>
$$
\newcommand{\d}{\mathrm{d}}
\newcommand{\bff}{\boldsymbol{f}}
\newcommand{\bfg}{\boldsymbol{g}}
@@ -23,7 +23,7 @@ $$

# Huffman Encoding

If there were ever a data compression method to take the world by storm, it would be Huffman encoding.
In fact, this was the method that got me into computational methods to begin with.
I distinctly remember sitting in my data compression class, talking about the great information theorists Claude Shannon and Robert Fano, when suddenly my professor introduced a new kid to the mix: David Huffman.
He managed to rip the heart out of the methods described by these leaders of the field and create a data compression method that was easier to understand and implement while also providing more robust results. Apparently, this was all done for a school project!
@@ -34,10 +34,10 @@ I have since accepted that fact and moved on.
Huffman encoding follows from the problem described in the [Data Compression](../data_compression.md) section.
We have a string that we want to encode into bits.
Huffman encoding ensures that our encoded bitstring is as small as possible without losing any information.
Because it is both lossless and guarantees the smallest possible bit length, it outright replaces both Shannon and Shannon-Fano encoding in most cases, which is a little weird because the method was devised while Huffman was taking a course from Fano himself!

The idea is somewhat straightforward in principle, but a little difficult to code in practice.
By creating a binary tree of the input alphabet, every character can be given a unique bit representation simply by assigning a binary value to each child and reading off the path from the root node to that character's leaf node.
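That read-off can be sketched in a few lines of Haskell: each left edge contributes a `0`, each right edge a `1`, and the accumulated path at a leaf is that character's codeword. (This is a minimal, self-contained sketch with its own tiny tree type and a made-up example tree, not the chapter's full implementation.)

```haskell
-- Minimal sketch of reading codewords off a binary tree: left edges are
-- '0', right edges are '1', and a codeword is the path from root to leaf.
data Tree a = Leaf a | Node (Tree a) (Tree a)

codewords :: Tree a -> [(a, String)]
codewords = go ""
  where go path (Leaf a)   = [(a, reverse path)]
        go path (Node l r) = go ('0' : path) l ++ go ('1' : path) r

-- For the (hypothetical) tree ((a b) c):
-- 'a' gets "00", 'b' gets "01", and 'c' gets "1".
example :: [(Char, String)]
example = codewords (Node (Node (Leaf 'a') (Leaf 'b')) (Leaf 'c'))
```

Note that shallower leaves get shorter codewords, which is exactly why the tree is built with rarer characters placed deeper.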

So now the question is: how do we create a binary tree?
Well, here we build it from the bottom up like so:
@@ -48,7 +48,7 @@ Well, here we build it from the bottom up like so:
4. Read the tree backwards from the root node and concatenate the final bitstring codeword. Keep all codewords and put them into your final set of codewords (sometimes called a codebook).
5. Encode your phrase with the codebook.
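The bottom-up build in the steps above can be sketched with a sorted list standing in for a priority queue: keep the weighted trees in frequency order, repeatedly merge the two lightest, and stop when one tree remains. (A minimal sketch with its own tiny tree type; the input frequencies in the comment are hypothetical.)

```haskell
import Data.List (insertBy, sortOn)
import Data.Ord (comparing)

data Tree = Leaf Int Char | Node Int Tree Tree deriving Show

weight :: Tree -> Int
weight (Leaf w _)   = w
weight (Node w _ _) = w

-- Steps 1-3: sort the leaves by frequency, then repeatedly pop the two
-- lightest trees and re-insert their merge, keeping the list sorted.
build :: [(Char, Int)] -> Maybe Tree
build = go . sortOn weight . map (\(c, w) -> Leaf w c)
  where
    go []         = Nothing
    go [t]        = Just t
    go (a:b:rest) = go (insertBy (comparing weight) merged rest)
      where merged = Node (weight a + weight b) a b
```

For example, `build [('a', 3), ('b', 2), ('c', 1)]` first merges `c` and `b` into a weight-3 node and then merges that with `a`, giving a root of weight 6.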

And that's it.
Here's an image of what this might look like for the phrase `bibbity_bobbity`:

<p align="center">
@@ -66,14 +66,14 @@ This will create a codebook that looks like this:
| _o_ | 1110 |
| ___ | 1111 |

and `bibbity_bobbity` becomes `01000010010111011110111000100101110`.
As mentioned, this uses the minimum number of bits possible for encoding.
The fact that this algorithm is both conceptually simple and provably useful is rather extraordinary to me and is why Huffman encoding will always hold a special place in my heart.
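The 35-bit total above can be checked without reconstructing the exact codebook (which depends on how ties are broken): every merge while building the tree adds one bit to the codeword of every symbol beneath it, so the encoded length is the sum of the merged weights. A small sketch under that observation:

```haskell
import Data.List (group, insert, sort)

-- Each merge of two subtrees lengthens the codeword of every symbol
-- beneath them by one bit, so the total encoded length is the sum of
-- all merged weights produced while building the tree.
encodedLength :: Ord a => [a] -> Int
encodedLength = go 0 . sort . map length . group . sort
  where
    go acc (a:b:rest) = go (acc + a + b) (insert (a + b) rest)
    go acc _          = acc

-- encodedLength "bibbity bobbity" evaluates to 35,
-- matching the length of the bitstring above.
```

This also gives a quick way to compare Huffman's output length against other prefix codes without generating any bits at all.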

# Example Code
In code, this can be a little tricky. It requires a method to continually sort the nodes as you add more and more nodes to the system.
The most straightforward way to do this in some languages is with a priority queue, but depending on the language, this might be more or less appropriate.
In addition, to read the tree backwards, some sort of [Depth First Search](../../tree_traversal/tree_traversal.md) needs to be implemented.
Whether you use a stack or straight-up recursion also depends on the language, but the recursive method is a little easier to understand in most cases.

{% method %}
@@ -83,6 +83,7 @@ Whether you use a stack or straight-up recursion also depends on the language, b
{% sample lang="rs" %}
### Rust
[import, lang:"rust"](code/rust/huffman.rs)
{% sample lang="hs" %}
### Haskell
[import, lang:"haskell"](code/haskell/huffman.hs)
{% endmethod %}