Fortunately for those of you following along at home, Wikipedia has a great article on Huffman Coding. The theory is simple, just keep taking the bottom two smallest frequency numbers and smoosh them into a new node with their frequency summed together until you’re left with only two elements. Then split each node of the tree into 0 and 1, and just keep appending on down the tree.
; P50 (***) Huffman code.
(comment "First of all, consult a good book on discrete mathematics or algorithms for a detailed description of Huffman codes!")
(comment "We suppose a set of symbols with their frequencies, given as a list of fr(S,F) terms. Example: [fr(a,45),fr(b,13),fr(c,12),fr(d,16),fr(e,9),fr(f,5)]. Our objective is to construct a list hc(S,C) terms, where C is the Huffman code word for the symbol S. In our example, the result could be Hs = [hc(a,'0'), hc(b,'101'), hc(c,'100'), hc(d,'111'), hc(e,'1101'), hc(f,'1100')] [hc(a,'01'),...etc.]. The task shall be performed by the predicate huffman/2 defined as follows:")
(defn huffman-tree [fs]
(let [fs-sorted (sort-by #(nth % 1) fs)]
(loop [[a b & tail] fs-sorted]
(if (not (empty? tail))
(recur (sort-by #(nth % 1) (conj tail (list (concat (list (nth a 0)) (list (nth b 0))) (+ (nth a 1) (nth b 1))))))
(concat (list (nth a 0)) (list (nth b 0)))))))
(defn digit-map [a b c]
(if (seq? a)
(map digit-map a (list "0" "1") (list (str c b) (str c b)))
(list a (str c b))))
(defn dft [ls]
(mapcat (fn [e]
(if (list? e)
e
(dft e)
)) ls))
(defn process [ls]
(loop [[a b & tail] ls
out '()]
(if (empty? tail)
(sort-by (fn [[a b]] a) (conj out (list a b)))
(recur tail (conj out (list a b))))))
(defn huffman [fs]
(let [tree (huffman-tree fs)]
(process (dft (map
digit-map
tree
(list "0" "1")
(list "" ""))))))
There’s probably a lot of room to optimize and shortcut a lot of what I did. This was a three star problem, though, so I’m sure that most of the optimization is going to be in the flattening code.
