Fortunately for those of you following along at home, Wikipedia has a great article on Huffman Coding. The theory is simple, just keep taking the bottom two smallest frequency numbers and smoosh them into a new node with their frequency summed together until you’re left with only two elements. Then split each node of the tree into 0 and 1, and just keep appending on down the tree.
; P50 (***) Huffman code. (comment "First of all, consult a good book on discrete mathematics or algorithms for a detailed description of Huffman codes!") (comment "We suppose a set of symbols with their frequencies, given as a list of fr(S,F) terms. Example: [fr(a,45),fr(b,13),fr(c,12),fr(d,16),fr(e,9),fr(f,5)]. Our objective is to construct a list hc(S,C) terms, where C is the Huffman code word for the symbol S. In our example, the result could be Hs = [hc(a,'0'), hc(b,'101'), hc(c,'100'), hc(d,'111'), hc(e,'1101'), hc(f,'1100')] [hc(a,'01'),...etc.]. The task shall be performed by the predicate huffman/2 defined as follows:") (defn huffman-tree [fs] (let [fs-sorted (sort-by #(nth % 1) fs)] (loop [[a b & tail] fs-sorted] (if (not (empty? tail)) (recur (sort-by #(nth % 1) (conj tail (list (concat (list (nth a 0)) (list (nth b 0))) (+ (nth a 1) (nth b 1)))))) (concat (list (nth a 0)) (list (nth b 0))))))) (defn digit-map [a b c] (if (seq? a) (map digit-map a (list "0" "1") (list (str c b) (str c b))) (list a (str c b)))) (defn dft [ls] (mapcat (fn [e] (if (list? e) e (dft e) )) ls)) (defn process [ls] (loop [[a b & tail] ls out '()] (if (empty? tail) (sort-by (fn [[a b]] a) (conj out (list a b))) (recur tail (conj out (list a b)))))) (defn huffman [fs] (let [tree (huffman-tree fs)] (process (dft (map digit-map tree (list "0" "1") (list "" ""))))))
There’s probably a lot of room to optimize and shortcut a lot of what I did. This was a three star problem, though, so I’m sure that most of the optimization is going to be in the flattening code.