99 Clojure Problems (50)

Fortunately for those of you following along at home, Wikipedia has a great article on Huffman Coding. The theory is simple, just keep taking the bottom two smallest frequency numbers and smoosh them into a new node with their frequency summed together until you’re left with only two elements. Then split each node of the tree into 0 and 1, and just keep appending on down the tree.

; P50 (***) Huffman code.
(comment "First of all, consult a good book on discrete mathematics or algorithms for a detailed description of Huffman codes!")

(comment "We suppose a set of symbols with their frequencies, given as a list of fr(S,F) terms. Example: [fr(a,45),fr(b,13),fr(c,12),fr(d,16),fr(e,9),fr(f,5)]. Our objective is to construct a list hc(S,C) terms, where C is the Huffman code word for the symbol S. In our example, the result could be Hs = [hc(a,'0'), hc(b,'101'), hc(c,'100'), hc(d,'111'), hc(e,'1101'), hc(f,'1100')] [hc(a,'01'),...etc.]. The task shall be performed by the predicate huffman/2 defined as follows:")  

(defn huffman-tree [fs] 
  (let [fs-sorted (sort-by #(nth % 1) fs)]
    (loop [[a b & tail] fs-sorted] 
        (if (not (empty? tail)) 
          (recur (sort-by #(nth % 1) (conj tail (list (concat (list (nth a 0)) (list (nth b 0))) (+ (nth a 1) (nth b 1))))))
          (concat (list (nth a 0)) (list (nth b 0)))))))
(defn digit-map [a b c]
  (if (seq? a)
    (map digit-map a (list "0" "1") (list (str c b) (str c b)))
    (list a (str c b))))

(defn dft [ls]
  (mapcat (fn [e]
            (if (list? e)
              e
              (dft e)
              )) ls))
(defn process [ls] 
  (loop [[a b & tail] ls
         out '()] 
    (if (empty? tail) 
      (sort-by (fn [[a b]] a) (conj out (list a b))) 
      (recur tail (conj out (list a b))))))

(defn huffman [fs] 
  (let [tree (huffman-tree fs)]
    (process (dft (map 
      digit-map 
      tree 
      (list "0" "1")
      (list "" ""))))))

There’s probably a lot of room to optimize and shortcut a lot of what I did. This was a three star problem, though, so I’m sure that most of the optimization is going to be in the flattening code.