• Alibek Jakupov

Information Theory: Step 4 Huffman decoder

Updated: May 17




In previous article we implemented Huffman algorithm.


Here is a brief summary of the previous article

Given (Input)

  • A set of symbols and their weights (usually proportional to probabilities).

  • Alphabet A={a1,a2,…,an}, which is the symbol alphabet of size n.

  • Set W={w1,w2,…,wn}, which is the set of the (positive) symbol weights (proportional or equal to probabilities), i.e. wi = weight (ai), 1 <= i <= n.

Find (Output)

  • A binary code (a set of codewords) with minimum expected codeword length.

  • Code C(A,W)={c1,c2,…,cn}, which is the set of (binary) codewords, where ci is the codeword for ai, 1 <= i <= n.



A prefix code is a uniquely decodable code: a receiver can identify each word without requiring a special marker between words.

Thus it makes sense to add a decoder for our algorithm. The logic is quite simple : we construct a dictionary and then use this dictionary to decode our message.


from heapq import heappush, heappop, heapify
from collections import defaultdict

class Dictionary:
 def __init__(self, symbol, huffman_code) :
 self.symbol = symbol
 self.huffman_code = huffman_code
 def getSymbol(self):
 return self.symbol
 def getHuffman(self):
 return self.huffman_code

dictionary = []

def encode(symb2freq):
 """Huffman encode the given dict mapping symbols to weights"""
    heap = [[wt, [sym, ""]] for sym, wt in symb2freq.items()]
    heapify(heap)
 while len(heap) > 1:
        lo = heappop(heap)
        hi = heappop(heap)
 for pair in lo[1:]:
            pair[1] = '0' + pair[1]
 for pair in hi[1:]:
            pair[1] = '1' + pair[1]
        heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
 return sorted(heappop(heap)[1:], key=lambda p: (len(p[-1]), p))

def decode(encoded_sequence):
    temp = []
    seq = ""
    output = ""
 for i in dictionary:
        temp.append(i.getHuffman())
 for digit in encoded_sequence:
        seq += digit
 if temp.count(seq) == 0:
 continue
 else:
            output += dictionary[temp.index(seq)].getSymbol();
            seq=""
    print output

txt = raw_input("Please, enter your sequence")
symb2freq = defaultdict(int)
for ch in txt:
    symb2freq[ch] += 1
in Python 3.1+:
# symb2freq = collections.Counter(txt)
huff = encode(symb2freq)
encoded_sequence = "";
print "Symbol\tWeight\tHuffman Code"
for p in huff:
    item = Dictionary(p[0], p[1])
    dictionary.append(item)
    print "%s\t%s\t%s" % (p[0], symb2freq[p[0]], p[1])
encoded_sequence = ""
for letter in txt:
    temp = []
 for i in dictionary:
        temp.append(i.getSymbol())
    ind = temp.index(letter)
    encoded_sequence += dictionary[ind].getHuffman()
print "the encoded sequence is"
print encoded_sequence
print "the decode sequence is"
decode (encoded_sequence)



And as usual, git hub link

6 views
 
  • Twitter
  • LinkedIn

Since 2018 by ©alirookie