Search
• Alibek Jakupov

# Information Theory: Step 4 Huffman decoder

Updated: Nov 19, 2021 In previous article we implemented Huffman algorithm.

Here is a brief summary of the previous article

Given (Input)

• A set of symbols and their weights (usually proportional to probabilities).

• Alphabet A={a1,a2,…,an}, which is the symbol alphabet of size n.

• Set W={w1,w2,…,wn}, which is the set of the (positive) symbol weights (proportional or equal to probabilities), i.e. wi = weight (ai), 1 <= i <= n.

Find (Output)

• A binary code (a set of codewords) with minimum expected codeword length.

• Code C(A,W)={c1,c2,…,cn}, which is the set of (binary) codewords, where ci is the codeword for ai, 1 <= i <= n. A prefix code is a uniquely decodable code: a receiver can identify each word without requiring a special marker between words.

Thus it makes sense to add a decoder for our algorithm. The logic is quite simple : we construct a dictionary and then use this dictionary to decode our message.

```from heapq import heappush, heappop, heapify
from collections import defaultdict

class Dictionary:
def __init__(self, symbol, huffman_code) :
self.symbol = symbol
self.huffman_code = huffman_code
def getSymbol(self):
return self.symbol
def getHuffman(self):
return self.huffman_code

dictionary = []

def encode(symb2freq):
"""Huffman encode the given dict mapping symbols to weights"""
heap = [[wt, [sym, ""]] for sym, wt in symb2freq.items()]
heapify(heap)
while len(heap) > 1:
lo = heappop(heap)
hi = heappop(heap)
for pair in lo[1:]:
pair = '0' + pair
for pair in hi[1:]:
pair = '1' + pair
heappush(heap, [lo + hi] + lo[1:] + hi[1:])
return sorted(heappop(heap)[1:], key=lambda p: (len(p[-1]), p))

def decode(encoded_sequence):
temp = []
seq = ""
output = ""
for i in dictionary:
temp.append(i.getHuffman())
for digit in encoded_sequence:
seq += digit
if temp.count(seq) == 0:
continue
else:
output += dictionary[temp.index(seq)].getSymbol();
seq=""
print output

txt = raw_input("Please, enter your sequence")
symb2freq = defaultdict(int)
for ch in txt:
symb2freq[ch] += 1
# in Python 3.1+:
# symb2freq = collections.Counter(txt)
huff = encode(symb2freq)
encoded_sequence = "";
print "Symbol\tWeight\tHuffman Code"
for p in huff:
item = Dictionary(p, p)
dictionary.append(item)
print "%s\t%s\t%s" % (p, symb2freq[p], p)
encoded_sequence = ""
for letter in txt:
temp = []
for i in dictionary:
temp.append(i.getSymbol())
ind = temp.index(letter)
encoded_sequence += dictionary[ind].getHuffman()
print "the encoded sequence is"
print encoded_sequence
print "the decode sequence is"
decode (encoded_sequence)

```

And as usual, git hub link