In previous article we implemented Huffman algorithm.
Here is a brief summary of the previous article
Given (Input)
A set of symbols and their weights (usually proportional to probabilities).
Alphabet A={a1,a2,…,an}, which is the symbol alphabet of size n.
Set W={w1,w2,…,wn}, which is the set of the (positive) symbol weights (proportional or equal to probabilities), i.e. wi = weight (ai), 1 <= i <= n.
Find (Output)
A binary code (a set of codewords) with minimum expected codeword length.
Code C(A,W)={c1,c2,…,cn}, which is the set of (binary) codewords, where ci is the codeword for ai, 1 <= i <= n.
A prefix code is a uniquely decodable code: a receiver can identify each word without requiring a special marker between words.
Thus it makes sense to add a decoder for our algorithm. The logic is quite simple : we construct a dictionary and then use this dictionary to decode our message.
from heapq import heappush, heappop, heapify
from collections import defaultdict
class Dictionary:
def __init__(self, symbol, huffman_code) :
self.symbol = symbol
self.huffman_code = huffman_code
def getSymbol(self):
return self.symbol
def getHuffman(self):
return self.huffman_code
dictionary = []
def encode(symb2freq):
"""Huffman encode the given dict mapping symbols to weights"""
heap = [[wt, [sym, ""]] for sym, wt in symb2freq.items()]
heapify(heap)
while len(heap) > 1:
lo = heappop(heap)
hi = heappop(heap)
for pair in lo[1:]:
pair[1] = '0' + pair[1]
for pair in hi[1:]:
pair[1] = '1' + pair[1]
heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
return sorted(heappop(heap)[1:], key=lambda p: (len(p[-1]), p))
def decode(encoded_sequence):
temp = []
seq = ""
output = ""
for i in dictionary:
temp.append(i.getHuffman())
for digit in encoded_sequence:
seq += digit
if temp.count(seq) == 0:
continue
else:
output += dictionary[temp.index(seq)].getSymbol();
seq=""
print output
txt = raw_input("Please, enter your sequence")
symb2freq = defaultdict(int)
for ch in txt:
symb2freq[ch] += 1
# in Python 3.1+:
# symb2freq = collections.Counter(txt)
huff = encode(symb2freq)
encoded_sequence = "";
print "Symbol\tWeight\tHuffman Code"
for p in huff:
item = Dictionary(p[0], p[1])
dictionary.append(item)
print "%s\t%s\t%s" % (p[0], symb2freq[p[0]], p[1])
encoded_sequence = ""
for letter in txt:
temp = []
for i in dictionary:
temp.append(i.getSymbol())
ind = temp.index(letter)
encoded_sequence += dictionary[ind].getHuffman()
print "the encoded sequence is"
print encoded_sequence
print "the decode sequence is"
decode (encoded_sequence)
And as usual, git hub link
Comments