SMOTE on Azure ML Studio

How often do we have a fully normalized dataset with an almost equal number of instances per class? In real-world cases with data collected...

SMOTE Memory Error: Tips and tricks

If you are reading this article, you have probably already faced this issue while running your oversampling experiment. Here I share...

Information Theory: Step 7 Add noise

In this article we are going to create a simple noise generator to set up our future experiment. As we have seen before, the main goal of...

Install Polyglot on Windows

There is a strong debate about the usefulness or uselessness of Windows for data science. While the majority of Python packages work...

Information Theory: Step 6 Hamming code

This article is the logical continuation of the previous article devoted to error correction. The goal is to create an error correction algorithm...

Information Theory: Step 5 Error control

Here's the fifth step of our introduction to Information Theory. In this article we are going to cover the techniques that enable...

Information Theory: Step 4 Huffman decoder

In the previous article we implemented the Huffman algorithm. Here is a brief summary of that article. Given (input): a set of symbols and...

Information Theory: Step 2 Entropy

This is the second article in a series of short articles devoted to Information Theory. Conditional Entropy: Conditional entropy...

Information Theory: Step 1 Probability

Introduction: This article is the first part of a series of articles devoted to the study of Information Theory, step by step. In this...

Human-like suggestions using NLP

In our previous article we described how to detect intruders (textual anomalies) in topics sent by LDA represented as a multinomial...

LDA automatic labeling: experimentation

This article is the logical conclusion of our previous article on automatic topic labeling. As the discussed task is too subjective and...

LDA automatic labeling

As topic modeling has increasingly attracted interest from researchers, there exist plenty of algorithms that produce a distribution over...
