Alibek Jakupov

Machine Learning and Music : Grace and Beauty (part VI)

Updated: Nov 19, 2021

Music, when soft voices die, Vibrates in the memory— Odours, when sweet violets sicken, Live within the sense they quicken.

Percy Shelley, ‘To -’

In the previous article we discussed the timeline of Artificial Intelligence from 1982 to 1989. In this article we cover the next period in the fascinating history of AI and music.

1992 : TD-Gammon, Jodeci and Boyz II Men

Backgammon is one of the oldest known board games. Its history can be traced back nearly 5,000 years to archeological discoveries in Mesopotamia. It is a two player game where each player has fifteen pieces (checkers) which move between twenty-four triangles (points) according to the roll of two dice. The objective of the game is to be first to bear off, i.e. move all fifteen checkers off the board. Backgammon is a member of the tables family, one of the oldest classes of board games.

In 1992 Gerald Tesauro at IBM's Thomas J. Watson Research Center developed TD-Gammon, a computer backgammon program. It used an artificial neural network trained with temporal-difference learning, specifically TD-lambda. TD-Gammon was able to rival top human backgammon players, although it could not consistently surpass them.
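To give a feel for what temporal-difference learning with TD-lambda does, here is a minimal sketch. It is not Tesauro's setup: TD-Gammon trained a neural network on backgammon positions, while this toy uses a lookup table on a five-state random walk. The function name, the environment and all parameter values are assumptions made for illustration.

```python
import random


def td_lambda_random_walk(episodes=5000, n_states=5, alpha=0.01,
                          lam=0.8, gamma=1.0, seed=0):
    """Estimate state values on a toy random walk with tabular TD(lambda).

    States 0..n_states-1 sit in a row. Each step moves left or right with
    equal probability. Leaving on the right pays reward 1, leaving on the
    left pays 0, so the true value of state i is (i + 1) / (n_states + 1).
    """
    rng = random.Random(seed)
    values = [0.5] * n_states              # initial guess for every state
    for _ in range(episodes):
        traces = [0.0] * n_states          # eligibility traces, reset per episode
        state = n_states // 2              # always start in the middle
        done = False
        while not done:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:
                reward, next_value, done = 0.0, 0.0, True
            elif next_state >= n_states:
                reward, next_value, done = 1.0, 0.0, True
            else:
                reward, next_value = 0.0, values[next_state]
            # TD error: how much the one-step prediction was off.
            td_error = reward + gamma * next_value - values[state]
            # Decay old traces, then mark the state we just left.
            traces = [gamma * lam * e for e in traces]
            traces[state] += 1.0
            # Every recently visited state shares in the correction.
            values = [v + alpha * td_error * e for v, e in zip(values, traces)]
            state = next_state
    return values


if __name__ == "__main__":
    print([round(v, 2) for v in td_lambda_random_walk()])
```

The estimates should climb from left to right, roughly following 1/6, 2/6, ... 5/6. The lambda parameter controls how far back in the episode each prediction error is propagated.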

End of the Road, a single recorded by American R&B group Boyz II Men for the Boomerang soundtrack, was released in 1992. It was written and produced by Kenneth "Babyface" Edmonds, L.A. Reid and Daryl Simmons, and became the most listened-to pop song of the year.

The most popular RnB album of the year was Forever My Lady, the debut studio album by American R&B quartet Jodeci.

1995 : Random Forest, SVM, Gangsta's Paradise and Mary J. Blige

Everyone working with machine learning algorithms is familiar with the Random Forest algorithm.

Random forests or random decision forests are an ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees. Random decision forests correct for decision trees' habit of overfitting to their training set. The first algorithm for random decision forests was created by Tin Kam Ho using the random subspace method, which, in Ho's formulation, is a way to implement the "stochastic discrimination" approach to classification proposed by Eugene Kleinberg. An extension of the algorithm was developed by Leo Breiman and Adele Cutler, who registered "Random Forests" as a trademark (as of 2019, owned by Minitab, Inc.). The extension combines Breiman's "bagging" idea and random selection of features, introduced first by Ho and later independently by Amit and Geman in order to construct a collection of decision trees with controlled variance.

But did you know that the algorithm was first described in a research paper by Tin Kam Ho published in 1995? Extensions of this algorithm are used today as non-parametric methods for building powerful and efficient machine learning solutions.
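The two ingredients mentioned above, bagging and random selection of features, plus the majority ("mode") vote, can be sketched in a few lines. This is a simplified illustration and not Ho's or Breiman's actual algorithm: each "tree" here is a one-split decision stump, and the function names, the toy data and the parameter values are all assumptions.

```python
import random
from collections import Counter


def fit_stump(rows, labels, features):
    """Find the single (feature, threshold) split with the fewest errors."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    if best is None:
        # No usable split (for example a one-class sample): predict the majority.
        majority = Counter(labels).most_common(1)[0][0]
        return (features[0], float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Train stumps on bootstrap samples, each seeing a random feature subset."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]      # bagging: sample with replacement
        features = rng.sample(range(d), n_features)     # random subspace
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], features))
    return forest


def predict_forest(forest, row):
    """Return the mode of the individual predictions."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


# Toy data, made up for this example: two well-separated groups.
ROWS = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (6.0, 7.0), (6.5, 6.8), (7.0, 7.5)]
LABELS = ["a", "a", "a", "b", "b", "b"]

if __name__ == "__main__":
    forest = fit_forest(ROWS, LABELS)
    print(predict_forest(forest, (0.5, 1.0)), predict_forest(forest, (8.0, 8.0)))
```

A single stump trained on an unlucky bootstrap sample can be wrong, but the vote across many of them is much more stable, which is the variance reduction the description refers to.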

Another great breakthrough of 1995 was the Support Vector Machine, described in the work published by Corinna Cortes and Vladimir Vapnik.

In machine learning, support-vector machines (SVMs, also support-vector networks) are supervised learning models with associated learning algorithms that analyze data used for classification and regression analysis. Given a set of training examples, each marked as belonging to one or the other of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier (although methods such as Platt scaling exist to use SVM in a probabilistic classification setting). An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall. In addition to performing linear classification, SVMs can efficiently perform a non-linear classification using what is called the kernel trick, implicitly mapping their inputs into high-dimensional feature spaces.
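The "gap that is as wide as possible" can be illustrated with a small linear classifier trained on the hinge loss. Note the swap: this sketch uses plain sub-gradient descent rather than the quadratic-programming formulation from the Cortes and Vapnik paper, and it does not use the kernel trick. The function names, the toy points and the hyperparameters are assumptions.

```python
def train_linear_svm(points, labels, epochs=1000, lr=0.01, reg=0.01):
    """Fit w, b by sub-gradient descent on the L2-regularised hinge loss.

    Labels must be +1 or -1. A point with y * (w.x + b) < 1 is inside the
    margin (or misclassified) and pulls the boundary toward its own side.
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Hinge loss is active: move toward classifying x with margin.
                w = [wi + lr * (y * xi - reg * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Only the regulariser acts, shrinking w (widening the gap).
                w = [wi - lr * reg * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Classify by the side of the separating hyperplane x falls on."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Toy data, made up for this example: two linearly separable groups.
POINTS = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
LABELS = [-1, -1, -1, 1, 1, 1]

if __name__ == "__main__":
    w, b = train_linear_svm(POINTS, LABELS)
    print(predict(w, b, (0.0, 0.0)), predict(w, b, (7.0, 7.0)))
```

New points are assigned to a category purely by which side of the learned boundary they land on, as in the description above.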

Undoubtedly these algorithms had a great impact on industrial machine learning as we know it today.

My Life, the second studio album by American RnB recording artist Mary J. Blige, was released in 1994 by Uptown Records and became the most popular RnB album of 1995.

The most listened-to pop song was Gangsta's Paradise, a single by American rapper Coolio featuring singer L.V.

The song was listed at number 85 on Billboard's Greatest Songs of All-Time and was the number one biggest-selling single of 1995 on U.S. Billboard. In 2008, it was ranked number 38 on VH1's 100 Greatest Songs of Hip Hop. Coolio was awarded a Grammy for Best Rap Solo Performance, two MTV Video Music Awards for Best Rap Video and for Best Video from a Film, and a Billboard Music Award for the song/album. The song was voted the best single of the year in The Village Voice Pazz & Jop critics poll. It has sold over 5 million copies in the United States, United Kingdom and Germany alone, and at least 6 million worldwide, making it one of the best-selling singles of all time. Coolio performed the song live at the 1995 Billboard Music Awards with L.V. and Stevie Wonder, at the 38th Annual Grammy Awards with L.V., and also with Dutch singer Trijntje Oosterhuis.

1997 : Kasparov, LSTM, Spice Girls and Elton John

In 1997 there was a great achievement for Machine Learning: IBM's Deep Blue beat the world chess champion, Garry Kasparov, whom many consider to be the greatest chess player of all time.

Deep Blue was a chess-playing computer developed by IBM. It is known for being the first computer chess-playing system to win both a chess game and a chess match against a reigning world champion under regular time controls. Deep Blue won its first game against a world champion on 10 February 1996, when it defeated Garry Kasparov in game one of a six-game match. However, Kasparov won three and drew two of the following five games, defeating Deep Blue by a score of 4–2. Deep Blue was then heavily upgraded, and played Kasparov again in May 1997. Deep Blue won game six, therefore winning the six-game rematch 3½–2½ and becoming the first computer system to defeat a reigning world champion in a match under standard chess tournament time controls. Kasparov accused IBM of cheating and demanded a rematch. IBM refused and dismantled Deep Blue. Development for Deep Blue began in 1985 with the ChipTest project at Carnegie Mellon University. This project eventually evolved into Deep Thought, at which point the development team was hired by IBM. The project evolved once more with the new name Deep Blue in 1989. Grandmaster Joel Benjamin was also part of the development team.

Another great achievement was the invention of long short-term memory recurrent neural networks (LSTM) by Sepp Hochreiter and Jürgen Schmidhuber. This algorithm greatly improved the efficiency and practical value of recurrent neural networks.

Long short-term memory (LSTM) is an artificial recurrent neural network (RNN) architecture used in the field of deep learning. Unlike standard feedforward neural networks, LSTM has feedback connections that make it a "general purpose computer" (that is, it can compute anything that a Turing machine can). It can not only process single data points (such as images), but also entire sequences of data (such as speech or video). For example, LSTM is applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition. Bloomberg Business Week wrote: "These powers make LSTM arguably the most commercial AI achievement, used for everything from predicting diseases to composing music." A common LSTM unit is composed of a cell, an input gate, an output gate and a forget gate. The cell remembers values over arbitrary time intervals and the three gates regulate the flow of information into and out of the cell.
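The cell and its three gates can be written out as a single time step. This is a minimal sketch with scalar input and state, which keeps the arithmetic visible. Real LSTM layers use vectors and weight matrices, and the parameter names and values below are hypothetical, chosen only for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM unit with input, forget and output gates.

    params holds, for each of "i", "f", "o", "g", an input weight w_*,
    a recurrent weight u_* and a bias b_* (names are illustrative).
    """
    def pre(name):
        return (params["w_" + name] * x
                + params["u_" + name] * h_prev
                + params["b_" + name])

    i = sigmoid(pre("i"))       # input gate: how much new information to write
    f = sigmoid(pre("f"))       # forget gate: how much of the old cell to keep
    o = sigmoid(pre("o"))       # output gate: how much of the cell to expose
    g = math.tanh(pre("g"))     # candidate value to write into the cell
    c = f * c_prev + i * g      # new cell state
    h = o * math.tanh(c)        # new hidden state (the unit's output)
    return h, c


def make_params(b_i=0.0, b_f=0.0, b_o=0.0, b_g=0.0, weight=0.0):
    """Build a parameter dict with one shared weight value and per-gate biases."""
    params = {}
    for name, bias in (("i", b_i), ("f", b_f), ("o", b_o), ("g", b_g)):
        params["w_" + name] = weight
        params["u_" + name] = weight
        params["b_" + name] = bias
    return params


if __name__ == "__main__":
    # Forget gate wide open, input gate almost shut: the cell keeps its value.
    remember = make_params(b_i=-10.0, b_f=10.0)
    h, c = 0.0, 1.0
    for x in (0.5, -0.3, 0.9):
        h, c = lstm_step(x, h, c, remember)
    print(round(c, 3))
```

With the forget gate close to 1 and the input gate close to 0, the cell state passes through the sequence nearly unchanged, which is the "remembers values over arbitrary time intervals" behaviour described above.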

The most listened-to pop song of 1997 was "Something About the Way You Look Tonight", a song by Elton John, released in 1997 as the first single from his 26th studio album The Big Picture.

Billboard said the song is "a grandly executed ballad that washes John's larger-than-life performance in cinematic strings and whooping, choir-styled backing vocals. An instant fave for die-hards, this single will bring kids at top 40 to the table after a few spins."

The most popular album of the year was Spice by Spice Girls.

1998 : MNIST, Too Close and Titanic

If you have ever worked with computer vision, you are definitely familiar with the MNIST database. A team led by Yann LeCun released it in 1998. MNIST is a dataset comprising a mix of handwritten digits from American Census Bureau employees and American high school students. It is commonly used as a benchmark for evaluating handwriting recognition.

The most popular song of 1998 was "Too Close", a single by American R&B group Next, featuring uncredited vocals from Vee of Koffee Brown. This single became the most listened-to in both the pop and RnB categories.

Not surprisingly, the most popular album of 1998 was the Titanic soundtrack as the movie was a true breakthrough.

We have finished one of the most fascinating chapters in the evolution of AI: the 90s.

In the next article we will start the next one. Hope you enjoyed this.


  1. Tesauro, Gerald (March 1995). "Temporal Difference Learning and TD-Gammon". Communications of the ACM. 38 (3). doi:10.1145/203330.203343.

  2. Ho, Tin Kam (August 1995). "Random Decision Forests" (PDF). Proceedings of the Third International Conference on Document Analysis and Recognition. Montreal, Quebec: IEEE. 1: 278–282. doi:10.1109/ICDAR.1995.598994. ISBN 0-8186-7128-9. Retrieved 5 June 2016.

  3. Golge, Eren. "BRIEF HISTORY OF MACHINE LEARNING". A Blog From a Human-engineer-being. Retrieved 5 June 2016.

  4. Cortes, Corinna; Vapnik, Vladimir (September 1995). "Support-vector networks". Machine Learning. Kluwer Academic Publishers. 20 (3): 273–297. doi:10.1007/BF00994018. ISSN 0885-6125.

  5. Hochreiter, Sepp; Schmidhuber, Jürgen (1997). "LONG SHORT-TERM MEMORY" (PDF). Neural Computation. 9 (8): 1735–1780. doi:10.1162/neco.1997.9.8.1735. PMID 9377276. Archived from the original (PDF) on 2015-05-26.

  6. LeCun, Yann; Cortes, Corinna; Burges, Christopher. "THE MNIST DATABASE of handwritten digits". Retrieved 16 June 2016.
