Saturday, September 7, 2024

From Tennis Courts to Neural Networks: An Introduction to LSTM

Deep Learning concepts seem hard to grasp at first, but I am going to attempt to make that easy. This article will focus on LSTM, which stands for Long Short-Term Memory. Wait, it's both long and short-term? What is it? Even the name is confusing until you understand it.

Definition and Explanation - Academic 

"A recurrent neural network (RNN) designed to address the vanishing gradient problem in traditional RNNs."

When a traditional RNN processes long sequences, the gradients used to train it shrink as they are propagated back through many time steps, so the network effectively loses track of what was processed earlier. LSTM addresses this with a dedicated memory unit called the Cell State, which carries useful information forward across time steps.

Processing is controlled by Gates that regulate what flows into and out of this memory at every step, which is what keeps the gradient problem in check.

Gates: Three gates (input, forget, and output) that regulate the flow of information:

  • Input gate: Controls what new information is added to the cell state
  • Forget gate: Decides what information to discard from the cell state
  • Output gate: Determines what information from the cell state is exposed as the output

These gates use sigmoid and tanh activation functions to control information flow and update the cell state at each time step. 
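The gate mechanics described above can be sketched in plain NumPy. This is an illustrative single-time-step implementation of the standard LSTM equations, not code from any particular library; the toy dimensions and the weight names `W`, `U`, and `b` are just this sketch's conventions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b hold the stacked weights for the
    input-gate, forget-gate, cell-candidate, and output-gate transforms."""
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b        # all four pre-activations at once
    i = sigmoid(z[0*n:1*n])             # input gate: what new info to add
    f = sigmoid(z[1*n:2*n])             # forget gate: what to discard
    g = np.tanh(z[2*n:3*n])             # candidate values for the cell state
    o = sigmoid(z[3*n:4*n])             # output gate: what to expose
    c_t = f * c_prev + i * g            # updated cell state (the long-term memory)
    h_t = o * np.tanh(c_t)              # hidden state passed to the next step
    return h_t, c_t

# toy dimensions for demonstration
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = rng.standard_normal((4 * n_hid, n_in))
U = rng.standard_normal((4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step(rng.standard_normal(n_in), h, c, W, U, b)
```

Notice that the sigmoid gates output values between 0 and 1, so multiplying by them acts like a dial between "block completely" and "pass through unchanged".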

In essence, the problem is addressed on an ongoing basis: take in what is working and reject what isn't. This is very familiar, and we apply it in real life, from running a successful business to playing a tennis match. Concepts are way easier to grasp when illustrated with an analogy, so let's use one to walk through the key ideas of LSTM in a more relatable way.

Explanation - Using Tennis Game Plan!

Imagine you're a tennis player in a match, and your LSTM network is your strategic thinking process throughout the game.



The Setup

You are the LSTM Network, each point is a time step in the sequence. The game plan is the cell state. The decision-making process represents the gates in LSTM, and finally, the match is the entire input sequence.

Workflow

1. Initialization

   You begin with a general game plan, like an LSTM's initialized cell state.

2. Playing each point 

  • Forget Gate: After each point, you decide what part of your strategy is no longer relevant. Maybe you forget about targeting your opponent's backhand because it's improving.
    • LSTM: This gate decides what information to discard from the cell state.
  • Input Gate: You observe new information during the point - your opponent's serve placement, their movement patterns, etc.
    • LSTM: This gate decides what new information to add to the cell state.
  • Cell State Update: You update your overall game plan, combining your existing strategy with new observations.
    • LSTM: This is updating the cell state with new information while retaining relevant past information.
  • Output Gate: You decide which aspects of your updated game plan are most relevant for the next point.
    • LSTM: This gate determines what information to output from the cell state.

3. Next Point (Making predictions)

  • Based on your updated game plan and current state of play, you predict how to approach the next point.
    •  LSTM: This is equivalent to making a prediction based on the processed sequence.  

4. Match Continues 

  • You repeat this process for each point, constantly refining your strategy.
    • LSTM: This represents processing each time step in the sequence.

Just as you might remember a successful strategy from early in the match even after many points (long-term memory), LSTM can retain important information from much earlier in the sequence.

Your ability to forget irrelevant tactics (like a failed approach shot) while remembering important ones (like exploiting your opponent's weak serve) mirrors how LSTM selectively retains or forgets information.

The Final Game (Output)

By the end of the match, your game plan has evolved based on all the points played, with some strategies being more influential than others. Similarly, LSTM's final output is based on processing the entire sequence, with the ability to give more weight to certain parts of the sequence than others.

This tennis game plan captures the essence of how LSTM processes sequential data, makes decisions at each step, and uses its "memory" to make predictions based on both recent and long-past information. It demonstrates how LSTM can adapt to changing patterns in the data, just as a tennis player adapts their strategy throughout a match.

Now if you read the Academic portion again, it will make more sense 😀

How do you Apply this?

There are multiple frameworks available, with Keras being the most popular. I have created a demo that uses it to predict stock market prices.
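To give a flavor of the Keras approach, here is a minimal, illustrative sketch of the same idea: slide a fixed window over a series and train an LSTM to predict the next value. The sine series, window size, and layer sizes below are stand-ins, not the demo's actual data or architecture; real price data would need scaling and a train/test split.

```python
import numpy as np
from tensorflow import keras

# Stand-in series: predict the next value from the previous 10 values.
# Replace `series` with real (scaled) price data in practice.
series = np.sin(np.linspace(0, 20, 500))
window = 10
X = np.array([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]
X = X[..., np.newaxis]                     # shape: (samples, timesteps, features)

model = keras.Sequential([
    keras.layers.Input(shape=(window, 1)),
    keras.layers.LSTM(32),                 # 32 memory units process the window
    keras.layers.Dense(1),                 # single next-value prediction
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, batch_size=32, verbose=0)

pred = model.predict(X[:1], verbose=0)     # predict from the first window
```

The LSTM layer handles all the gate mechanics internally; you only choose the number of memory units and the input window shape.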

As a next step, you can dive into the statistical approach and explore the activation functions used for the gates.
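As a small head start on that exploration, the two activation functions play distinct roles, which you can see directly from their output ranges. This snippet is a quick illustration, not tied to any particular framework.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-6, 6, 7)
gate = sigmoid(x)        # squashes to (0, 1): a "how much to let through" dial
candidate = np.tanh(x)   # squashes to (-1, 1): signed candidate values

# sigmoid near 0 means "block", near 1 means "pass" -- ideal for gating,
# while tanh keeps candidate cell-state updates centred around zero.
```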

