MLG 023 Deep NLP 2

Aug 20, 2017

Network architectures used in natural language processing (NLP): recurrent neural networks (RNNs), bidirectional RNNs, and solutions to the vanishing and exploding gradient problems using Long Short-Term Memory (LSTM) cells. Also covered: the distinction between supervised and reinforcement learning for sequence tasks, the use of encoder-decoder models, and the significance of transforming words into numerical vectors for these processes.

Resources

See resources on the Deep Learning episode.

Show Notes

Neural Network Types in NLP

  • Vanilla Neural Networks (Feedforward Networks):

    • Used for general classification or regression tasks.
    • Examples include predicting housing costs or classifying images as cat, dog, or tree.
  • Convolutional Neural Networks (CNNs):

    • Primarily used for image-related tasks.
  • Recurrent Neural Networks (RNNs):

    • Used for sequence-based tasks such as weather prediction, stock market prediction, and natural language processing.
    • Differ from feedforward networks in that the hidden state from each time step feeds back into the next, letting the network carry context across a sequence (see the sketch after this list).
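
To make that looping concrete, here is a minimal sketch of a vanilla RNN forward pass in plain NumPy. The sizes and variable names are illustrative assumptions, not anything from the episode; a real system would learn these weights via backpropagation through time.

```python
import numpy as np

input_size, hidden_size = 4, 8  # illustrative sizes (assumptions)
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(hidden_size, input_size)) * 0.1   # input -> hidden
W_hh = rng.normal(size=(hidden_size, hidden_size)) * 0.1  # hidden -> hidden: the "loop"
b_h = np.zeros(hidden_size)

def rnn_forward(inputs):
    """Run a vanilla RNN over a sequence, carrying hidden state forward."""
    h = np.zeros(hidden_size)
    states = []
    for x in inputs:  # one step per sequence element (e.g., per word vector)
        # The new hidden state mixes the current input with the previous state.
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        states.append(h)
    return states

sequence = [rng.normal(size=input_size) for _ in range(5)]  # e.g., 5 word vectors
states = rnn_forward(sequence)
print(len(states), states[-1].shape)  # 5 hidden states, each of size 8
```

The `W_hh @ h` term is the recurrence: the same weights are reused at every time step, which is what lets one network handle sequences of any length.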

Key Concepts and Applications

  • Supervised vs Reinforcement Learning:

    • Supervised learning trains models on labeled data to learn patterns, so they can then produce those labels on their own for new inputs.
    • Reinforcement learning focuses on learning actions to maximize a reward function over time, suitable for tasks like gaming AI but less so for tasks like NLP.
  • Encoder-Decoder Models:

    • These models process the entire input sequence before producing output, which is crucial for tasks like machine translation, where the full context is needed before generation begins.
    • The encoder transforms the input sequence into a vector representation (encoding), and the decoder expands that vector into another sequence (decoding); see the sketch after this list.
  • Gradient Problems & Solutions:

    • Vanishing and Exploding Gradient Problems occur when backpropagating through many time steps: gradients shrink toward zero or grow without bound, losing information or destabilizing training, notably on longer sequences (a toy numeric illustration follows this list).
    • Long Short-Term Memory (LSTM) cells address this by allowing RNNs to retain important information over longer sequences, effectively mitigating both gradient issues.
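
As referenced in the encoder-decoder bullet above, here is a minimal NumPy sketch of the encode-then-decode flow: the encoder folds the whole input sequence into a single vector, and the decoder unrolls that vector into an output sequence. All names, sizes, and weights are illustrative assumptions; a real translation model would use trained LSTM layers and word embeddings.

```python
import numpy as np

rng = np.random.default_rng(1)
in_size, hid, out_size = 4, 8, 6  # illustrative sizes (assumptions)

# Encoder weights: fold the whole input sequence into one fixed-size vector.
We_x = rng.normal(size=(hid, in_size)) * 0.1
We_h = rng.normal(size=(hid, hid)) * 0.1

# Decoder weights: unroll that vector into an output sequence.
Wd_h = rng.normal(size=(hid, hid)) * 0.1
Wd_y = rng.normal(size=(out_size, hid)) * 0.1

def encode(inputs):
    """Read the ENTIRE input sequence before emitting anything."""
    h = np.zeros(hid)
    for x in inputs:
        h = np.tanh(We_x @ x + We_h @ h)
    return h  # fixed-size encoding of the whole sequence

def decode(h, steps):
    """Expand the encoding into an output sequence, one step at a time."""
    outputs = []
    for _ in range(steps):
        h = np.tanh(Wd_h @ h)
        outputs.append(Wd_y @ h)  # e.g., scores over a target vocabulary
    return outputs

source = [rng.normal(size=in_size) for _ in range(7)]  # e.g., a 7-word sentence
translation = decode(encode(source), steps=5)          # 5 output vectors
print(len(translation), translation[0].shape)          # 5, (6,)
```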
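The gradient problem itself can be seen with toy arithmetic: backpropagation through time multiplies roughly one factor per time step, so over long sequences factors below 1 shrink the gradient exponentially and factors above 1 blow it up. A sketch with assumed numbers, purely illustrative:

```python
# Backprop through time multiplies roughly one factor per time step.
# Factors below 1 shrink the gradient exponentially (vanishing);
# factors above 1 grow it exponentially (exploding).
for factor, label in [(0.9, "vanishing"), (1.1, "exploding")]:
    grad = 1.0
    for _ in range(100):  # a 100-step sequence
        grad *= factor    # simplified per-step backprop factor
    print(f"{label}: gradient after 100 steps ~ {grad:.3g}")
# vanishing: ~2.66e-05    exploding: ~1.38e+04
```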

LSTM Functionality

  • An LSTM cell replaces traditional neurons in an RNN with complex machinery that regulates information flow.
  • Components within an LSTM cell (sketched in code after this list):
    • Forget Gate: Decides which information to discard from the cell state.
    • Input Gate: Determines which information to update.
    • Output Gate: Controls the output from the cell.
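
Below is a minimal NumPy sketch of one LSTM step following the standard gate equations; the sizes and names are illustrative assumptions. Note how the cell state update is mostly additive, which is what lets information (and gradients) survive across many steps.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step; W maps [h_prev; x] to the four gate pre-activations."""
    z = W @ np.concatenate([h_prev, x]) + b
    f, i, o, g = np.split(z, 4)
    f = sigmoid(f)          # forget gate: what to discard from the cell state
    i = sigmoid(i)          # input gate: what new information to write
    o = sigmoid(o)          # output gate: what to expose as the hidden state
    g = np.tanh(g)          # candidate values to write into the cell
    c = f * c_prev + i * g  # cell state update: mostly additive
    h = o * np.tanh(c)      # hidden state passed to the next time step
    return h, c

hid, in_size = 8, 4  # illustrative sizes (assumptions)
rng = np.random.default_rng(2)
W = rng.normal(size=(4 * hid, hid + in_size)) * 0.1
b = np.zeros(4 * hid)

h, c = np.zeros(hid), np.zeros(hid)
for x in [rng.normal(size=in_size) for _ in range(5)]:  # a 5-step sequence
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)  # (8,), (8,)
```

Because the forget gate can sit near 1, the cell state acts like a conveyor belt carrying information across long sequences, which is how LSTMs mitigate the vanishing gradient problem.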