Learning RL is Challenging. Here’s a Guide for Smooth Sailing

This curated path lowers the barrier to breaking into practical Reinforcement Learning from scratch


This isn’t just another summary of RL mathematics, algorithm taxonomy, or package tutorial — there are already thousands of those.

Instead, I offer newcomers a strategic path into reinforcement learning. This approach aims to develop a 360-degree understanding: grasping RL’s core characteristics to identify suitable problems, familiarizing yourself with key elements to view challenges through an RL lens, and equipping yourself with a toolbox of modern Python libraries for direct problem-solving.

In the following short prelude sections, I’ll explain why learning RL is especially challenging (compared to other STEM topics) and how I believe we can make it easier. Feel free to skip to the main path if you’re eager to get started.

Prelude 1: Why diving straight into RL textbooks or packages can be misleading

  1. RL is a young field without a widely adopted way of teaching it. There isn’t even a standard textbook yet!
  2. RL algorithms are designed for real-world problems, requiring rich context to understand.
  3. RL algorithms aren’t blackbox-ready, necessitating deep understanding for application. In practical settings, you are all but guaranteed to dig into the details, math, and assumptions of the models.

As a result, there are two typical failure modes when learning RL:

  • Accidentally starting with a “math-rigorous” book: You encounter probability measures, fixed-point theorems, and tons of fancy symbols that seem boring and irrelevant, making RL feel intimidating.
  • Jumping straight to replicating a “paper with code” or applying packaged models: Algorithms don’t work as expected at all, leading you to conclude that RL is useless.

Let’s address these challenges.

Prelude 2: Two hacks to make learning RL easier

  1. Follow great figures who lead academia and industry: I covered this already in my last short post <Denoised RL Starter Pack: a Curated Shortlist of Reinforcement Learning Resources>. The path below will exclusively use those resources (freely available online, no paywall!).
  2. Read ten books together: Great teachers shine in different ways. Lectures, textbooks, blogs, and interactive notebooks each relate to you differently. Progress through them concurrently, comparing how the same concepts are introduced in different contexts. The more angles you use to attack a new topic, the lower the barrier to grasping it. Besides, the richer your views, the more motivated you are to keep grinding.

Armed with the principles from the two preludes above, you’re already well-equipped to design a learning strategy for yourself. Nevertheless, allow me to showcase mine, which incorporates these insights and my experience.


The Strategic Path towards RL

Level 1: Solid Conceptual Understanding

  1. OpenAI Spinning Up — Why We Built This section: understand (again) the motivations and challenges in learning RL, along with practical advice. Content length: 10 min.
  2. OpenAI Spinning Up — Key Concepts in RL section: grasp foundational concepts with a blend of motivation, theory, and light mathematics (feel free to skip formulas that feel difficult for now). Content length: 30 min.
  3. Sutton & Barto RL Book — Chapter 1 for a textbook-style (in a good way) introduction to the field, with a concrete example of Tic-Tac-Toe. Content length: 1 hour.
  4. Silver’s UCL Course — Lecture 1 video for a second pass over RL 101 with richer examples and a big-picture view, from another seminal figure in the field. Content length: 1.5 hours.
  5. Relax a bit and read Karpathy’s RL blog post; feel free to skip the math and focus on the insights he shares, like the “On using PG in practice” part. Content length: 15 min.
  6. HuggingFace Deep-RL-Course Unit 1: most of the material in Unit 1 should look easy to you by this point. Take 10 minutes to skim it one more time, then jump to the 20-minute “Hands On” interactive lab homework. It’s super easy but guides you to apply state-of-the-art algorithms to concrete, visually perceptible problems.
  7. Bonus for the Python-savvy audience: find the classic inventory control problem (Example 1 on page 9 of Szepesvári’s book) and write a custom Gym environment for it. Then play around with it using different actions or models (see the sketch after this list). This isn’t hard as long as you know a bit of OOP and can read docs. The payoff is great: you watch a decades-old abstract problem turn into code you can interact with and RL can learn from. Content length: 30 min.
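
To show what step 7 might look like, here is a minimal sketch of such an inventory-control environment, assuming the Gymnasium API (gym.Env with reset/step). The class name and all parameters (capacity, unit cost, holding cost, price, Poisson demand) are illustrative choices, not the book’s exact formulation.

```python
# Minimal sketch of an inventory-control environment (hypothetical parameters).
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class InventoryEnv(gym.Env):
    """Each step: observe stock, order items, random demand arrives overnight.
    Reward = sales revenue - ordering cost - holding cost."""

    def __init__(self, capacity=20, unit_cost=1.0, holding_cost=0.1,
                 price=2.0, demand_lambda=5, seed=None):
        super().__init__()
        self.capacity = capacity
        self.unit_cost = unit_cost
        self.holding_cost = holding_cost
        self.price = price
        self.demand_lambda = demand_lambda
        self.observation_space = spaces.Discrete(capacity + 1)  # current stock level
        self.action_space = spaces.Discrete(capacity + 1)       # how many items to order
        self.rng = np.random.default_rng(seed)
        self.stock = 0

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.stock = 0
        return self.stock, {}

    def step(self, action):
        # Order up to remaining capacity and pay for the order.
        order = min(action, self.capacity - self.stock)
        cost = self.unit_cost * order
        self.stock += order

        # Random demand arrives; sell what we have in stock.
        demand = self.rng.poisson(self.demand_lambda)
        sold = min(demand, self.stock)
        self.stock -= sold

        reward = self.price * sold - cost - self.holding_cost * self.stock
        # Continuing task: no natural terminal state here.
        return self.stock, reward, False, False, {}


# Quick sanity check with a random policy.
env = InventoryEnv(seed=0)
obs, _ = env.reset()
for _ in range(5):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    print(obs, round(reward, 2))
```

Once a random-rollout sanity check like this passes, you can hand the environment to any Gym-compatible algorithm and compare it against a simple baseline policy, such as always ordering up to capacity.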

All of the resources above are covered in my last post <Denoised RL Starter Pack: a Curated Shortlist of Reinforcement Learning Resources>, which has direct links and explanations of why they qualify as the best resources.

Steps 1–6 together take about 4 hours to read, but probably 6–10 hours to feel confident with, depending on your background. The assumption is that you have basic STEM skills.

Level 1 Checklist

  1. Identify suitable problems: sequential decision making without labeled answers
  2. RL problem key characteristics: trial-and-error and delayed reward
    - Compare with supervised and unsupervised learning
    - Exploration vs Exploitation
  3. Elements: environment, state, observation; agent, action, policy; reward, value/return; trajectory
    - Rewards accumulate into value, and we estimate value to decide which actions to take
    - Reward Hypothesis: all goals can be described as the maximization of expected cumulative reward
    - Efficient value estimation plays a central role in nearly all RL algorithms
    - Fully observable -> Markov Decision Process (MDP)
    - Partially observable -> Partially Observable Markov Decision Process (POMDP)
    - The policy is the end product; value is a natural intermediate quantity: which action is of higher value if I have to pick one? (See the small sketch after this checklist.)
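
To make the reward/return/value items concrete, here is a tiny sketch (plain Python, made-up reward numbers) of the discounted return of a single trajectory; a value function is, roughly, the expectation of this quantity over the trajectories a policy generates.

```python
# Discounted return of one trajectory (illustrative numbers, gamma chosen arbitrarily).
def discounted_return(rewards, gamma=0.9):
    """G = r_0 + gamma * r_1 + gamma^2 * r_2 + ..."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Delayed reward: nothing until the final step, yet earlier steps still carry value.
trajectory_rewards = [0.0, 0.0, 0.0, 1.0]
print(discounted_return(trajectory_rewards))  # 0.9**3 = 0.729
```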

Level 2: Orchestrate with formalism

To be continued.

Let me know if you enjoyed the Level 1 path, and I’ll be happy to keep updating it!
