Introduction to Deep Reinforcement Learning (WiSe)

We offer the course Introduction to Deep Reinforcement Learning from Winter Semester 2021/22 onwards. The course covers rational decision making at the intersection of Operations Research and Machine Learning, with a focus on Deep Reinforcement Learning.

To participate in this course, you have to enroll via TUMonline.

Mandatory Prerequisites

To successfully attend this course, students should be comfortable with math-centric content, algorithms, and proofs. Students should have a general understanding of:

  • basic linear algebra, such as matrix multiplication and matrix-vector multiplication
  • multivariate calculus, such as partial derivatives, the chain rule, and gradients
  • basic stochastics, such as discrete and continuous random variables and probability distributions, as well as the notions of expectation and variance
  • basics of mathematical optimization, such as constrained optimization problems and the notion of convergence

The programming exercises, which are part of this course and its exam, use the Python programming language and the NumPy library. Students should therefore ideally be familiar with Python. Alternatively, knowledge of another general-purpose programming language (e.g., C++, Java) or MATLAB is sufficient, as students will be able to adapt to Python quickly.
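As a rough indication of the level of Python and NumPy involved, the following sketch runs a few gradient descent steps on a small least-squares objective, using only matrix-vector multiplication and gradients from the prerequisites listed above. It is not taken from the course material; the matrix `A`, the vector `b`, the step size, and the iteration count are arbitrary illustrative values.

```python
import numpy as np

# Illustrative values only; not part of the course material.
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])
b = np.array([2.0, 3.0])


def loss(x):
    """Least-squares objective f(x) = 0.5 * ||A x - b||^2."""
    residual = A @ x - b
    return 0.5 * residual @ residual


def grad(x):
    """Gradient of f, obtained via the chain rule: A^T (A x - b)."""
    return A.T @ (A @ x - b)


x = np.zeros(2)
step_size = 0.2  # assumed value, small enough for this particular A
for _ in range(200):
    x = x - step_size * grad(x)

print("minimizer estimate:", x)
print("final loss:", loss(x))
```

Students who can read this snippet and explain why the gradient has the stated form should be well prepared for the programming exercises.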

Intended Learning Outcomes

After attending this course, students will have acquired:

  • basic knowledge in the domain of search algorithms, e.g., graph and tree search, and an understanding of the fundamental theory behind them
  • the competence/capability to analyze a practical problem by modelling it as a Markov Decision Process (MDP)
  • profound knowledge in the domain of reinforcement learning and understanding of fundamental reinforcement learning theory, e.g., Q-learning, TD learning
  • basic knowledge in deep learning and understanding of fundamental machine learning and deep learning theory, e.g., stochastic gradient descent, logistic regression, artificial neural networks
  • profound knowledge in the domain of deep reinforcement learning (DRL) that combines the previous two competence areas and understanding of fundamental DRL theory, e.g., deep Q-networks (DQN), advanced policy gradient methods such as proximal policy optimization (PPO)
  • the competence/capability to apply a DRL framework to a practical problem
  • the competence/capability to evaluate DRL methods with respect to their advantages and disadvantages
  • the competence/capability to evaluate practical applications of DRL with respect to typical pitfalls (e.g., convergence issues with non-independent samples) and to know how to circumvent them
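To give a flavour of the reinforcement learning theory named above, here is a minimal sketch of tabular Q-learning on a hypothetical five-state chain environment. The environment, the `step` helper, and all hyperparameters are assumptions made for illustration and are not course material. Because Q-learning is off-policy, the sketch simply explores with uniformly random actions.

```python
import numpy as np

N_STATES, N_ACTIONS = 5, 2  # actions: 0 = move left, 1 = move right
GOAL = N_STATES - 1         # rightmost state is terminal


def step(state, action):
    """Hypothetical deterministic chain: reward 1 on reaching the right end."""
    if action == 0:
        next_state = max(state - 1, 0)
    else:
        next_state = min(state + 1, GOAL)
    done = next_state == GOAL
    reward = 1.0 if done else 0.0
    return next_state, reward, done


rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma = 0.5, 0.9  # assumed learning rate and discount factor

for _ in range(500):  # episodes
    state, done = 0, False
    while not done:
        # Behaviour policy: uniformly random actions (pure exploration).
        action = int(rng.integers(N_ACTIONS))
        next_state, reward, done = step(state, action)
        # Q-learning target: bootstrap from the greedy value of the next
        # state, or use the reward alone if the episode has ended.
        target = reward + (0.0 if done else gamma * np.max(Q[next_state]))
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state

greedy_policy = np.argmax(Q[:GOAL], axis=1)  # non-terminal states only
print(greedy_policy)
```

Deep Q-networks replace the table `Q` with a neural network, which is where the sample-correlation pitfalls mentioned in the last item become relevant.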

Course Organization

The module content covers the theory of Deep Reinforcement Learning and required fundamentals. Specifically, topics include but are not limited to:

  • fundamentals (e.g., stochastic gradient descent, logistic regression, artificial neural networks)
  • Deep Q-Networks and Rainbow DQNs
  • policy gradients, trust region policy optimization, proximal policy optimization
  • actor-critic methods, soft actor-critic methods
  • applied case studies