Topics Covered after Midterm:

Reinforcement Learning - learning with a critic

Terminology:

temporal credit assignment
exploration vs exploitation (n-armed bandit)
greedy approach, non-greedy approach, balanced approach
agent, environment, actions
agent: policy (π), reward (r), value (V), model
ret: return, discounted return, discount factor (γ)
R^a_ss': expected reward on the transition from s to s' under action a
P^a_ss': state transition probabilities
Q: action-value function
Bellman's Equation and Bellman Optimality Equation for V
Solving Bellman's Equation:
- Solving analytically (e.g. with matrices)
- Policy Iteration
- If we don't know π, P^a_ss', R^a_ss':
  - Monte Carlo
  - Iterative Monte Carlo
  - TD Learning
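
The contrast between solving Bellman's equation with a known model and learning V from experience can be sketched in a few lines. The sketch below is only an illustration, not a method from the course: the two-state MDP, the 50/50 policy, and every name in it (MODEL, POLICY, policy_evaluation, sample_step, td0) are made up for this example. It shows only the evaluation half of policy iteration (no improvement step), next to TD(0), which sees sampled transitions and never reads P^a_ss' or R^a_ss' directly.

```python
import random

GAMMA = 0.9  # discount factor (gamma in the notes)

# Hypothetical 2-state, 2-action MDP, invented for illustration.
# MODEL[s][a] is a list of (probability, next_state, reward) triples,
# i.e. P^a_ss' together with R^a_ss'.
MODEL = {
    0: {0: [(1.0, 0, 0.0)],
        1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 2.0)],
        1: [(1.0, 1, 0.5)]},
}

# Fixed stochastic policy pi(s, a): probability of taking action a in state s.
POLICY = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.5, 1: 0.5}}


def policy_evaluation(model, policy, gamma, tol=1e-10):
    """Model-based: sweep the Bellman equation for V^pi until it converges."""
    v = {s: 0.0 for s in model}
    while True:
        delta = 0.0
        for s in model:
            # V(s) = sum_a pi(s,a) sum_s' P^a_ss' [R^a_ss' + gamma V(s')]
            new_v = sum(
                policy[s][a] * p * (r + gamma * v[s2])
                for a in model[s]
                for (p, s2, r) in model[s][a]
            )
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v
        if delta < tol:
            return v


def sample_step(model, policy, s, rng):
    """Simulated environment: draw an action, then a transition."""
    actions = list(policy[s])
    a = rng.choices(actions, weights=[policy[s][x] for x in actions])[0]
    outcomes = model[s][a]
    _, s2, r = rng.choices(outcomes, weights=[o[0] for o in outcomes])[0]
    return s2, r


def td0(model, policy, gamma, steps=300_000, alpha=0.002, seed=0):
    """Model-free TD(0): V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))."""
    rng = random.Random(seed)
    v = {s: 0.0 for s in model}
    s = 0
    for _ in range(steps):
        s2, r = sample_step(model, policy, s, rng)
        v[s] += alpha * (r + gamma * v[s2] - v[s])
        s = s2
    return v


v_exact = policy_evaluation(MODEL, POLICY, GAMMA)
v_td = td0(MODEL, POLICY, GAMMA)
print("Bellman sweep:", v_exact)
print("TD(0) estimate:", v_td)
```

For this made-up MDP the Bellman sweep settles at V(0) = 670/91 (about 7.36) and V(1) = 755/91 (about 8.30). The TD(0) estimates should land close to those values but keep some noise, because the step size alpha is held constant.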