Topics Covered after Midterm:
Reinforcement Learning - learning with a critic
Terminology:
- temporal credit assignment
- exploration vs exploitation (n-armed bandit)
- greedy approach, non-greedy approach, balanced approach
- agent, environment, actions
- agent: policy (π), reward (r), value (V), model
- ret: return, discounted return, discount factor (γ)
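- R^a_ss': expected immediate reward for the transition from s to s' under action a
- P^a_ss': state transition probabilities (probability of reaching s' from s under action a)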
- Q: action-value function
- Bellman's Equation and Bellman Optimality Equation for V
- Solving Bellman's Equation:
  - Solving analytically (e.g. with matrices)
  - Policy Iteration
- If we don't know π, P^a_ss', R^a_ss':
  - Monte Carlo
  - Iterative Monte Carlo
  - TD Learning
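
Example (a rough sketch, not taken from the course material): the two routes above can be compared on a tiny problem. For a fixed policy, Bellman's equation becomes the linear system V = R + γPV, which can be solved exactly with matrices when P and R are known. TD learning estimates the same V from sampled transitions alone. The two-state chain, its numbers, and the names solve_two_state and td0 are all made up for illustration, and the reward is assumed to depend only on the state being left.

```python
import random

def solve_two_state(P, R, gamma):
    """Solve V = R + gamma * P V exactly for a 2-state chain.

    Rearranged as (I - gamma * P) V = R and inverted by hand (2x2 case).
    """
    a, b = 1 - gamma * P[0][0], -gamma * P[0][1]
    c, d = -gamma * P[1][0], 1 - gamma * P[1][1]
    det = a * d - b * c
    return [(d * R[0] - b * R[1]) / det, (-c * R[0] + a * R[1]) / det]

def td0(P, R, gamma, alpha=0.01, steps=100_000, seed=0):
    """Estimate V with tabular TD(0) from sampled transitions.

    P and R are used only to simulate the environment; the update rule
    itself sees just (s, r, s_next), as if the model were unknown.
    """
    rng = random.Random(seed)
    V = [0.0, 0.0]
    s = 0
    for _ in range(steps):
        # Environment step: sample the next state, observe the reward.
        s_next = 0 if rng.random() < P[s][0] else 1
        r = R[s]
        # TD(0) update: move V[s] toward the one-step bootstrapped target.
        V[s] += alpha * (r + gamma * V[s_next] - V[s])
        s = s_next
    return V

if __name__ == "__main__":
    P = [[0.5, 0.5], [0.2, 0.8]]   # hypothetical transition probabilities
    R = [1.0, 0.0]                 # hypothetical expected rewards per state
    gamma = 0.5                    # discount factor
    print("analytic:", solve_two_state(P, R, gamma))
    print("TD(0):   ", td0(P, R, gamma))
```

With a constant step size the TD estimate keeps fluctuating around the analytic values rather than settling exactly on them. A smaller alpha reduces the fluctuation but slows learning.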