Create algorithms for teaching
We'd like to have some basic algorithms for teaching.
- an empty algorithm (random actions)
- an actor-critic close to Sutton and Barto (Chapter 11 policy approximation, actor-critic methods).
Stay close to Sutton and Barto's presentation, because we use the book for teaching.
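
As a starting point for discussion, here is a minimal sketch of both algorithms in plain Python. Everything in it is an assumption rather than an agreed design: the class names `RandomAgent` and `ActorCriticAgent`, the `act`/`update` interface, integer state and action indices, the step sizes, and the toy task in `run_toy_task`. The actor-critic is a tabular one-step variant with a softmax policy over action preferences and a TD-error-driven update; it uses the softmax-gradient form of the preference update and leaves out any extra discount factor on the actor step, so it should be checked against the book's pseudocode before it is used in class.

```python
import math
import random


class RandomAgent:
    """Baseline 'empty' algorithm: uniform random actions, no learning."""

    def __init__(self, n_actions, seed=None):
        self.n_actions = n_actions
        self.rng = random.Random(seed)

    def act(self, state):
        return self.rng.randrange(self.n_actions)

    def update(self, state, action, reward, next_state, done):
        pass  # intentionally does nothing


class ActorCriticAgent:
    """Tabular one-step actor-critic (sketch; hyperparameters are assumptions)."""

    def __init__(self, n_states, n_actions, alpha_v=0.1, alpha_pi=0.1,
                 gamma=1.0, seed=None):
        self.n_actions = n_actions
        self.alpha_v = alpha_v      # critic step size
        self.alpha_pi = alpha_pi    # actor step size
        self.gamma = gamma          # discount rate
        self.rng = random.Random(seed)
        self.v = [0.0] * n_states                               # critic: state values
        self.h = [[0.0] * n_actions for _ in range(n_states)]   # actor: preferences

    def policy(self, state):
        """Softmax over the action preferences of `state`."""
        prefs = self.h[state]
        m = max(prefs)  # subtract the max for numerical stability
        exps = [math.exp(p - m) for p in prefs]
        total = sum(exps)
        return [e / total for e in exps]

    def act(self, state):
        return self.rng.choices(range(self.n_actions),
                                weights=self.policy(state))[0]

    def update(self, state, action, reward, next_state, done):
        # TD error: delta = r + gamma * V(s') - V(s), with V(terminal) = 0.
        target = reward if done else reward + self.gamma * self.v[next_state]
        delta = target - self.v[state]
        probs = self.policy(state)  # policy before this update
        # Critic: move V(s) toward the one-step target.
        self.v[state] += self.alpha_v * delta
        # Actor: gradient of ln pi(a|s) w.r.t. preferences is 1{b == a} - pi(b|s).
        for b in range(self.n_actions):
            grad = (1.0 if b == action else 0.0) - probs[b]
            self.h[state][b] += self.alpha_pi * delta * grad
        return delta


def run_toy_task(agent, episodes=2000):
    """Hypothetical one-state, one-step task: action 1 pays 1, others pay 0."""
    total = 0.0
    for _ in range(episodes):
        action = agent.act(0)
        reward = 1.0 if action == 1 else 0.0
        agent.update(0, action, reward, 0, True)
        total += reward
    return total / episodes  # average reward per episode
```

Giving both agents the same `act`/`update` interface means the random baseline and the actor-critic can be swapped into the same training loop, which makes side-by-side comparisons straightforward in a teaching setting. On the toy task, the random agent's average reward should stay near 0.5, while the actor-critic's probability of choosing action 1 should rise toward 1.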