Create algorithms for teaching

We'd like to have some basic algorithms for teaching.

  • an empty algorithm (random actions)
  • an actor-critic close to Sutton and Barto (Chapter 11 policy approximation, actor-critic methods).
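
As a starting point for the first item, a random-action baseline could look roughly like the sketch below. The class name `RandomAgent` and the `act` / `learn` interface are assumptions made for illustration, not an existing API in this project; the only fixed idea is that the agent ignores what it observes and samples actions uniformly.

```python
import random


class RandomAgent:
    """Baseline that ignores observations and picks uniformly random actions.

    Hypothetical interface: act() chooses an action, learn() is a no-op so the
    agent can be swapped with learning agents that share the same methods.
    """

    def __init__(self, n_actions, seed=None):
        self.n_actions = n_actions
        self.rng = random.Random(seed)  # private generator, reproducible with a seed

    def act(self, observation):
        # The observation is deliberately unused.
        return self.rng.randrange(self.n_actions)

    def learn(self, observation, action, reward, next_observation, done):
        # Nothing to learn for the random baseline.
        pass
```

Keeping a no-op `learn` method means the same training loop can drive both this baseline and a learning agent, which makes side-by-side comparisons in class easier.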

Try to stay close to Sutton and Barto, because we use the book for teaching.
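
A minimal sketch of what a book-style actor-critic might look like, assuming a tabular softmax policy over action preferences and a tabular state-value critic. The update follows the one-step, episodic form: compute the TD error, move the critic toward the TD target, and move the actor along the gradient of the log policy scaled by the TD error. The `Corridor` environment, the function names, the step sizes, and the discount-accumulation factor `I` are assumptions for illustration; drop `I` if the edition used in class omits it.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into probabilities (shifted by max for stability)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


class Corridor:
    """Hypothetical toy task: walk right from state 0 to the terminal state n-1."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.n_actions = 2  # 0 = left, 1 = right

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        move = 1 if action == 1 else -1
        # Moving left from state 0 leaves the agent in state 0.
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        return self.state, -1.0, done  # reward of -1 on every step


def actor_critic(env, episodes=500, gamma=1.0, alpha_theta=0.1, alpha_w=0.2,
                 max_steps=1000, seed=0):
    rng = random.Random(seed)
    theta = [[0.0] * env.n_actions for _ in range(env.n_states)]  # actor preferences
    v = [0.0] * env.n_states                                      # critic values
    for _ in range(episodes):
        s = env.reset()
        I = 1.0  # accumulated discount (assumed episodic variant)
        for _ in range(max_steps):
            pi = softmax(theta[s])
            a = rng.choices(range(env.n_actions), weights=pi)[0]
            s2, r, done = env.step(a)
            # One-step TD error; the value of a terminal state is taken as 0.
            delta = r + (0.0 if done else gamma * v[s2]) - v[s]
            # Critic update: the gradient of a tabular value is 1 for state s.
            v[s] += alpha_w * delta
            # Actor update: d/d theta[s][b] of ln pi(a|s) is 1{b == a} - pi(b|s).
            for b in range(env.n_actions):
                grad = (1.0 if b == a else 0.0) - pi[b]
                theta[s][b] += alpha_theta * I * delta * grad
            I *= gamma
            s = s2
            if done:
                break
    return theta, v
```

Calling `theta, v = actor_critic(Corridor(5))` and then inspecting `softmax(theta[s])` for each state shows how strongly the learned policy prefers moving right. A tabular version keeps every quantity visible for students; replacing the tables with function approximators would be a natural follow-up exercise.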