Add CliffWalking environment performance test

The CliffWalking env is a simple RL environment that can be used to showcase the differences between basic RL algorithms. For example, this notebook implements the CliffWalking environment and shows that SARSA and one-step actor-critic solve it differently.

For teaching purposes, it would be nice to also have this in our framework. Luckily, it is already available in Gymnasium.

Check whether our custom algorithm onestepac can solve it, and add a performance test for it. Optionally, upload a video of the agent solving the environment to this issue.
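
As a starting point, here is a rough sketch of what such a performance test could look like. Everything in it is an assumption for illustration: `TinyCliffWalking` is a hypothetical pure-Python stand-in that follows the documented CliffWalking rules (4x12 grid, -1 per step, -100 and a reset to the start for stepping into the cliff) so the snippet runs without dependencies, `edge_policy` is a hand-written placeholder for a trained onestepac agent, and the threshold of -20 is made up. The real test would use the Gymnasium environment (whose `step` returns a 5-tuple rather than the 3-tuple used here) and our framework's own training and evaluation entry points.

```python
from typing import Callable, Tuple

ROWS, COLS = 4, 12
START, GOAL = (3, 0), (3, 11)
# Action encoding as documented for CliffWalking: 0=up, 1=right, 2=down, 3=left.
MOVES = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}


class TinyCliffWalking:
    """Hypothetical stand-in for the Gymnasium env, with a simplified API."""

    def reset(self) -> int:
        self.pos = START
        return self._obs()

    def step(self, action: int) -> Tuple[int, float, bool]:
        dr, dc = MOVES[action]
        # Moves that would leave the grid keep the agent inside it.
        row = min(max(self.pos[0] + dr, 0), ROWS - 1)
        col = min(max(self.pos[1] + dc, 0), COLS - 1)
        if row == 3 and 1 <= col <= 10:
            # Stepped into the cliff: large penalty, back to the start.
            self.pos = START
            return self._obs(), -100.0, False
        self.pos = (row, col)
        return self._obs(), -1.0, self.pos == GOAL

    def _obs(self) -> int:
        # Flattened grid index, row * 12 + col.
        return self.pos[0] * COLS + self.pos[1]


def mean_return(policy: Callable[[int], int],
                episodes: int = 10, max_steps: int = 200) -> float:
    """Average undiscounted return of `policy` over several capped episodes."""
    env = TinyCliffWalking()
    total = 0.0
    for _ in range(episodes):
        obs, done, steps = env.reset(), False, 0
        while not done and steps < max_steps:
            obs, reward, done = env.step(policy(obs))
            total += reward
            steps += 1
    return total / episodes


def edge_policy(obs: int) -> int:
    """Placeholder for a trained agent: walk along the row above the cliff."""
    row, col = divmod(obs, COLS)
    if row == 3:
        return 0  # on the bottom row (start): move up
    if row == 2 and col == COLS - 1:
        return 2  # above the goal: move down
    if row == 2:
        return 1  # above the cliff: move right
    return 2      # anywhere higher: move down toward the cliff edge


def test_cliffwalking_performance() -> None:
    # Hypothetical threshold; the shortest path yields a return of -13.
    assert mean_return(edge_policy) >= -20.0
```

Capping the episode length matters here because falling off the cliff does not end the episode, so an untrained agent could otherwise loop for a very long time.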