`learning_starts` limited to only one episode
We currently limit the SB3 option `learning_starts` to be at most one episode long.
However, it is common to allow significantly more environment steps to be collected before training starts.
I think we should get rid of the following lines:

```python
if 'learning_starts' in alg_kwargs:
    alg_kwargs['learning_starts'] = max(alg_kwargs['learning_starts'], max_ep_steps)
else:
    alg_kwargs['learning_starts'] = max_ep_steps
```

They can be found here: https://collaborating.tuhh.de/ckv0173/Scilab-RL/-/blob/devel/util/util.py#L74
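To make the effect concrete, here is a minimal sketch that wraps the quoted lines in a standalone function and puts the proposed pass-through next to it. The function names and the episode length of 50 are hypothetical and not part of Scilab-RL. Unlike the original, the sketch copies `alg_kwargs` instead of modifying it in place.

```python
def current_learning_starts(alg_kwargs: dict, max_ep_steps: int) -> dict:
    """Hypothetical wrapper mirroring the override quoted from util/util.py."""
    kwargs = dict(alg_kwargs)  # copy so the caller's dict is not mutated
    if 'learning_starts' in kwargs:
        # a user-supplied value is raised to at least one episode
        kwargs['learning_starts'] = max(kwargs['learning_starts'], max_ep_steps)
    else:
        # no user value: SB3's own default is replaced by one episode
        kwargs['learning_starts'] = max_ep_steps
    return kwargs


def proposed_learning_starts(alg_kwargs: dict, max_ep_steps: int) -> dict:
    """Hypothetical proposed behaviour: forward the user's kwargs untouched."""
    # max_ep_steps is no longer consulted; if 'learning_starts' is absent,
    # the SB3 algorithm falls back to its own constructor default.
    return dict(alg_kwargs)


if __name__ == "__main__":
    max_ep_steps = 50  # hypothetical episode length
    for user_kwargs in ({}, {'learning_starts': 10}, {'learning_starts': 10_000}):
        print(user_kwargs,
              current_learning_starts(user_kwargs, max_ep_steps),
              proposed_learning_starts(user_kwargs, max_ep_steps))
```

With the quoted code, a value below `max_ep_steps` is raised to one episode, and an omitted value replaces SB3's own default with `max_ep_steps`. With the proposed change, whatever the user passes reaches the SB3 constructor unchanged.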