Unstable hyperparameter handling
Sometimes SB3 algorithms become unstable when poorly chosen hyperparameters are used. In this case, we now catch the corresponding ValueError
and return a hyperopt score of 0. We also return n_epochs
as the number of epochs run, instead of the actual number of epochs run, because of the following case:
The hyperopt starts, the first hyperparameter configuration is unstable, and the algorithm fails in the first epoch, returning hyperopt_score = 0 and epochs = 0. Because it is the first result, this is still the best score so far, so the maximum number of epochs becomes 0 * 1.5 = 0, and all following configurations stop immediately.
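A minimal sketch of the behaviour described above, not the actual code of this change. The names `run_trial`, `train_fn`, and `next_epoch_cap` are hypothetical, and the sketch assumes that a successful trial still reports the epochs it really ran and that the epoch cap is the best trial's epoch count times 1.5:

```python
def run_trial(train_fn, n_epochs):
    """Run one hyperparameter trial and return (hyperopt_score, epochs).

    train_fn is a hypothetical callable that trains for up to n_epochs and
    returns (score, epochs_run). It raises ValueError when the algorithm
    becomes unstable.
    """
    try:
        score, epochs_run = train_fn(n_epochs)
    except ValueError:
        # Unstable configuration: score 0, but report the full epoch budget
        # so that the cap derived from this trial does not collapse to 0.
        return 0.0, n_epochs
    return score, epochs_run


def next_epoch_cap(best_epochs, factor=1.5):
    """Epoch cap for later trials (assumed rule: best epochs times factor)."""
    return int(best_epochs * factor)
```

With epochs = 0 reported for a failed first trial, `next_epoch_cap(0)` is 0 and every later trial would stop immediately. Reporting n_epochs instead keeps the cap positive.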
Other changes:
- Made SAC the default algorithm
- Added another MuJoCo installation error fix to the README