Sometimes, SB3 algorithms become unstable when the wrong hyperparameters are chosen. In this case, we now catch the corresponding
ValueError and return a hyperopt_score of 0. We also return
n_epochs as the number of run epochs instead of the number of epochs actually run, because of the following case:
The hyperopt starts, the first hyperparameter configuration is unstable, and the algorithm fails in the first epoch, returning hyperopt_score = 0 and epochs = 0. Because it is the first result, this is still the best score so far, so the maximal number of epochs becomes 0 * 1.5 = 0, which makes all following configurations stop immediately.
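
The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: `run_trial`, `train_one_epoch` and `next_epoch_limit` are hypothetical names, and the only details taken from the description are the caught ValueError, the score of 0, the reported n_epochs and the factor of 1.5.

```python
def run_trial(train_one_epoch, n_epochs):
    """Run one hyperparameter configuration (hypothetical helper).

    Returns (hyperopt_score, epochs). If training becomes unstable and
    raises a ValueError, the score is 0 and n_epochs is reported instead
    of the number of epochs that actually ran.
    """
    score = 0.0
    epochs_run = 0
    try:
        for epoch in range(n_epochs):
            score = train_one_epoch(epoch)
            epochs_run = epoch + 1
    except ValueError:
        # Unstable configuration: worst score, but report the full budget
        # so that the epoch limit for later trials does not collapse to 0.
        return 0.0, n_epochs
    return score, epochs_run


def next_epoch_limit(best_epochs, factor=1.5):
    """Epoch limit derived from the best trial so far (assumed rule)."""
    return int(best_epochs * factor)


def unstable_training(epoch):
    # Stand-in for an SB3 algorithm that diverges in its first epoch.
    raise ValueError("unstable hyperparameters")


score, epochs = run_trial(unstable_training, n_epochs=10)
old_limit = next_epoch_limit(0)       # limit if the actual epochs (0) were reported
new_limit = next_epoch_limit(epochs)  # limit with n_epochs reported instead
```

With the actual epoch count reported, the limit would be 0 and every later configuration would stop at once; reporting n_epochs keeps the limit positive.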