Verify and fix hyperopt mechanism
The hyperparameter optimization does not seem to work properly. I have already fixed the instructions in the README.md on devel, and the easiest way to debug the hyperopt is to use the command provided there, i.e., python experiment/train.py +performance=FetchReach/sac_her-opti.yaml --multirun.
For me, this started well but after a while it threw the following error:
| learning_rate | 0.0217 |
| n_updates | 11949 |
---------------------------------
Training finished!
Finishing main training function.
MLflow run: <ActiveRun: >.
Hyperopt score: 0.03333333333333334, epochs: 6.
/home/eppe/Scilab-RL/hydra_plugins/hydra_custom_optuna_sweeper/_impl.py:285: FutureWarning: _tell has been deprecated in v2.5.0. This feature will be removed in v4.0.0. See https://github.com/optuna/optuna/releases/tag/v2.5.0.
study._tell(trial, state, values)
Error executing job with overrides: ['+algorithm.learning_rate=0.02169932803339823', '++algorithm.replay_buffer_kwargs.n_sampled_goal=8', 'algorithm=sac', '+performance=FetchReach/sac_her-opti.yaml', 'n_epochs=6']
Traceback (most recent call last):
File "/home/eppe/Scilab-RL/venv/lib/python3.9/site-packages/hydra/_internal/utils.py", line 211, in run_and_report
return func()
File "/home/eppe/Scilab-RL/venv/lib/python3.9/site-packages/hydra/_internal/utils.py", line 386, in <lambda>
lambda: hydra.multirun(
File "/home/eppe/Scilab-RL/venv/lib/python3.9/site-packages/hydra/_internal/hydra.py", line 140, in multirun
ret = sweeper.sweep(arguments=task_overrides)
File "/home/eppe/Scilab-RL/hydra_plugins/hydra_custom_optuna_sweeper/custom_optuna_sweeper.py", line 45, in sweep
return self.sweeper.sweep(arguments)
File "/home/eppe/Scilab-RL/hydra_plugins/hydra_custom_optuna_sweeper/_impl.py", line 271, in sweep
ret.return_value) == 3, "The return value of main() should be a triple where the first element " \
File "/home/eppe/Scilab-RL/venv/lib/python3.9/site-packages/hydra/core/utils.py", line 233, in return_value
raise self._return_value
Exception: Traceback (most recent call last):
File "/home/eppe/Scilab-RL/hydra_plugins/hydra_custom_joblib_launcher/_core.py", line 84, in run_job
ret.return_value = task_function(task_cfg)
File "/home/eppe/Scilab-RL/experiment/train.py", line 164, in main
launch(cfg, logger, kwargs)
File "/home/eppe/Scilab-RL/experiment/train.py", line 121, in launch
train(baseline, train_env, eval_env, cfg, logger)
File "/home/eppe/Scilab-RL/experiment/train.py", line 59, in train
baseline.learn(total_timesteps=total_steps, callback=callback, log_interval=None)
File "/home/eppe/Scilab-RL/venv/lib/python3.9/site-packages/stable_baselines3/sac/sac.py", line 289, in learn
return super(SAC, self).learn(
File "/home/eppe/Scilab-RL/venv/lib/python3.9/site-packages/stable_baselines3/common/off_policy_algorithm.py", line 352, in learn
rollout = self.collect_rollouts(
File "/home/eppe/Scilab-RL/venv/lib/python3.9/site-packages/stable_baselines3/common/off_policy_algorithm.py", line 563, in collect_rollouts
action, buffer_action = self._sample_action(learning_starts, action_noise)
File "/home/eppe/Scilab-RL/venv/lib/python3.9/site-packages/stable_baselines3/common/off_policy_algorithm.py", line 407, in _sample_action
unscaled_action, _ = self.predict(self._last_obs, deterministic=False)
File "/home/eppe/Scilab-RL/venv/lib/python3.9/site-packages/stable_baselines3/common/base_class.py", line 539, in predict
return self.policy.predict(observation, state, mask, deterministic)
File "/home/eppe/Scilab-RL/venv/lib/python3.9/site-packages/stable_baselines3/common/policies.py", line 302, in predict
actions = self._predict(observation, deterministic=deterministic)
File "/home/eppe/Scilab-RL/venv/lib/python3.9/site-packages/stable_baselines3/sac/policies.py", line 362, in _predict
return self.actor(observation, deterministic)
File "/home/eppe/Scilab-RL/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/eppe/Scilab-RL/venv/lib/python3.9/site-packages/stable_baselines3/sac/policies.py", line 185, in forward
return self.action_dist.actions_from_params(mean_actions, log_std, deterministic=deterministic, **kwargs)
File "/home/eppe/Scilab-RL/venv/lib/python3.9/site-packages/stable_baselines3/common/distributions.py", line 178, in actions_from_params
self.proba_distribution(mean_actions, log_std)
File "/home/eppe/Scilab-RL/venv/lib/python3.9/site-packages/stable_baselines3/common/distributions.py", line 210, in proba_distribution
super(SquashedDiagGaussianDistribution, self).proba_distribution(mean_actions, log_std)
File "/home/eppe/Scilab-RL/venv/lib/python3.9/site-packages/stable_baselines3/common/distributions.py", line 152, in proba_distribution
self.distribution = Normal(mean_actions, action_std)
File "/home/eppe/Scilab-RL/venv/lib/python3.9/site-packages/torch/distributions/normal.py", line 50, in __init__
super(Normal, self).__init__(batch_shape, validate_args=validate_args)
File "/home/eppe/Scilab-RL/venv/lib/python3.9/site-packages/torch/distributions/distribution.py", line 55, in __init__
raise ValueError(
ValueError: Expected parameter loc (Tensor of shape (1, 4)) of distribution Normal(loc: torch.Size([1, 4]), scale: torch.Size([1, 4])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan, nan, nan, nan]], device='cuda:0')
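A likely culprit is the sampled learning rate (~0.0217), which is quite high for SAC and can drive the actor network's weights to NaN — this is an assumption, not a confirmed diagnosis. As a minimal, library-free sketch of the failure mode, plain gradient descent already produces inf/NaN once the step size is too large:

```python
# Minimal sketch (plain Python, no RL libraries): gradient descent on f(w) = w^4.
# With a small learning rate the iterate shrinks toward the minimum at w = 0;
# with a large one it overshoots, the magnitude explodes, and the arithmetic
# eventually yields inf and then nan, the same way a too-aggressive optimizer
# step can poison a policy network's parameters.
def gradient_descent(lr, steps=50, w=2.0):
    for _ in range(steps):
        grad = 4 * w * w * w  # derivative of w**4 (w*w*w avoids OverflowError for huge w)
        w = w - lr * grad
    return w

print(gradient_descent(0.001))  # small step size: converges toward 0
print(gradient_descent(0.5))    # large step size: diverges to nan
```

If this is indeed the cause, narrowing the learning-rate search range in the sweeper config (or adding gradient clipping) should avoid the crash rather than just masking it.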
For debugging, it is best to disable multiprocessing by commenting out the override hydra/launcher: custom_joblib in main.yaml.
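Concretely, that means commenting out the launcher override in the defaults list of main.yaml (a sketch; the surrounding entries of the actual defaults list will differ):

```yaml
defaults:
  # Commented out to run trials in a single process for easier debugging:
  # - override hydra/launcher: custom_joblib
```

With the joblib launcher disabled, Hydra falls back to its basic sequential launcher, so tracebacks appear directly in the main process.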
Edited by Manfred Eppe