# Scilab-RL issues
https://collaborating.tuhh.de/ckv0173/Scilab-RL/-/issues

## #171: play around with multi-agent RL
https://collaborating.tuhh.de/ckv0173/Scilab-RL/-/issues/171 · 2024-02-02 · Manfred Eppe

## #170: Integrate Carlotta's environment
https://collaborating.tuhh.de/ckv0173/Scilab-RL/-/issues/170 · 2024-01-11 · Manfred Eppe · assignee: Carlotta Langer

Integrate Carlotta's free-energy testing environment into our framework.

## #167: Add performance test for one-step actor critic
https://collaborating.tuhh.de/ckv0173/Scilab-RL/-/issues/167 · 2023-12-30 · Manfred Eppe

Thilo added the one-step actor-critic algorithm from Sutton and Barto's book. However, it has not yet been double-checked with a performance test. This should be added.

M.

## #159: Add CliffWalking environment performance test
https://collaborating.tuhh.de/ckv0173/Scilab-RL/-/issues/159 · 2023-12-11 · Thilo Fryen

The `CliffWalking` env is a simple RL environment which can be used to showcase the differences between basic RL algorithms. For example, [this notebook](https://marcinbogdanski.github.io/rl-sketchpad/RL_An_Introduction_2018/1305a_One_Step_Actor_Critic.html) implements the `CliffWalking` environment and shows that SARSA and one-step actor-critic solve it differently.
For teaching purposes, it would be nice to also have this in our framework. Luckily, it is already [available](https://www.gymlibrary.dev/environments/toy_text/cliff_walking/) in Gymnasium.
See if our custom algorithm `onestepac` can solve it and create a performance test for it. Maybe also upload a video of the agent solving the environment to this issue.

Assignee: Ismail Barkin Ulusoy

## #94: Hyperopt-Score and epoch-limiting
https://collaborating.tuhh.de/ckv0173/Scilab-RL/-/issues/94 · 2022-05-16 · Ghost User

I noticed an unfavorable interplay between the hyperopt score and the epoch limiting during the hyperopt.
- Refresher: the hyperopt score is the sum of the mean of the evaluation metric (usually `test/success_rate`) over all epochs and the mean of the evaluation metric over the last `early_stop_last_n` epochs, divided by the number of epochs. In pseudo-Python code:
```python
score = mean(early_stop_data_column) + mean(early_stop_data_column[-early_stop_last_n:])
score /= epochs
```
Therefore, the hyperopt score is highly dependent on the number of epochs.
- The custom sweeper limits `n_epochs` to `int(n_epochs_of_fastest_run * 1.5)`.
The following example shows why this can be problematic.
Say we have an easy-to-learn environment and the following history of `test/success_rate` for one run:
```python
suc_hist = [0.5, 0.5, 0.8, 0.7, 0.9, 0.8, 0.6, 1.0, 0.8, 0.9]
```
It ran for 10 epochs. Its hyperopt score is 0.165 (with `early_stop_last_n = 3`).
The hyperopt continues and one lucky configuration manages to reach early stopping in only 5 epochs. Now `n_epochs` is `int(n_epochs_of_fastest_run * 1.5) = 7`.
The same run from before would now be stopped after 7 epochs and achieve a hyperopt score of about 0.207.
This gives a huge advantage to all configurations that run after a run with few epochs. Maybe we should disable the epoch limiting, or at least add an option to disable it. @ckv0173 @czs4581, what do you think?
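To make the effect concrete, here is a runnable sketch of the score computation described above. The function name is made up for illustration, and `early_stop_last_n = 3` is an assumption; it is the value that reproduces the 0.165 and ~0.207 figures quoted in this issue.

```python
from statistics import mean

def hyperopt_score(success_history, early_stop_last_n=3):
    """Mean of the metric over all epochs plus its mean over the
    last `early_stop_last_n` epochs, divided by the number of epochs."""
    epochs = len(success_history)
    score = mean(success_history) + mean(success_history[-early_stop_last_n:])
    return score / epochs

suc_hist = [0.5, 0.5, 0.8, 0.7, 0.9, 0.8, 0.6, 1.0, 0.8, 0.9]
print(round(hyperopt_score(suc_hist), 3))      # 0.165 for the full 10-epoch run
print(round(hyperopt_score(suc_hist[:7]), 3))  # 0.207 when capped at 7 epochs
```

Note that the 7-epoch run scores higher even though its success history is strictly a prefix of the 10-epoch run's, which is exactly the bias described above.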