Merge forward models into Scilab
As an example of how to use forward models in Scilab, we provide a version of cleanPPO that is equipped with a forward model. This needs to be documented and added to the wiki in a way that allows users to equip an algorithm of their choice with a forward model.
At the moment, the following steps are required:
- A copy of the RL-algorithm of choice must be created, including a copy of the algo config (which is in line with the scilab logic so far)
- The files with the forward models and the fw_utils must be copied to the same folder
- The config needs to be expanded by a field "fwd" that contains the fields hidden_size, n_hidden_layers, retrain_every_n_steps and fw_learning_rate
- In the agent initialization, the action_shape and obs_shape must be changed to instance variables, so that they are accessible
- The agent must be extended by a get_action_shape and get_obs_shape getter
- In the algo initialization, the forward model of choice must be instantiated. It expects fwd, action_shape and obs_shape as parameters
- In the algo initialization, a storage for the training data must be instantiated. For this, the Training_Data class from the fw_utils is used
- In the algo initialization, the optimizer for the forward model must be instantiated
- The user must find the appropriate place to collect training data (we do not want to rely on the replay buffer, because not all algorithms use one). The appropriate place is usually right after the call to env.step().
- Training data is collected by passing last_obs, action and new_obs to the Training_Data class's "collect_training_data" method
- In the RL algorithm's learning loop, the collected data must be batched. For this, we use the Training_Data class's "get_dataloader" method
- The forward model's "train" method can then be called on the batched training data (either in every step or only every retrain_every_n_steps steps)
- The forward models also come with "predict", "save_model" and "save_state_dict" methods
- It is recommended to save the state dicts so that the pre-trained model can be reused later
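
As a rough illustration of how the steps above fit together, here is a minimal, framework-free sketch. It does not use the actual Scilab code: `SketchTrainingData` and `SketchForwardModel` are hypothetical stand-ins that only mirror the method names mentioned above ("collect_training_data", "get_dataloader", "train"), and their signatures, the `batch_size` argument, the config values and `toy_env_step` are assumptions made for the example. The real forward models are neural networks with an optimizer, which this sketch leaves out.

```python
import random

# Hypothetical config fragment: the field names under "fwd" come from the
# steps above, the values are placeholders.
config = {
    "fwd": {
        "hidden_size": 64,
        "n_hidden_layers": 2,
        "retrain_every_n_steps": 4,
        "fw_learning_rate": 1e-3,
    }
}


class SketchTrainingData:
    """Stand-in for Training_Data from fw_utils (real signatures may differ)."""

    def __init__(self):
        self.transitions = []

    def collect_training_data(self, last_obs, action, new_obs):
        # Store one (observation, action, next observation) transition.
        self.transitions.append((last_obs, action, new_obs))

    def get_dataloader(self, batch_size=2):
        # Yield the collected transitions in fixed-size batches.
        for i in range(0, len(self.transitions), batch_size):
            yield self.transitions[i:i + batch_size]


class SketchForwardModel:
    """Stand-in for a forward model; it only counts what it is trained on."""

    def __init__(self, fwd, action_shape, obs_shape):
        self.fwd = fwd
        self.action_shape = action_shape
        self.obs_shape = obs_shape
        self.train_calls = 0
        self.samples_seen = 0

    def train(self, dataloader):
        # A real model would run gradient steps here.
        self.train_calls += 1
        for batch in dataloader:
            self.samples_seen += len(batch)


def toy_env_step(obs, action):
    # Toy dynamics standing in for env.step(): next observation = obs + action.
    return obs + action


def run(total_steps=8, seed=0):
    rng = random.Random(seed)
    fwd = config["fwd"]
    model = SketchForwardModel(fwd, action_shape=(1,), obs_shape=(1,))
    data = SketchTrainingData()
    last_obs = 0.0
    for step in range(1, total_steps + 1):
        action = rng.choice([-1.0, 1.0])
        new_obs = toy_env_step(last_obs, action)
        # Collect right after the environment step, independent of any replay buffer.
        data.collect_training_data(last_obs, action, new_obs)
        last_obs = new_obs
        # Retrain only every retrain_every_n_steps steps.
        if step % fwd["retrain_every_n_steps"] == 0:
            model.train(data.get_dataloader())
    return model, data
```

In this sketch the buffer is never cleared, so each retraining pass sees all transitions collected so far. Whether the real Training_Data class behaves this way is not stated above and would need to be checked against fw_utils.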
To-Do:
- Test whether this is compatible with the other algos implemented in Scilab
- Document the above and include it in the wiki