RLBench: distinguish between train_env and eval_env

PyRep can only run one instance per process. We therefore currently have train_env == eval_env for RLBench environments, i.e. training and evaluation share the same environment object. This has already caused problems, and it would be nice if we could distinguish between the training and the evaluation environment.

Therefore we'd like to:

- change the RLBench wrapper so that it is aware of its role (train/test). It may be even better to assign a number to each new "instance" of an environment.
- remove `if episode_timesteps >= self.max_episode_length: break` from hac.py, as it is just a quick fix for this problem.
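
A minimal sketch of how the two items above could fit together, not the actual wrapper code. It assumes a gym-style `reset()`/`step()` interface that returns `(obs, reward, done, info)`. The names `Role`, `RoleAwareEnv` and `DummyEnv` are hypothetical, and `DummyEnv` only stands in for the single PyRep-backed environment. Each handle knows its role, gets its own instance number, and counts its own episode timesteps, so the length check would no longer need to live in hac.py.

```python
import itertools
from enum import Enum


class Role(Enum):
    TRAIN = "train"
    EVAL = "eval"


class RoleAwareEnv:
    """Hypothetical handle: several handles share one underlying environment."""

    _ids = itertools.count()  # class-wide counter used for instance numbers

    def __init__(self, shared_env, role, max_episode_length):
        self.env = shared_env              # the single simulator-backed env
        self.role = role                   # Role.TRAIN or Role.EVAL
        self.instance_id = next(RoleAwareEnv._ids)
        self.max_episode_length = max_episode_length
        self.episode_timesteps = 0         # counted per handle, not globally

    def reset(self):
        self.episode_timesteps = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.episode_timesteps += 1
        info = dict(info, role=self.role.value, instance_id=self.instance_id)
        # End the episode here instead of breaking out of the loop in hac.py.
        if self.episode_timesteps >= self.max_episode_length:
            done = True
            info["truncated"] = True
        return obs, reward, done, info


class DummyEnv:
    """Stand-in for the real environment (assumption, for illustration only)."""

    def reset(self):
        return 0

    def step(self, action):
        return 0, 0.0, False, {}


shared = DummyEnv()
train_env = RoleAwareEnv(shared, Role.TRAIN, max_episode_length=3)
eval_env = RoleAwareEnv(shared, Role.EVAL, max_episode_length=3)
```

Both handles still drive the same simulator, so resetting one of them interrupts whatever episode the other was running. The sketch only separates the bookkeeping (role, instance number, timestep counter), not the simulator state.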

Edited Jul 15, 2021 by Manfred Eppe