MIMoEnv base class
This module defines the base MIMo environment.
The abstract base class is MIMoEnv. Default parameters for all the sensory modalities
are provided as well.
Contents
MIMoEnv
- class mimoEnv.envs.mimo_env.MIMoEnv(model_path, initial_qpos=None, frame_skip=2, render_mode=None, camera_id=None, camera_name=None, width=DEFAULT_SIZE, height=DEFAULT_SIZE, default_camera_config=None, proprio_params=None, touch_params=None, vision_params=None, vestibular_params=None, actuation_model=SpringDamperModel, goals_in_observation=True, done_active=False)
Bases:
Generic[gymnasium.core.ObsType, gymnasium.core.ActType]
This is the abstract base class for all MIMo experiments.
This class meets the interface requirements for basic gym classes and adds some additional features. The observation space is of dictionary type.
Sensory modules are configured by a parameter dictionary. Default configuration dictionaries are included in the same module as this class:
DEFAULT_PROPRIOCEPTION_PARAMS, DEFAULT_TOUCH_PARAMS, DEFAULT_VISION_PARAMS, DEFAULT_VESTIBULAR_PARAMS. Passing one of these to the constructor enables the relevant sensory module. Not passing a dictionary disables the relevant module. By default, all sensory modalities are disabled and the only sensor outputs are the relative joint positions.
Actuation models can also be changed using the actuation_model constructor argument. They do not use a configuration dictionary, instead deriving all required parameters from the XMLs.
Implementing subclasses will have to override the following functions:
- is_success(), to determine when an episode reaches a success terminal state.
- is_failure(), to determine when an episode reaches a failure terminal state.
- is_truncated(), to determine when an episode ends for other reasons, such as a time limit or out-of-bounds condition.
- compute_reward(), to compute the reward at each step.
- reset_model(), which resets the physical simulation. If you wish to randomize some aspect of the scene, this function is the place to implement that.
- sample_goal(), which should determine the desired end state.
- get_achieved_goal(), which should return the achieved end state.
Depending on the requirements of your experiment any of these functions may be implemented as dummy functions returning fixed values. Additional functions that may be overridden optionally are:
- _is_done(), which determines the ‘terminated’ and ‘truncated’ return values after each step.
- proprio_setup(), touch_setup(), vision_setup(), vestibular_setup(), which initialize the associated sensor modality. These should be overridden if you want to replace the default implementation. Default implementations are SimpleProprioception, DiscreteTouch, SimpleVision, SimpleVestibular.
- get_proprio_obs(), get_touch_obs(), get_vision_obs(), get_vestibular_obs(), which collect the observations of the associated sensor modality. These allow you to do post-processing on the output without having to alter the base implementations.
- _step_callback() and _substep_callback(), which are called after every environment and simulation step respectively.
These functions come with default implementations that should handle most scenarios.
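The list of required overrides can be illustrated with a minimal sketch. Because the real MIMoEnv needs MuJoCo and a scene XML, the example uses a small stand-in base class (StandInEnv, hypothetical) that only declares the method names listed above. ReachEnv, its scalar goal, the success threshold and the step limit are likewise illustrative assumptions and not part of MIMo.

```python
from abc import ABC, abstractmethod


class StandInEnv(ABC):
    """Hypothetical stand-in that only mirrors the method names listed above.

    It is NOT the real MIMoEnv; it exists so the sketch runs without MuJoCo.
    """

    @abstractmethod
    def is_success(self, achieved_goal, desired_goal): ...

    @abstractmethod
    def is_failure(self, achieved_goal, desired_goal): ...

    @abstractmethod
    def is_truncated(self): ...

    @abstractmethod
    def compute_reward(self, achieved_goal, desired_goal, info): ...

    @abstractmethod
    def reset_model(self): ...

    @abstractmethod
    def sample_goal(self): ...

    @abstractmethod
    def get_achieved_goal(self): ...


class ReachEnv(StandInEnv):
    """Illustrative experiment: move a scalar 'hand position' to a target."""

    def __init__(self, max_steps=100):
        self.max_steps = max_steps
        self.steps = 0
        self.hand_position = 0.0

    def is_success(self, achieved_goal, desired_goal):
        return abs(achieved_goal - desired_goal) < 0.05

    def is_failure(self, achieved_goal, desired_goal):
        # Dummy implementation returning a fixed value: no failure state here.
        return False

    def is_truncated(self):
        return self.steps >= self.max_steps

    def compute_reward(self, achieved_goal, desired_goal, info):
        return -abs(achieved_goal - desired_goal)

    def reset_model(self):
        # Scene randomization would go here.
        self.steps = 0
        self.hand_position = 0.0
        return {"observation": [self.hand_position]}

    def sample_goal(self):
        return 1.0

    def get_achieved_goal(self):
        return self.hand_position


env = ReachEnv(max_steps=10)
env.reset_model()
goal = env.sample_goal()
```

In a real experiment the subclass would derive from MIMoEnv instead, and the goal and observation types would follow from the scene.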
- Parameters
model_path (str) – The path to the scene XML.
initial_qpos (Dict[str, float]|None) – A dictionary of the initial joint positions. Keys are the joint names, with joint positions in radians as values. None by default.
frame_skip (int) – The number of physics substeps for each simulation step. The duration of each physics step is set in the scene XML. Default 2.
render_mode (str|None) – The render mode for gymnasium functions. We support “human”, “rgb_array” and “depth_array”. In mode “human”, the environment can be viewed with an interactive viewer. In modes “rgb_array” and “depth_array”, color images and depth images are rendered and returned. Please see the gymnasium documentation for more details.
camera_id (int) – The camera, by ID, which will be used for rendering.
camera_name (str) – The camera, by name, which will be used for rendering.
width (int) – The width of the rendered image.
height (int) – The height of the rendered image.
proprio_params (Dict|None) – The configuration dictionary for the proprioceptive system. If None, the module is disabled. Default None.
touch_params (Dict|None) – The configuration dictionary for the touch system. If None, the module is disabled. Default None.
vision_params (Dict|None) – The configuration dictionary for the vision system. If None, the module is disabled. Default None.
vestibular_params (Dict|None) – The configuration dictionary for the vestibular system. If None, the module is disabled. Default None.
actuation_model (Type[ActuationModel]) – Class for the actuation model. Default is SpringDamperModel. Note that this must be a class, not an instance.
goals_in_observation (bool) – If True, the desired and achieved goals are included in the observation dictionary. Default True.
done_active (bool) – If True, _is_done() returns True if the simulation reaches a success or failure state. If False, _is_done() always returns False and the function calling step() has to figure out when to stop or reset the simulation on its own.
- model
The MuJoCo model object.
- Type
MjModel
- data
The MuJoCo data object.
- Type
MjData
- init_qpos
The initial position vector for the entire scene. Can be used with set_state() to return the simulation to its initial state.
- Type
np.ndarray
- init_qvel
The initial velocity vector for the entire scene. Can be used with set_state() to return the simulation to its initial state.
- Type
np.ndarray
- frame_skip
The number of simulation substeps for each environment step.
- action_space
The action space. See Gym documentation for more.
- Type
gym.spaces.Space
- observation_space
The observation space. See Gym documentation for more.
- Type
gym.spaces.Space
- actuation_model
Reference to the actuation model instance.
- Type
- proprio_params
The configuration dictionary for the proprioceptive system.
- Type
Dict
- touch_params
The configuration dictionary for the touch system.
- Type
Dict
- vision_params
The configuration dictionary for the vision system.
- Type
Dict
- vestibular_params
The configuration dictionary for the vestibular system.
- Type
Dict
- proprioception
A reference to the proprioception instance.
- Type
- vestibular
A reference to the vestibular instance.
- Type
- facial_expressions
A dictionary linking emotions with their associated facial textures. The keys of this dictionary are valid inputs for
swap_facial_expression().
- goals_in_observation
If True, the desired and achieved goals are included in the observation dictionary. Default True.
- Type
bool
- done_active
If True, _is_done() returns True if the simulation reaches a success or failure state. If False, _is_done() always returns False and the function calling step() has to figure out when to stop or reset the simulation on its own.
- Type
bool
- _initialize_simulation()
Initialize MuJoCo simulation data structures mjModel and mjData.
- property n_actuators
The number of actuators for MIMo.
- Returns
The number of actuators for MIMo.
- Return type
- _get_actuators()
Saves IDs of the actuators associated with MIMo in
mimo_actuators.
- _get_joints()
Saves the IDs of the joints associated with MIMo in
mimo_joints.
- _set_action_space()
Sets the action space attribute.
By default, the action space contains only MIMo's actuators.
- _set_observation_space()
Sets the observation space attribute.
Calls
_get_obs()and determines the space using the returned observations.
- _get_facial_expressions(emotion_textures)
Associates facial textures in the model with human-readable names for the associated emotions.
- _env_setup()
This function initializes all the sensory components of the model.
Calls the setup functions for all the sensory components.
- _set_initial_position(initial_qpos)
Sets the initial positions for joints in the environment.
The input should be a dictionary with joint names as keys and joint positions (floats, in radians) as values. This function then sets each listed joint to the corresponding position. Joints not contained in the dictionary are left unaltered.
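The update rule described above amounts to a partial dictionary update. The sketch below mimics it on a plain dictionary rather than on MuJoCo's state. The helper apply_initial_qpos and the joint names are hypothetical; real joint names come from the scene XML. Angles written in degrees are converted with math.radians, since the method expects radians.

```python
import math


def apply_initial_qpos(current_qpos, initial_qpos):
    """Return a copy of current_qpos with the listed joints overwritten.

    Hypothetical helper: joints missing from initial_qpos keep their value,
    and unknown joint names raise a KeyError instead of being silently added.
    """
    updated = dict(current_qpos)
    for joint_name, angle in initial_qpos.items():
        if joint_name not in updated:
            raise KeyError(f"Unknown joint: {joint_name}")
        updated[joint_name] = angle
    return updated


# Illustrative joint names; real names come from the scene XML.
scene_qpos = {"left_elbow": 0.0, "right_elbow": 0.0, "head_tilt": 0.1}

# Angles are easier to write in degrees, so convert before passing them on.
initial_qpos = {"left_elbow": math.radians(90), "right_elbow": math.radians(-45)}

new_qpos = apply_initial_qpos(scene_qpos, initial_qpos)
```

Here head_tilt keeps its previous value because it does not appear in initial_qpos.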
- proprio_setup(proprio_params)
Perform the setup and initialization of the proprioceptive system.
This should be overridden if you want to use another implementation!
- Parameters
proprio_params (dict) – The parameter dictionary.
- touch_setup(touch_params)
Perform the setup and initialization of the touch system.
This should be overridden if you want to use another implementation!
- Parameters
touch_params (dict) – The parameter dictionary.
- vision_setup(vision_params)
Perform the setup and initialization of the vision system.
This should be overridden if you want to use another implementation!
- Parameters
vision_params (dict) – The parameter dictionary.
- vestibular_setup(vestibular_params)
Perform the setup and initialization of the vestibular system.
This should be overridden if you want to use another implementation!
- Parameters
vestibular_params (dict) – The parameter dictionary.
- _single_mujoco_step()
- _set_action(action)
Set the action for the next step.
Calls the actuation model's function mimoActuation.actuation.ActuationModel.action(). What exactly happens depends on the specific implementation.
- Parameters
action (numpy.ndarray) – A numpy array with control values.
- do_simulation(action, n_frames)
Step simulation forward for n_frames number of steps.
- Parameters
action (np.ndarray) – The control input for the actuators.
n_frames (int) – The number of physics steps to perform.
- step(action)
Run one timestep of the environment’s dynamics.
This function takes a simulation step with the given control inputs, collects the observations, computes the reward and finally determines if we are done with this episode or not.
_get_obs() collects the observations and compute_reward() calculates the reward. _is_done() is called to determine if we have reached a terminal state, and _step_callback() can be used for extra functions each step, such as incrementing a step counter. Both the ‘terminated’ and ‘truncated’ return values are determined by _is_done().
- Parameters
action (np.ndarray) – An action provided by the agent.
- Returns
observation (object): This will be an element of the environment’s observation_space. This may, for instance, be a numpy array containing the positions and velocities of certain objects.
reward (float): The amount of reward returned as a result of taking the action.
terminated (bool): Whether a terminal state (success or failure as defined under the MDP of the task) is reached. In this case further step() calls could return undefined results.
truncated (bool): Whether a truncation condition outside the scope of the MDP is satisfied. Typically a time limit, but could also be used to indicate the agent physically going out of bounds. Can be used to end the episode prematurely before a terminal state is reached.
info (dictionary): Auxiliary diagnostic information (helpful for debugging, learning, and logging). This might, for instance, contain: metrics that describe the agent’s performance state, variables that are hidden from observations, or individual reward terms that are combined to produce the total reward.
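A typical caller unpacks the five return values of step() and stops on either flag. The loop below runs against a tiny stand-in environment (CountdownEnv, hypothetical) so that it executes without MuJoCo. The reset() signature returning an observation and an info dictionary follows the gymnasium convention and is assumed here. The stand-in never sets terminated, which mirrors done_active=False, so the episode ends only through truncation.

```python
class CountdownEnv:
    """Hypothetical stand-in that follows the five-value step() convention."""

    def __init__(self, time_limit=5):
        self.time_limit = time_limit
        self.steps = 0

    def reset(self):
        self.steps = 0
        return {"observation": [0.0]}, {}

    def step(self, action):
        self.steps += 1
        obs = {"observation": [float(self.steps)]}
        reward = -1.0
        terminated = False  # mirrors done_active=False: never signals success/failure
        truncated = self.steps >= self.time_limit
        info = {"steps": self.steps}
        return obs, reward, terminated, truncated, info


def run_episode(env, policy, max_steps=1000):
    """Step until the environment reports an end or max_steps is reached."""
    obs, info = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total_reward += reward
        if terminated or truncated:
            break
    return total_reward, info


total_reward, last_info = run_episode(CountdownEnv(time_limit=5), policy=lambda obs: 0.0)
```

The max_steps guard is a design choice for the case where done_active is False and no truncation condition is implemented, since the loop would otherwise never end.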
- _step_callback()
A custom callback that is called after stepping the simulation, but before collecting observations.
Useful to enforce additional constraints on the simulation state before observations are collected. Note that the sensory modalities do not update until _get_obs() is called, so they will not have updated to the current timestep.
- _substep_callback()
A custom callback that is called after each simulation substep.
- _obs_callback()
A custom callback that is called after collecting the observations.
Like _step_callback, but with up-to-date observations.
- _reset_simulation()
Resets MuJoCo and actuation simulation data and samples a new goal.
- get_proprio_obs()
Collects and returns the outputs of the proprioceptive system.
Override this function if you want to do some simple post-processing!
- Returns
A numpy array containing the proprioceptive output.
- Return type
numpy.ndarray
- get_touch_obs()
Collects and returns the outputs of the touch system.
Override this function if you want to do some simple post-processing!
- Returns
A numpy array containing the touch output.
- Return type
numpy.ndarray
- get_vision_obs()
Collects and returns the outputs of the vision system.
Override this function if you want to do some simple post-processing!
- get_vestibular_obs()
Collects and returns the outputs of the vestibular system.
Override this function if you want to do some simple post-processing!
- Returns
A numpy array with the vestibular data.
- Return type
numpy.ndarray
- _get_obs()
Returns the observation.
This function should return all simulation outputs relevant to whatever learning algorithm you wish to use. We always return proprioceptive information in the ‘observation’ entry, and this information always includes relative joint positions. Other sensory modalities get their own entries, if they are enabled. If goals_in_observation is set to True, the achieved and desired goals are also included.
- Returns
A dictionary containing simulation outputs with separate entries for each sensor modality.
- Return type
Dict
- swap_facial_expression(emotion)
Changes MIMo's facial texture.
Valid emotion names are in facial_expressions, which links readable emotion names to their associated texture ids.
- Parameters
emotion (str) – A valid emotion name.
- _is_done(achieved_goal, desired_goal, info)
This function should determine if we reached the end of an episode. Dummy implementation.
By default, this function always returns False. If done_active is set to True, it instead returns True if either is_success() or is_failure() returns True. The goal parameters are there to allow this function to be more easily overridden by subclasses, should this be required. They are ignored by default.
- Parameters
- Returns
terminated (bool): Whether the current episode reached a success or failure state.
truncated (bool): Whether the current episode entered some kind of invalid condition or “finished” due to some other constraint, such as a time limit.
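The interaction between done_active and the success and failure checks can be written out as follows. This is a sketch of the behaviour described above, not the library's code. The free function is_done is hypothetical, and passing the ‘truncated’ value through unchanged from is_truncated() is an assumption.

```python
def is_done(done_active, success, failure, truncated_condition):
    """Return (terminated, truncated) following the rule described above.

    terminated is only ever True when done_active is set; truncated is
    assumed to be passed through unchanged from is_truncated().
    """
    terminated = bool(done_active and (success or failure))
    truncated = bool(truncated_condition)
    return terminated, truncated


# done_active=False: a success is not reported, the caller must notice it.
inactive = is_done(done_active=False, success=True, failure=False, truncated_condition=False)

# done_active=True: the same success now terminates the episode.
active = is_done(done_active=True, success=True, failure=False, truncated_condition=False)

# A time limit is reported independently of done_active (assumption).
timed_out = is_done(done_active=False, success=False, failure=False, truncated_condition=True)
```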
- action_space: spaces.Space[ActType]
- observation_space: spaces.Space[ObsType]
- is_success(achieved_goal, desired_goal)
Indicates if the achieved goal matches the desired goal.
- is_failure(achieved_goal, desired_goal)
Indicates that we reached a failure state.
- is_truncated()
Indicates that we reached an ending condition other than a success or failure state, such as a time limit.
- Returns
If we reached some ending condition other than a terminal state.
- Return type
- reset_model()
This function should reset the simulation state and return observations for the post-reset state.
- Returns
The observations after reset.
- Return type
Dict
- sample_goal()
Should sample a new goal and return it.
- Returns
The desired end state.
- Return type
- get_achieved_goal()
Should return the goal that was achieved during the simulation.
- Returns
The achieved end state.
- Return type
- compute_reward(achieved_goal, desired_goal, info)
Compute the step reward.
This externalizes the reward function and makes it dependent on a desired goal and the one that was achieved. If you wish to include additional rewards that are independent of the goal, you can include the necessary values to derive it in info and compute it accordingly.
- Parameters
- Returns
The reward that corresponds to the provided achieved goal w.r.t. the desired goal. Note that the following should always hold true:
ob, reward, terminated, truncated, info = env.step(action)
assert reward == env.compute_reward(ob['achieved_goal'], ob['desired_goal'], info)
- Return type
float
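To illustrate the note on goal-independent terms, the sketch below combines a goal-distance reward with an extra penalty read from info. The goal representation (a sequence of floats), the info key "energy" and the weight are assumptions chosen for the example, and the function is written as a free function rather than a method so it runs on its own.

```python
import math


def compute_reward(achieved_goal, desired_goal, info, energy_weight=0.01):
    """Negative Euclidean distance to the goal minus a goal-independent penalty."""
    distance = math.dist(achieved_goal, desired_goal)
    # Hypothetical goal-independent term, supplied by the caller through info.
    energy_penalty = energy_weight * info.get("energy", 0.0)
    return -distance - energy_penalty


reward = compute_reward([0.0, 0.0], [3.0, 4.0], {"energy": 10.0})
no_penalty = compute_reward([1.0, 1.0], [1.0, 1.0], {})
```

Because every input arrives through the arguments, calling the function again with the goals and info returned by step() reproduces the same reward, which is what the assertion above requires.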
Default data fields
- mimoEnv.envs.mimo_env.SCENE_DIRECTORY
Path to the scene directory.
- mimoEnv.envs.mimo_env.EMOTES
Valid facial expressions.
- mimoEnv.envs.mimo_env.DEFAULT_PROPRIOCEPTION_PARAMS
Default parameters for proprioception. Relative joint positions are always included.
- mimoEnv.envs.mimo_env.DEFAULT_TOUCH_PARAMS
Default touch parameters.
- mimoEnv.envs.mimo_env.DEFAULT_TOUCH_PARAMS_V2
Default touch parameters for the v2 version of MIMo with five fingers and two toes.
- mimoEnv.envs.mimo_env.DEFAULT_VISION_PARAMS
Default vision parameters.
- mimoEnv.envs.mimo_env.DEFAULT_VESTIBULAR_PARAMS
Default vestibular parameters.