MIMoEnv base class

This module defines the base MIMo environment.

The abstract base class is MIMoEnv. Default parameters for all the sensory modalities are provided as well.

MIMoEnv

class mimoEnv.envs.mimo_env.MIMoEnv(model_path, initial_qpos=None, frame_skip=2, render_mode=None, camera_id=None, camera_name=None, width=DEFAULT_SIZE, height=DEFAULT_SIZE, default_camera_config=None, proprio_params=None, touch_params=None, vision_params=None, vestibular_params=None, actuation_model=SpringDamperModel, goals_in_observation=True, done_active=False)

Bases: Generic[gymnasium.core.ObsType, gymnasium.core.ActType]

This is the abstract base class for all MIMo experiments.

This class meets the interface requirements for basic gym classes and adds some additional features. The observation space is of dictionary type.

Sensory modules are configured by parameter dictionaries. Default configuration dictionaries are included in the same module as this class: DEFAULT_PROPRIOCEPTION_PARAMS, DEFAULT_TOUCH_PARAMS, DEFAULT_VISION_PARAMS and DEFAULT_VESTIBULAR_PARAMS. Passing one of these to the constructor enables the relevant sensory module; not passing a dictionary disables it. By default, all sensory modalities are disabled and the only sensor outputs are the relative joint positions. The actuation model can also be changed using the actuation_model constructor argument. Actuation models do not use a configuration dictionary, instead deriving all required parameters from the XMLs.
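A minimal sketch of how the parameter dictionaries act as on/off switches. So that it runs without MuJoCo or mimoEnv installed, it uses a hypothetical stand-in function, enabled_modules, and placeholder dictionaries instead of the real defaults. Only the four keyword argument names are taken from the constructor signature above; the module labels in the returned list are illustrative.

```python
# Hypothetical stand-in: mirrors only the four sensory keyword arguments of
# the MIMoEnv constructor. It is NOT part of mimoEnv.
def enabled_modules(proprio_params=None, touch_params=None,
                    vision_params=None, vestibular_params=None):
    """Return the names of the sensory modules that would be enabled."""
    params = {
        "proprioception": proprio_params,
        "touch": touch_params,
        "vision": vision_params,
        "vestibular": vestibular_params,
    }
    # A module counts as enabled exactly when a dictionary is passed for it.
    return [name for name, cfg in params.items() if cfg is not None]


# Placeholder dictionaries; the real defaults live in mimoEnv.envs.mimo_env.
PLACEHOLDER_TOUCH_PARAMS = {"placeholder": True}
PLACEHOLDER_VISION_PARAMS = {"placeholder": True}

nothing_enabled = enabled_modules()
touch_and_vision = enabled_modules(touch_params=PLACEHOLDER_TOUCH_PARAMS,
                                   vision_params=PLACEHOLDER_VISION_PARAMS)
```

With the real class, the corresponding step would be to pass, for example, touch_params=DEFAULT_TOUCH_PARAMS to the constructor of a concrete subclass.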

Implementing subclasses will have to override the following functions:

  • is_success(), to determine when an episode reaches a success terminal state.

  • is_failure(), to determine when an episode reaches a failure terminal state.

  • is_truncated(), to determine when an episode ends for other reasons, such as a time limit or out-of-bounds condition.

  • compute_reward(), to compute the reward at each step.

  • reset_model(), which resets the physical simulation. If you wish to randomize some aspect of the scene this function is the place to implement that.

  • sample_goal(), which should determine the desired end state.

  • get_achieved_goal(), which should return the achieved end state.

Depending on the requirements of your experiment any of these functions may be implemented as dummy functions returning fixed values. Additional functions that may be overridden optionally are:

  • _is_done(), which determines the ‘terminal’ and ‘truncated’ return values after each step.

  • proprio_setup(), touch_setup(), vision_setup(), vestibular_setup(), which initialize the associated sensor modality. These should be overridden if you want to replace the default implementation. The default implementations are SimpleProprioception, DiscreteTouch, SimpleVision and SimpleVestibular.

  • get_proprio_obs(), get_touch_obs(), get_vision_obs(), get_vestibular_obs(), which collect the observations of the associated sensor modality. These allow you to post-process the output without having to alter the base implementations.

  • _step_callback() and _substep_callback(), which are called after every environment and simulation step respectively.

These functions come with default implementations that should handle most scenarios.
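The required overrides listed above can be sketched as follows. This is not the real base class: EnvSketch is a hypothetical stand-in that only declares the seven required methods, so the example runs without MuJoCo. ReachSketch, its scalar "position" state, and its tolerance and max_steps arguments are likewise invented for illustration. Several methods are dummies returning fixed values, as the text above permits.

```python
from abc import ABC, abstractmethod


class EnvSketch(ABC):
    """Hypothetical stand-in for MIMoEnv: declares only the required overrides."""

    @abstractmethod
    def is_success(self, achieved_goal, desired_goal): ...

    @abstractmethod
    def is_failure(self, achieved_goal, desired_goal): ...

    @abstractmethod
    def is_truncated(self): ...

    @abstractmethod
    def compute_reward(self, achieved_goal, desired_goal, info): ...

    @abstractmethod
    def reset_model(self): ...

    @abstractmethod
    def sample_goal(self): ...

    @abstractmethod
    def get_achieved_goal(self): ...


class ReachSketch(EnvSketch):
    """Toy task: bring a scalar position to within `tolerance` of a target."""

    def __init__(self, max_steps=100, tolerance=0.05):
        self.max_steps = max_steps
        self.tolerance = tolerance
        self.steps = 0
        self.position = 0.0
        self.goal = self.sample_goal()

    def is_success(self, achieved_goal, desired_goal):
        return abs(achieved_goal - desired_goal) <= self.tolerance

    def is_failure(self, achieved_goal, desired_goal):
        return False  # dummy: this toy task has no failure state

    def is_truncated(self):
        return self.steps >= self.max_steps  # simple time limit

    def compute_reward(self, achieved_goal, desired_goal, info):
        return -abs(achieved_goal - desired_goal)

    def reset_model(self):
        # A real implementation would reset (and possibly randomize) the scene.
        self.steps = 0
        self.position = 0.0
        return {"observation": [self.position]}

    def sample_goal(self):
        return 1.0  # dummy: fixed goal

    def get_achieved_goal(self):
        return self.position


env = ReachSketch()
```

A real subclass would derive from MIMoEnv and read the achieved goal from the simulation state instead of a stored scalar.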

Parameters
  • model_path (str) – The path to the scene xml.

  • initial_qpos (Dict[str, float]|None) – A dictionary of the initial joint positions. Keys are the joint names, with joint positions in radians as values. None by default.

  • frame_skip (int) – The number of physics substeps for each simulation step. The duration of each physics step is set in the scene XML. Default 2.

  • render_mode (str|None) – The render mode for gymnasium functions. We support “human”, “rgb_array” and “depth_array”. In mode “human”, the environment can be viewed with an interactive viewer. In modes “rgb_array” and “depth_array”, color images and depth images are rendered and returned. Please see the gymnasium documentation for more details.

  • camera_id (int) – The camera, by ID, which will be used for rendering.

  • camera_name (str) – The camera, by name, which will be used for rendering.

  • width (int) – The width of the rendered image.

  • height (int) – The height of the rendered image.

  • proprio_params (Dict|None) – The configuration dictionary for the proprioceptive system. If None the module is disabled. Default None.

  • touch_params (Dict|None) – The configuration dictionary for the touch system. If None the module is disabled. Default None.

  • vision_params (Dict|None) – The configuration dictionary for the vision system. If None the module is disabled. Default None.

  • vestibular_params (Dict|None) – The configuration dictionary for the vestibular system. If None the module is disabled. Default None.

  • actuation_model (Type[ActuationModel]) – Class for the actuation model. Default is SpringDamperModel. Note that this must be a class, not an instance.

  • goals_in_observation (bool) – If True the desired and achieved goals are included in the observation dictionary. Default True.

  • done_active (bool) – If True, _is_done() returns True if the simulation reaches a success or failure state. If False, _is_done() always returns False and the function calling step() has to figure out when to stop or reset the simulation on its own.

model

The MuJoCo model object.

Type

MjModel

data

The MuJoCo data object.

Type

MjData

init_qpos

The initial position vector for the entire scene. Can be used with set_state() to return the simulation to its initial state.

Type

np.ndarray

init_qvel

The initial velocity vector for the entire scene. Can be used with set_state() to return the simulation to its initial state.

Type

np.ndarray

frame_skip

The number of simulation substeps for each environment step.
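As a worked example of what frame_skip means for timing: the duration of one environment step is frame_skip times the physics timestep. The timestep value below is an assumption chosen for illustration; the real value is set in the scene XML.

```python
import math

frame_skip = 2            # constructor default from the signature above
physics_timestep = 0.005  # ASSUMED value in seconds; the real one comes from the scene XML

# One environment step advances the simulation by frame_skip physics substeps.
env_step_duration = frame_skip * physics_timestep
env_steps_per_second = 1.0 / env_step_duration
```

Under these assumed numbers each call to step() covers 0.01 s of simulated time, or 100 environment steps per simulated second.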

goal

The desired goal.

Type

object

action_space

The action space. See Gym documentation for more.

Type

gym.spaces.Space

observation_space

The observation space. See Gym documentation for more.

Type

gym.spaces.Space

actuation_model

Reference to the actuation model instance.

Type

ActuationModel

proprio_params

The configuration dictionary for the proprioceptive system.

Type

Dict

touch_params

The configuration dictionary for the touch system.

Type

Dict

vision_params

The configuration dictionary for the vision system.

Type

Dict

vestibular_params

The configuration dictionary for the vestibular system.

Type

Dict

proprioception

A reference to the proprioception instance.

Type

Proprioception

touch

A reference to the touch instance.

Type

Touch

vision

A reference to the vision instance.

Type

Vision

vestibular

A reference to the vestibular instance.

Type

Vestibular

facial_expressions

A dictionary linking emotions with their associated facial textures. The keys of this dictionary are valid inputs for swap_facial_expression().

Type

Dict[str, int]

goals_in_observation

If True the desired and achieved goals are included in the observation dictionary. Default True.

Type

bool

done_active

If True, _is_done() returns True if the simulation reaches a success or failure state. If False, _is_done() always returns False and the function calling step() has to figure out when to stop or reset the simulation on its own.

Type

bool

camera_id

The camera, by ID, which will be used to render images.

Type

int

camera_name

The camera, by name, which will be used to render images.

Type

str

render_mode

The render mode for basic calls to render().

Type

str

_initialize_simulation()

Initialize MuJoCo simulation data structures mjModel and mjData.

property n_actuators

The number of actuators for MIMo.

Returns

The number of actuators for MIMo.

Return type

int

_get_actuators()

Saves IDs of the actuators associated with MIMo in mimo_actuators.

_get_joints()

Saves the IDs of the joints associated with MIMo in mimo_joints.

_set_action_space()

Sets the action space attribute.

By default, the action space contains only MIMo's actuators.

_set_observation_space()

Sets the observation space attribute.

Calls _get_obs() and determines the space using the returned observations.

_get_facial_expressions(emotion_textures)

Associates facial textures in the model with human-readable names for the associated emotions.

Parameters

emotion_textures (Dict[str, str]) – A dictionary with names for emotions as keys and the XML names of the associated facial textures as values.

_env_setup()

This function initializes all the sensory components of the model.

Calls the setup functions for all the sensory components.

_set_initial_position(initial_qpos)

Sets the initial positions for joints in the environment.

The input should be a dictionary with joint names as keys and joint positions (in radians, as floats) as values. This function then sets each listed joint to the corresponding position. Joints not contained in the dictionary are left unaltered.

Parameters

initial_qpos (dict[str, float]) – A dictionary with joint names as keys and joint positions (in radians as floats) as values.
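A minimal sketch of the documented behaviour, assuming joint positions are held in a plain dictionary rather than in MuJoCo's data structures. The function apply_initial_positions and the joint names are hypothetical, and raising on an unknown joint name is a choice made for this sketch, not documented behaviour.

```python
def apply_initial_positions(qpos_by_joint, initial_qpos):
    """Return a copy of qpos_by_joint with the listed joints overwritten.

    Both arguments map joint names to positions in radians.
    """
    updated = dict(qpos_by_joint)
    for joint_name, position in initial_qpos.items():
        if joint_name not in updated:
            # Sketch-only choice: fail loudly on a misspelled joint name.
            raise KeyError(f"unknown joint: {joint_name}")
        updated[joint_name] = position
    return updated


# Hypothetical joint names, for illustration only.
current = {"joint_a": 0.0, "joint_b": 0.0, "joint_c": 0.0}
after = apply_initial_positions(current, {"joint_b": 0.5})
```

Only joint_b changes; joint_a and joint_c keep their previous values, matching the rule that joints not in the dictionary are left unaltered.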

proprio_setup(proprio_params)

Perform the setup and initialization of the proprioceptive system.

This should be overridden if you want to use another implementation!

Parameters

proprio_params (dict) – The parameter dictionary.

touch_setup(touch_params)

Perform the setup and initialization of the touch system.

This should be overridden if you want to use another implementation!

Parameters

touch_params (dict) – The parameter dictionary.

vision_setup(vision_params)

Perform the setup and initialization of the vision system.

This should be overridden if you want to use another implementation!

Parameters

vision_params (dict) – The parameter dictionary.

vestibular_setup(vestibular_params)

Perform the setup and initialization of the vestibular system.

This should be overridden if you want to use another implementation!

Parameters

vestibular_params (dict) – The parameter dictionary.

_single_mujoco_step()

_set_action(action)

Set the action for the next step.

Calls the actuation model's function mimoActuation.actuation.ActuationModel.action(). What exactly happens depends on the specific implementation.

Parameters

action (numpy.ndarray) – A numpy array with control values.

do_simulation(action, n_frames)

Step simulation forward for n_frames number of steps.

Parameters
  • action (np.ndarray) – The control input for the actuators.

  • n_frames (int) – The number of physics steps to perform.

step(action)

Run one timestep of the environment’s dynamics.

This function takes a simulation step with the given control inputs, collects the observations, computes the reward and finally determines whether the episode is done. _get_obs() collects the observations and compute_reward() calculates the reward. _is_done() is called to determine if we have reached a terminal state, and _step_callback() can be used for extra functions each step, such as incrementing a step counter. Both the ‘terminated’ and ‘truncated’ return values are determined by _is_done().
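The order of operations can be sketched with a hypothetical stand-in class that only records which hook runs when. The positions of the callbacks follow their descriptions on this page (after each substep, after stepping but before observations, after observations). The relative order of the reward computation and the done check, and the empty info dictionary, are assumptions of this sketch.

```python
class StepOrderSketch:
    """Hypothetical stand-in for the environment; no physics is simulated."""

    def __init__(self, frame_skip=2):
        self.frame_skip = frame_skip
        self.calls = []  # names of hooks in the order they ran

    def do_simulation(self, action, n_frames):
        for _ in range(n_frames):
            self.calls.append("physics_substep")
            self._substep_callback()

    def _substep_callback(self):
        self.calls.append("_substep_callback")

    def _step_callback(self):
        self.calls.append("_step_callback")

    def _get_obs(self):
        self.calls.append("_get_obs")
        return {"observation": [0.0], "achieved_goal": 0.0, "desired_goal": 1.0}

    def _obs_callback(self):
        self.calls.append("_obs_callback")

    def compute_reward(self, achieved_goal, desired_goal, info):
        self.calls.append("compute_reward")
        return -abs(achieved_goal - desired_goal)

    def _is_done(self, achieved_goal, desired_goal, info):
        self.calls.append("_is_done")
        return False, False

    def step(self, action):
        self.do_simulation(action, self.frame_skip)
        self._step_callback()          # after stepping, before observations
        obs = self._get_obs()
        self._obs_callback()           # observations are now up to date
        info = {}                      # ASSUMED: empty info dictionary
        achieved, desired = obs["achieved_goal"], obs["desired_goal"]
        reward = self.compute_reward(achieved, desired, info)
        terminated, truncated = self._is_done(achieved, desired, info)
        return obs, reward, terminated, truncated, info


env = StepOrderSketch(frame_skip=2)
obs, reward, terminated, truncated, info = env.step(action=[0.0])
```

After one call, env.calls lists two substep/callback pairs followed by _step_callback, _get_obs, _obs_callback, compute_reward and _is_done.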

Parameters

action (np.ndarray) – An action provided by the agent

Returns

A tuple (observation, reward, terminated, truncated, info):

  • observation (object) – An element of the environment’s observation_space. This may, for instance, be a numpy array containing the positions and velocities of certain objects.

  • reward (float) – The amount of reward returned as a result of taking the action.

  • terminated (bool) – Whether a terminal state (success or failure as defined under the MDP of the task) is reached. In this case further step() calls could return undefined results.

  • truncated (bool) – Whether a truncation condition outside the scope of the MDP is satisfied. Typically a time limit, but could also be used to indicate the agent physically going out of bounds. Can be used to end the episode prematurely before a terminal state is reached.

  • info (dict) – Auxiliary diagnostic information (helpful for debugging, learning, and logging). This might, for instance, contain metrics that describe the agent’s performance state, variables that are hidden from observations, or individual reward terms that are combined to produce the total reward.

Return type

Tuple[object, float, bool, bool, dict]

_step_callback()

A custom callback that is called after stepping the simulation, but before collecting observations.

Useful to enforce additional constraints on the simulation state before observations are collected. Note that the sensory modalities do not update until _get_obs() is called, so they will not have updated to the current timestep.

_substep_callback()

A custom callback that is called after each simulation substep.

_obs_callback()

A custom callback that is called after collecting the observations.

Like _step_callback, but with up-to-date observations.

_reset_simulation()

Resets MuJoCo and actuation simulation data and samples a new goal.

get_proprio_obs()

Collects and returns the outputs of the proprioceptive system.

Override this function if you want to perform some simple post-processing!

Returns

A numpy array containing the proprioceptive output.

Return type

numpy.ndarray
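A sketch of the post-processing pattern: override the getter, call the parent implementation, then transform its output. ProprioSketch is a hypothetical stand-in for the environment, and it returns a plain list with made-up values where the real method returns a numpy array. Clipping to [-1, 1] is just one example of a transformation.

```python
class ProprioSketch:
    """Hypothetical stand-in for an environment with proprioceptive output."""

    def get_proprio_obs(self):
        return [0.2, -1.7, 3.4]  # made-up raw sensor values


class ClippedProprioSketch(ProprioSketch):
    """Post-processes the parent's output without touching the sensor code."""

    def get_proprio_obs(self):
        raw = super().get_proprio_obs()
        # Clip every value to the range [-1, 1].
        return [max(-1.0, min(1.0, value)) for value in raw]


clipped = ClippedProprioSketch().get_proprio_obs()
```

The same pattern applies to the other get_*_obs() functions.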

get_touch_obs()

Collects and returns the outputs of the touch system.

Override this function if you want to perform some simple post-processing!

Returns

A numpy array containing the touch output.

Return type

numpy.ndarray

get_vision_obs()

Collects and returns the outputs of the vision system.

Override this function if you want to perform some simple post-processing!

Returns

A dictionary with one entry for each separate image. In the default implementation each eye renders one image, so each eye gets one entry.

Return type

dict[str, np.ndarray]

get_vestibular_obs()

Collects and returns the outputs of the vestibular system.

Override this function if you want to perform some simple post-processing!

Returns

A numpy array with the vestibular data.

Return type

numpy.ndarray

_get_obs()

Returns the observation.

This function should return all simulation outputs relevant to whatever learning algorithm you wish to use. We always return proprioceptive information in the ‘observation’ entry, and this information always includes relative joint positions. Other sensory modalities get their own entries, if they are enabled. If goals_in_observation is set to True, the achieved and desired goal are also included.
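The structure of the returned dictionary can be sketched with a hypothetical helper, assemble_obs. The 'observation', 'achieved_goal' and 'desired_goal' keys appear on this page. The 'touch' and 'vestibular' keys, the eye names, and the choice to store each image under its own top-level key are assumptions made for illustration.

```python
def assemble_obs(proprio, touch=None, vision=None, vestibular=None,
                 goals_in_observation=True,
                 achieved_goal=None, desired_goal=None):
    """Build an observation dictionary from per-modality outputs."""
    obs = {"observation": proprio}      # proprioception is always present
    if touch is not None:
        obs["touch"] = touch            # ASSUMED key name
    if vision is not None:
        obs.update(vision)              # ASSUMED: one top-level entry per image
    if vestibular is not None:
        obs["vestibular"] = vestibular  # ASSUMED key name
    if goals_in_observation:
        obs["achieved_goal"] = achieved_goal
        obs["desired_goal"] = desired_goal
    return obs


proprio_only = assemble_obs([0.1, 0.2], goals_in_observation=False)
full = assemble_obs(
    [0.1, 0.2],
    touch=[0.0],
    vision={"eye_left": [[0]], "eye_right": [[0]]},  # hypothetical image names
    vestibular=[0.0] * 6,
    achieved_goal=0.0,
    desired_goal=1.0,
)
```

With every modality disabled and goals excluded, only the 'observation' entry remains.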

Returns

A dictionary containing simulation outputs with separate entries for each sensor modality.

Return type

Dict

swap_facial_expression(emotion)

Changes MIMo's facial texture.

Valid emotion names are in facial_expressions, which links readable emotion names to their associated texture IDs.

Parameters

emotion (str) – A valid emotion name.
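A caller can check a name against the dictionary before requesting a swap. The sketch below uses a made-up dictionary and a hypothetical helper, texture_id_for; the real emotion names and texture IDs come from the loaded model.

```python
# Made-up emotion names and texture IDs, for illustration only.
facial_expressions = {"neutral": 3, "happy": 7}


def texture_id_for(emotion, expressions):
    """Return the texture ID for a valid emotion name, or None otherwise."""
    return expressions.get(emotion)


happy_id = texture_id_for("happy", facial_expressions)
unknown_id = texture_id_for("grumpy", facial_expressions)
```

Here "grumpy" is not a key, so the lookup yields None and the caller can skip the swap.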

_is_done(achieved_goal, desired_goal, info)

This function should determine if we reached the end of an episode. Dummy implementation.

By default, this function always returns False. If done_active is set to True, it instead returns True if either is_success() or is_failure() returns True. The goal parameters are there to allow this function to be more easily overridden by subclasses, should this be required. They are ignored by default.
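The default logic can be sketched as a pure function. The name is_done_sketch and its boolean arguments are hypothetical; they stand for done_active and for the results of is_success(), is_failure() and is_truncated(). That truncated is also forced to False when done_active is off is an assumption of this sketch.

```python
def is_done_sketch(done_active, success, failure, truncation):
    """Return (terminated, truncated) following the rule described above."""
    if not done_active:
        # The caller of step() must decide when to stop or reset.
        return False, False  # ASSUMED: truncated is also False here
    terminated = success or failure
    return terminated, truncation


inactive = is_done_sketch(False, success=True, failure=False, truncation=True)
succeeded = is_done_sketch(True, success=True, failure=False, truncation=False)
timed_out = is_done_sketch(True, success=False, failure=False, truncation=True)
```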

Parameters
  • achieved_goal (object) – The goal that was achieved during execution.

  • desired_goal (object) – The desired goal that we asked the agent to attempt to achieve.

  • info (dict) – An info dictionary with additional information.

Returns

A tuple (terminated, truncated):

  • terminated (bool) – Whether the current episode reached a success or failure state.

  • truncated (bool) – Whether the current episode entered some kind of invalid condition or “finished” due to some other constraint, such as a time limit.

Return type

Tuple[bool, bool]

action_space: spaces.Space[ActType]

observation_space: spaces.Space[ObsType]

is_success(achieved_goal, desired_goal)

Indicates if the achieved goal matches the desired goal.

Parameters
  • achieved_goal (object) – The goal that was achieved during execution.

  • desired_goal (object) – The desired goal that we asked the agent to attempt to achieve.

Returns

If we successfully reached the desired goal state.

Return type

bool

is_failure(achieved_goal, desired_goal)

Indicates that we reached a failure state.

Parameters
  • achieved_goal (object) – The goal that was achieved during execution.

  • desired_goal (object) – The desired goal that we asked the agent to attempt to achieve.

Returns

If we reached an unrecoverable failure state.

Return type

bool

is_truncated()

Indicates that we reached an ending condition other than a success or failure state, such as a time limit.

Returns

If we reached some ending condition other than a terminal state.

Return type

bool

reset_model()

This function should reset the simulation state and return observations for the post-reset state.

Returns

The observations after reset.

Return type

Dict

sample_goal()

Should sample a new goal and return it.

Returns

The desired end state.

Return type

object

get_achieved_goal()

Should return the goal that was achieved during the simulation.

Returns

The achieved end state.

Return type

object

compute_reward(achieved_goal, desired_goal, info)

Compute the step reward.

This externalizes the reward function and makes it dependent on a desired goal and the one that was achieved. If you wish to include additional rewards that are independent of the goal, you can include the necessary values to derive it in info and compute it accordingly.
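A minimal sketch of such a reward function, assuming scalar goals. The function name compute_reward_sketch and the "energy_penalty" key in info are hypothetical; the key shows one way to pass in a goal-independent term.

```python
def compute_reward_sketch(achieved_goal, desired_goal, info):
    """Negative distance to the goal, minus an optional penalty from info."""
    goal_term = -abs(achieved_goal - desired_goal)
    # Goal-independent term; absent keys contribute nothing.
    penalty = info.get("energy_penalty", 0.0)
    return goal_term - penalty


plain = compute_reward_sketch(0.5, 1.0, {})
penalized = compute_reward_sketch(0.5, 1.0, {"energy_penalty": 0.25})
```

Because the function depends only on its three arguments, it can be re-evaluated later with the same goals and info and will return the same reward.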

Parameters
  • achieved_goal (object) – the goal that was achieved during execution

  • desired_goal (object) – the desired goal that we asked the agent to attempt to achieve

  • info (dict) – an info dictionary with additional information

Returns

The reward that corresponds to the provided achieved goal with respect to the desired goal. Note that the following should always hold true:

  • ob, reward, terminated, truncated, info = env.step(action)

  • assert reward == env.compute_reward(ob['achieved_goal'], ob['desired_goal'], info)

Return type

float

Default data fields

mimoEnv.envs.mimo_env.SCENE_DIRECTORY

Path to the scene directory.

mimoEnv.envs.mimo_env.EMOTES

Valid facial expressions.

mimoEnv.envs.mimo_env.DEFAULT_PROPRIOCEPTION_PARAMS

Default parameters for proprioception. Relative joint positions are always included.

mimoEnv.envs.mimo_env.DEFAULT_TOUCH_PARAMS

Default touch parameters.

mimoEnv.envs.mimo_env.DEFAULT_TOUCH_PARAMS_V2

Default touch parameters for the v2 version of MIMo with five fingers and two toes.

mimoEnv.envs.mimo_env.DEFAULT_VISION_PARAMS

Default vision parameters.

mimoEnv.envs.mimo_env.DEFAULT_VESTIBULAR_PARAMS

Default vestibular parameters.