Sample environments

This section describes the code used for the experiments and demos from our paper MIMo: A Multi-Modal Infant Model for Studying Cognitive Development in Humans and AIs. The learning illustration environments (reach, standup, self-body and catch) each consist of an environment and a training script using RL algorithms from Stable Baselines3. The catch environment is based on the full-hand version of MIMo, while the others use the mitten hand. There is also a simple benchmarking scenario in which MIMo takes random actions. Finally, there is a demo environment in a simple room with some toys, with all sensory modalities enabled using the default configurations.

All of the environments register with gym under the names MIMoReach-v0, MIMoStandup-v0, MIMoSelfBody-v0, MIMoCatch-v0, MIMoBench-v0 and MIMoShowroom-v0.
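As a rough sketch of how the registered names might be used, the snippet below wraps environment creation and a random-action episode in two helpers. It assumes that importing mimoEnv triggers the registration and that the environments follow the standard Gymnasium reset/step interface. make_mimo_env and run_random_episode are hypothetical helper names, not part of the package.

```python
# Environment IDs as listed above.
ENV_IDS = [
    "MIMoReach-v0",
    "MIMoStandup-v0",
    "MIMoSelfBody-v0",
    "MIMoCatch-v0",
    "MIMoBench-v0",
    "MIMoShowroom-v0",
]


def make_mimo_env(env_id):
    """Create one of the sample environments by its registered name."""
    if env_id not in ENV_IDS:
        raise ValueError(f"Unknown environment id: {env_id}")
    # Imports are deferred so this helper module loads without the simulator.
    import gymnasium as gym
    import mimoEnv  # noqa: F401  (assumption: importing registers the IDs)

    return gym.make(env_id)


def run_random_episode(env, max_steps=100):
    """Take random actions until the episode ends; return the step count."""
    env.reset()
    steps = 0
    while steps < max_steps:
        action = env.action_space.sample()
        _, _, terminated, truncated, _ = env.step(action)
        steps += 1
        if terminated or truncated:
            break
    return steps
```

With the package installed, run_random_episode(make_mimo_env("MIMoReach-v0")) would step the reach scene with random actions.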

Reach Environment

This module contains a simple reaching experiment in which MIMo tries to touch a hovering ball.

The scene consists of MIMo and a hovering ball located within reach of MIMo’s right arm. The task is for MIMo to touch the ball. MIMo is fixed in position and can only move his right arm. His head automatically tracks the location of the ball, i.e. the visual search for the ball is assumed to be solved. Sensory input consists of the full proprioceptive inputs. All other modalities are disabled.

The ball hovers stationary. An episode is completed successfully if MIMo touches the ball, knocking it out of position. There are no failure states. The position of the ball is slightly randomized each trial.

Reward shaping is employed, with a negative reward based on the distance between MIMo’s hand and the ball. A large fixed reward is given when he touches the ball.

The class with the environment is MIMoReachEnv while the path to the scene XML is defined in REACH_XML.

mimoEnv.envs.reach.REACH_XML

Path to the reach scene.

class mimoEnv.envs.reach.MIMoReachEnv(model_path=REACH_XML, proprio_params=DEFAULT_PROPRIOCEPTION_PARAMS, touch_params=None, vision_params=None, vestibular_params=None, actuation_model=SpringDamperModel, goals_in_observation=False, done_active=True, **kwargs)

Bases: Generic[gymnasium.core.ObsType, gymnasium.core.ActType]

MIMo reaches for an object.

Attributes and parameters are the same as in the base class, but the default arguments are adapted for the scenario.

Due to the goal condition we do not use the goal attribute or the interfaces associated with it. Instead, the reward and success conditions are computed directly from the model state, while sample_goal() and get_achieved_goal() are dummy functions.

compute_reward(achieved_goal, desired_goal, info)

Computes the reward.

A negative reward is given based on the distance between MIMo’s fingers and the ball. If contact is made, a fixed positive reward of 100 is granted. The achieved and desired goal parameters are ignored.

Parameters
  • achieved_goal (object) – This parameter is ignored.

  • desired_goal (object) – This parameter is ignored.

  • info (dict) – This parameter is ignored.

Returns

The reward as described above.

Return type

float
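The reward structure described for compute_reward() can be sketched as a standalone function. This is only an illustration under stated assumptions: the distance penalty is taken to be the unscaled Euclidean distance, and the contact bonus is assumed to replace the distance term rather than add to it. The actual implementation reads these quantities from the model state. reach_reward and its arguments are hypothetical names.

```python
import math

CONTACT_REWARD = 100.0  # fixed bonus stated in the description above


def reach_reward(hand_pos, ball_pos, in_contact):
    """Shaped reward: fixed bonus on contact, else negative distance."""
    if in_contact:
        # Assumption: the bonus replaces the distance term.
        return CONTACT_REWARD
    # Assumption: the penalty is the unscaled Euclidean distance.
    return -math.dist(hand_pos, ball_pos)
```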

is_success(achieved_goal, desired_goal)

Determines the goal states.

Parameters
  • achieved_goal (object) – This parameter is ignored.

  • desired_goal (object) – This parameter is ignored.

Returns

True if the ball is knocked out of position.

Return type

bool

is_failure(achieved_goal, desired_goal)

Dummy function. Always returns False.

Parameters
  • achieved_goal (object) – This parameter is ignored.

  • desired_goal (object) – This parameter is ignored.

Returns

False.

Return type

bool

is_truncated()

Dummy function. Always returns False.

Returns

False

Return type

bool

sample_goal()

Dummy function. Returns an empty array.

Returns

An empty array.

Return type

numpy.ndarray

get_achieved_goal()

Dummy function. Returns an empty array.

Returns

An empty array.

Return type

numpy.ndarray

reset_model()

Resets the simulation.

We reset the simulation and then slightly move both MIMo’s arm and the ball randomly. The randomization is limited such that MIMo can always reach the ball.

Returns

Observations after reset.

Return type

Dict

_step_callback()

Adjusts the head and eye positions to track the target.

Manually computes the joint positions required for the head and eyes to look at the target objects.

action_space: spaces.Space[ActType]
observation_space: spaces.Space[ObsType]

Standup Environment

This module contains a simple experiment in which MIMo tries to stand up.

The scene consists of MIMo and some railings representing a crib. MIMo starts sitting on the ground with his hands on the railings. The task is to stand up. MIMo’s feet and hands are welded to the ground and railings, respectively. He can move all joints in his arms, legs and torso. His head is fixed. Sensory input consists of proprioceptive and vestibular inputs, using the default configurations for both.

MIMo’s initial position is determined by slightly randomizing all joint positions from a standing position and then letting the simulation settle. This leads to MIMo sagging into a slightly random crouching or sitting position each episode. All episodes have a fixed length; there are no goal or failure states.

Reward shaping is employed, such that MIMo is penalised for using muscle inputs and large inputs in particular. Additionally, he is rewarded each step for the current height of his head.

The class with the environment is MIMoStandupEnv while the path to the scene XML is defined in STANDUP_XML.

mimoEnv.envs.standup.STANDUP_XML

Path to the stand up scene.

mimoEnv.envs.standup.SITTING_POSITION

Initial position of MIMo. Specifies initial values for all joints. We grabbed these values by posing MIMo using the MuJoCo simulate executable and the positional actuator file. We need these not just for the initial position but also resetting the position each step.

class mimoEnv.envs.standup.MIMoStandupEnv(model_path=STANDUP_XML, initial_qpos=SITTING_POSITION, frame_skip=2, proprio_params=DEFAULT_PROPRIOCEPTION_PARAMS, touch_params=None, vision_params=None, vestibular_params=DEFAULT_VESTIBULAR_PARAMS, actuation_model=SpringDamperModel)

Bases: Generic[gymnasium.core.ObsType, gymnasium.core.ActType]

MIMo stands up using crib railings as an aid.

Attributes and parameters are the same as in the base class, but the default arguments are adapted for the scenario. Specifically we have done_active and goals_in_observation as False and touch and vision sensors disabled.

Even though we define a success condition in is_success(), it is disabled since done_active is set to False. The purpose of this is to enable extra information for the logging features of Stable Baselines3.

init_crouch_position

The initial position.

Type

numpy.ndarray

compute_reward(achieved_goal, desired_goal, info)

Computes the reward.

The reward consists of the current height of MIMo’s head with a penalty of the square of the control signal.

Parameters
  • achieved_goal (float) – The achieved head height.

  • desired_goal (float) – This parameter is ignored.

  • info (dict) – This parameter is ignored.

Returns

The reward as described above.

Return type

float
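A minimal sketch of this reward, assuming the penalty is the sum of squared control values multiplied by a weight. The weight penalty_scale and its default value are illustrative assumptions; the source does not state the scaling. standup_reward is a hypothetical name.

```python
def standup_reward(head_height, ctrl, penalty_scale=0.01):
    """Head height minus a quadratic penalty on the control signal."""
    # Assumption: squared controls are summed and weighted by penalty_scale.
    quad_ctrl_cost = penalty_scale * sum(c * c for c in ctrl)
    return head_height - quad_ctrl_cost
```

Because the penalty is quadratic, large control inputs are punished disproportionately, which matches the description of penalising large inputs in particular.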

is_success(achieved_goal, desired_goal)

Checks whether MIMo has reached the goal height.

Parameters
  • achieved_goal (float) – The achieved head height.

  • desired_goal (float) – The target head height.

Returns

True if the achieved head height exceeds the desired height.

Return type

bool

reset_model()

Resets the simulation.

Return the simulation to the XML state, then slightly randomize all joint positions. Afterwards we let the simulation settle for a fixed number of steps. This leads to MIMo settling into a slightly random sitting or crouching position.

Returns

Observations after reset.

Return type

Dict

is_failure(achieved_goal, desired_goal)

Dummy function. Always returns False.

Parameters
  • achieved_goal (object) – This parameter is ignored.

  • desired_goal (object) – This parameter is ignored.

Returns

False

Return type

bool

is_truncated()

Dummy function. Always returns False.

Returns

False.

Return type

bool

sample_goal()

Returns the goal height.

We use a fixed goal height of 0.5.

Returns

0.5

Return type

float

action_space: spaces.Space[ActType]
observation_space: spaces.Space[ObsType]
get_achieved_goal()

Get the height of MIMo’s head.

Returns

The height of MIMo’s head.

Return type

float

Self-body Environment

This module contains a simple experiment where MIMo is tasked with touching parts of his own body.

The scene is empty except for MIMo, who is sitting on the ground. The task is for MIMo to touch a randomized target body part with his right arm. MIMo is fixed in the initial sitting position and can only move his right arm. Sensory inputs consist of touch and proprioception. Proprioception uses the default settings, but touch excludes several body parts and uses a lowered resolution to improve runtime. The body part can be any of the geoms constituting MIMo.

MIMo’s initial position is constant in all episodes. The target body part is randomized. An episode is completed successfully if MIMo touches the target body part with his right arm.

The reward structure consists of a large fixed reward for touching the right body part, a shaping reward for touching another body part, depending on the distance between the contact and the target body part, and a penalty for each time step.

The class with the environment is MIMoSelfBodyEnv while the path to the scene XML is defined in SELFBODY_XML.

mimoEnv.envs.selfbody.TOUCH_PARAMS

List of possible target bodies.

mimoEnv.envs.selfbody.SITTING_POSITION

Initial position of MIMo. Specifies initial values for all joints. We grabbed these values by posing MIMo using the MuJoCo simulate executable and the positional actuator file. We need these not just for the initial position but also resetting the position (excluding the right arm) each step.

mimoEnv.envs.selfbody.SELFBODY_XML

Path to the scene for this experiment.

class mimoEnv.envs.selfbody.MIMoSelfBodyEnv(model_path=SELFBODY_XML, initial_qpos=SITTING_POSITION, frame_skip=1, proprio_params=DEFAULT_PROPRIOCEPTION_PARAMS, touch_params=TOUCH_PARAMS, vision_params=None, vestibular_params=None, actuation_model=SpringDamperModel, goals_in_observation=True, done_active=True, **kwargs)

Bases: Generic[gymnasium.core.ObsType, gymnasium.core.ActType]

MIMo learns about his own body.

MIMo is tasked with touching a given part of his body using his right arm. Attributes and parameters are mostly identical to the base class, with two changes. First, the constructor takes two fewer arguments: goals_in_observation and done_active are both permanently set to True. Second, there are two extra attributes for handling the goal state. The goal attribute stores the target geom in a one-hot encoding, while target_geom and target_body store the geom and its associated body as an index. For more information on geoms and bodies please see the MuJoCo documentation.

target_geom

The body part MIMo should try to touch, as a MuJoCo geom.

Type

int

target_body

The name of the kinematic body that the target geom is a part of.

Type

str

init_sitting_qpos

The initial position.

Type

numpy.ndarray

sample_goal()

Samples a new goal and returns it.

The goal consists of a target geom that we try to touch, returned as a one-hot encoding. We also populate target_geom and target_body, which are used by other functions.

Returns

The target geom in a one-hot encoding.

Return type

numpy.ndarray
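The one-hot goal encoding can be illustrated with a small standalone function. This sketch uses plain Python lists instead of numpy arrays and takes the candidate geom IDs as an argument. Both choices are simplifications for illustration, and sample_one_hot_goal is a hypothetical name.

```python
import random


def sample_one_hot_goal(geom_ids, rng=None):
    """Pick a random target geom and return it with its one-hot encoding."""
    if rng is None:
        rng = random.Random()
    target_geom = rng.choice(geom_ids)
    # One entry per candidate geom; only the target's entry is set to 1.
    goal = [0.0] * len(geom_ids)
    goal[geom_ids.index(target_geom)] = 1.0
    return target_geom, goal
```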

is_success(achieved_goal, desired_goal)

We have succeeded when we have a touch sensation on the goal body.

We ignore the goal attribute in this for performance reasons and determine the success condition using target_geom instead. This allows us to save a number of array operations each step.

Parameters
  • achieved_goal (object) – This parameter is ignored.

  • desired_goal (object) – This parameter is ignored.

Returns

True if MIMo has touched the target geom.

Return type

bool

compute_reward(achieved_goal, desired_goal, info)

Computes the reward each step.

Three different rewards can be returned:

  • If we touched the target geom, the reward is 500.

  • If we touched a geom, but not the target, the reward is the negative of the distance between the touch contact and the target body.

  • Otherwise the reward is -1.

Parameters
  • achieved_goal (object) – This parameter is ignored.

  • desired_goal (object) – This parameter is ignored.

  • info (dict) – This parameter is ignored.

Returns

The reward as described above.

Return type

float
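The three reward cases listed for compute_reward() map onto a simple branch structure. In this sketch the contact information is passed in directly, whereas the environment derives it from the touch sensors. selfbody_reward and its arguments are hypothetical names.

```python
TARGET_REWARD = 500.0  # reward for touching the target geom
NO_TOUCH_REWARD = -1.0  # per-step penalty when nothing is touched


def selfbody_reward(touched_target, contact_distance=None):
    """Return the reward for one step of the self-body task.

    contact_distance is the distance between a non-target contact and the
    target body, or None if no geom was touched this step.
    """
    if touched_target:
        return TARGET_REWARD
    if contact_distance is not None:
        return -contact_distance
    return NO_TOUCH_REWARD
```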

reset_model()

Reset to the initial sitting position.

Returns

Observations after reset.

Return type

Dict

is_failure(achieved_goal, desired_goal)

Dummy function that always returns False.

Parameters
  • achieved_goal (object) – This parameter is ignored.

  • desired_goal (object) – This parameter is ignored.

Returns

False.

Return type

bool

action_space: spaces.Space[ActType]
observation_space: spaces.Space[ObsType]
is_truncated()

Dummy function. Always returns False.

Returns

False.

Return type

bool

get_achieved_goal()

Dummy function that returns an empty array.

Returns

An empty array.

Return type

numpy.ndarray

Catch Environment

This module contains a simple experiment in which MIMo tries to catch a falling ball.

The scene consists of MIMo with his right arm outstretched and his palm open. A ball is located just above MIMo’s palm. The task is for him to catch the falling ball. MIMo is fixed in position and can only move his right hand. Sensory input consists of the full proprioceptive inputs and touch input.

An episode is completed successfully if MIMo holds onto the ball continuously for 1 second. An episode fails when the ball drops some distance below MIMo’s hand or is bounced into the distance.

There is a small negative reward for each step without touching the ball, a larger positive reward for each step in contact with the ball and then a large fixed reward on success.

mimoEnv.envs.catch.CATCH_XML

Path to the catch scene.

mimoEnv.envs.catch.TOUCH_PARAMS

Touch parameters for the catch environment. Only the right arm is equipped with sensors.

mimoEnv.envs.catch.CATCH_CAMERA_CONFIG

Camera configuration so it looks straight at the hand.

class mimoEnv.envs.catch.MIMoCatchEnv(model_path=CATCH_XML, initial_qpos=None, frame_skip=2, proprio_params=DEFAULT_PROPRIOCEPTION_PARAMS, touch_params=TOUCH_PARAMS, vision_params=None, vestibular_params=None, actuation_model=MuscleModel, goals_in_observation=False, done_active=True, action_penalty=True, jitter=False, position_inaccurate=False, default_camera_config=CATCH_CAMERA_CONFIG, **kwargs)

Bases: Generic[gymnasium.core.ObsType, gymnasium.core.ActType]

MIMo tries to catch a falling ball.

MIMo is tasked with catching a falling ball and holding onto it for one second. MIMo’s head and eyes automatically track the ball. The position of the ball is slightly randomized each episode. The constructor takes three additional arguments over the base environment.

Parameters
  • action_penalty (bool) – If True, an action penalty based on the cost function of the actuation model is applied to the reward. Default True.

  • jitter (bool) – If True, the input actions are multiplied with a perturbation array which is randomized every 10-50 time steps. Default False.

  • position_inaccurate (bool) – If True, the position tracked by the head is offset by a small random distance from the true position of the ball. Default False.

action_penalty

If True, an action penalty based on the cost function of the actuation model is applied to the reward. Default True.

Type

bool

jitter

If True, the input actions are multiplied with a perturbation array which is randomized every 10-50 time steps. Default False.

Type

bool

use_position_inaccuracy

If True, the position tracked by the head is offset by a small random distance from the true position of the ball. Default False.

Type

bool

position_limits

Maximum distances away from the default ball position for the randomization.

Type

np.ndarray

position_inaccuracy_limits

Maximum distances for the head tracking offset.

Type

np.ndarray

position_offset

The actual inaccuracy of the head tracking. This is randomized each episode.

Type

np.ndarray

size_limits

Minimum and maximum size of the ball.

Type

Tuple[float, float]

ball_size

Current ball size. Changes each episode.

Type

float

mass_limits

Minimum and maximum mass of the ball.

Type

Tuple[float, float]

ball_mass

Current ball mass. Changes each episode.

Type

float

jitter_array

Control inputs are multiplied by this array before being passed to MuJoCo. This is randomized every so often.

Type

np.ndarray

jitter_period

The number of steps the current jitter array is used for before being randomized again.

Type

int

steps_in_contact_for_success

For how many steps MIMo must hold onto the ball.

Type

int

in_contact_past

A list storing which past steps we were in contact for. This list works by modulo, i.e. to determine if MIMo held the ball on step i, do in_contact_past[i % steps_in_contact_for_success].

Type

List[bool]
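The modulo indexing of in_contact_past amounts to a fixed-size ring buffer. The following sketch mirrors the attribute names above in a standalone class. ContactHistory and its methods are hypothetical and only illustrate the bookkeeping, not the environment's actual code.

```python
class ContactHistory:
    """Ring buffer recording contact for the most recent N steps."""

    def __init__(self, steps_in_contact_for_success):
        self.steps_in_contact_for_success = steps_in_contact_for_success
        self.in_contact_past = [False] * steps_in_contact_for_success

    def record(self, step, in_contact):
        # Older entries are overwritten once the buffer wraps around.
        index = step % self.steps_in_contact_for_success
        self.in_contact_past[index] = in_contact

    def held_on_step(self, step):
        # Only meaningful for the most recent N recorded steps.
        return self.in_contact_past[step % self.steps_in_contact_for_success]

    def held_continuously(self):
        # True only if every one of the last N steps was in contact.
        return all(self.in_contact_past)
```

Since only the last N entries are kept, a single step without contact resets progress towards the success condition.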

compute_reward(achieved_goal, desired_goal, info)

Computes the reward.

MIMo is rewarded for each time step in contact with the target. Completing an episode successfully awards +100, while failing leads to a -100 penalty. Additionally, there is an action penalty based on the cost function of the actuation model.

Parameters
  • achieved_goal (object) – This parameter is ignored.

  • desired_goal (object) – This parameter is ignored.

  • info (dict) – This parameter is ignored.

Returns

The reward as described above.

Return type

float

do_simulation(action, n_frames)

Implementation that adds jitter to the actions.
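A rough sketch of the jitter mechanism: actions are multiplied element-wise by a perturbation array that is redrawn after a random period of 10 to 50 steps. The perturbation range (0.9 to 1.1 here) is an illustrative assumption, as the source does not give it. JitterSource is a hypothetical name.

```python
import random


class JitterSource:
    """Multiplies actions by a perturbation array that is periodically redrawn."""

    def __init__(self, n_actuators, rng=None, low=0.9, high=1.1):
        self.rng = random.Random() if rng is None else rng
        self.n_actuators = n_actuators
        self.low = low  # assumed lower bound of the perturbation
        self.high = high  # assumed upper bound of the perturbation
        self.jitter_array = [1.0] * n_actuators
        self.jitter_period = 0
        self._steps_left = 0

    def _resample(self):
        self.jitter_array = [
            self.rng.uniform(self.low, self.high) for _ in range(self.n_actuators)
        ]
        # The array stays in use for 10 to 50 steps, as described above.
        self.jitter_period = self.rng.randint(10, 50)
        self._steps_left = self.jitter_period

    def apply(self, action):
        if self._steps_left <= 0:
            self._resample()
        self._steps_left -= 1
        return [a * j for a, j in zip(action, self.jitter_array)]
```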

_get_obs()

Adds the size of the ball to the observations.

Returns

The altered observation dictionary.

Return type

Dict

is_success(achieved_goal, desired_goal)

Returns true if MIMo touches the object continuously for 1 second.

Parameters
  • achieved_goal (object) – This parameter is ignored.

  • desired_goal (object) – This parameter is ignored.

Returns

True if MIMo has been touching the ball for the last second, False otherwise.

Return type

bool

is_failure(achieved_goal, desired_goal)

Returns True if the ball drops below MIMo’s hand.

Parameters
  • achieved_goal (object) – This parameter is ignored.

  • desired_goal (object) – This parameter is ignored.

Returns

True if the ball drops below MIMo’s hand, False otherwise.

Return type

bool

is_truncated()

Dummy function.

Returns

Always returns False.

Return type

bool

sample_goal()

Dummy function. Returns an empty array.

Returns

An empty array.

Return type

numpy.ndarray

get_achieved_goal()

Dummy function. Returns an empty array.

Returns

An empty array.

Return type

numpy.ndarray

reset_model()

Resets the simulation.

We reset the simulation and then slightly move both MIMo’s arm and the ball randomly. The randomization is limited such that MIMo can always reach the ball.

Returns

Always returns True.

Return type

bool

_step_callback()

Checks if MIMo is touching the ball and performs head tracking.

_in_contact()

Check if MIMo is currently touching the target ball.

This function performs the actual contact check and is called during _step_callback().

Returns

True if MIMo is currently touching the ball, False otherwise.

Return type

bool

action_space: spaces.Space[ActType]
observation_space: spaces.Space[ObsType]
body_contact_reward()

Reward function that provides higher rewards the more geoms are touching the target.

Returns

The reward component as described above.

Return type

float

_currently_in_contact()

Check if MIMo is currently touching the ball.

Unlike _in_contact() this function does not perform the check itself, instead checking the array of past contacts for the current time step. The output of this function will not be accurate if called before _in_contact()!

Returns

True if MIMo is currently touching the ball, False otherwise.

Return type

bool

Training script

There is also a training script for all the sample environments.

Training script for the demonstration experiments.

This script allows simple training and testing of RL algorithms in the demo environments with a command line interface. A selection of RL algorithms from the Stable Baselines3 library can be selected. Interactive rendering is disabled during training to speed up computation, but enabled during testing, so the behaviour of the model can be observed directly.

Trained models are saved into the “models/<scenario>” directory, i.e. if you train a reach model and name it “my_model”, it will be saved under “models/reach/my_model”.

To train a given algorithm for some number of time steps:

python illustrations.py --env=reach --train_for=200000 --test_for=1000 --algorithm=PPO --save_model=<model_suffix>

To review a trained model:

python illustrations.py --env=reach --test_for=1000 --load_model=<your_model_suffix>

The available algorithms are PPO, SAC, TD3, DDPG, A2C.

mimoEnv.illustrations.test(env, save_dir, test_for=1000, model=None, render_video=False)

Testing function to view the behaviour of a model.

Parameters
  • env (MIMoEnv) – The environment on which the model should be tested. This does not have to be the same training environment, but action and observation spaces must match.

  • save_dir (str) – The directory in which any rendered videos will be saved.

  • test_for (int) – The number of timesteps the testing runs in total. This will be broken into multiple episodes if necessary.

  • model – The stable baselines model object. If None we take random actions instead. Default None.

  • render_video (bool) – If True, all episodes during testing will be recorded and saved as videos in save_dir.

mimoEnv.illustrations.main()

CLI for the demonstration environments.

Command line interface that can train and load models for the demonstration environments. Possible parameters are:

  • --env: The demonstration environment to use. Must be one of reach, standup, selfbody, catch.

  • --train_for: The number of time steps to train. No training takes place if this is 0. Default 0.

  • --test_for: The number of time steps to test. Testing renders the environment to an interactive window, so the trained behaviour can be observed. Default 1000.

  • --save_every: The number of time steps between model saves. This can be larger than the total training time, in which case we save once when training completes. Default 100000.

  • --algorithm: The algorithm to train. This argument must be provided if you train. Must be one of PPO, SAC, TD3, DDPG, A2C, HER.

  • --load_model: The path to the model to load.

  • --save_model: The directory name where the trained model will be saved. An input of “my_model” will lead to the model being saved under “models/<env>/my_model”.

  • --use_muscles: This flag switches between actuation models. By default, the spring-damper model is used. If this flag is set, the muscle model is used instead.

  • --render_video: If this flag is set, each testing episode is recorded and saved as a video in the same directory as the models.
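The documented flags could be expressed with argparse roughly as follows. This is a sketch that mirrors the list above, not the script's actual parser: treating --env as required, the default values for --load_model and --save_model, and the helper names build_parser and model_save_dir are all assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented command line flags."""
    parser = argparse.ArgumentParser(description="Demo environment training CLI")
    parser.add_argument("--env", required=True,  # assumption: required
                        choices=["reach", "standup", "selfbody", "catch"])
    parser.add_argument("--train_for", type=int, default=0)
    parser.add_argument("--test_for", type=int, default=1000)
    parser.add_argument("--save_every", type=int, default=100000)
    parser.add_argument("--algorithm", default=None,
                        choices=["PPO", "SAC", "TD3", "DDPG", "A2C", "HER"])
    parser.add_argument("--load_model", default=None)  # assumed default
    parser.add_argument("--save_model", default="")  # assumed default
    parser.add_argument("--use_muscles", action="store_true")
    parser.add_argument("--render_video", action="store_true")
    return parser


def model_save_dir(env, save_model):
    """Directory layout described above: models/<env>/<save_model>."""
    return f"models/{env}/{save_model}"
```

For example, parsing --env=reach --save_model=my_model and passing the results to model_save_dir yields models/reach/my_model.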

Benchmarking

This script and the demo script use the same dummy class, but with different scene XMLs. For benchmarking, the scene consists of MIMo with all sensory modalities enabled with varying configurations and a couple of objects lying on the ground. In the benchmarking script MIMo takes a random action at each step.
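A random-action benchmark of this kind can be sketched as a short timing loop. The function below assumes an environment object with the standard Gymnasium interface (reset(), step() returning five values, and action_space.sample()). benchmark_random_actions is a hypothetical name and not the actual benchmarking script.

```python
import time


def benchmark_random_actions(env, n_steps=1000):
    """Step the environment with random actions; return steps per second."""
    env.reset()
    start = time.perf_counter()
    for _ in range(n_steps):
        action = env.action_space.sample()
        _, _, terminated, truncated, _ = env.step(action)
        if terminated or truncated:
            # Start a new episode so the full step budget is used.
            env.reset()
    elapsed = time.perf_counter() - start
    return n_steps / elapsed if elapsed > 0 else float("inf")
```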

Environments

This module defines a dummy implementation for MIMo, to allow easy testing of modules.

The main class is MIMoDummyEnv, which implements all methods from the base class as dummy functions that return fixed values. This allows for testing the model without the full gym bureaucracy. The second class, MIMoShowroomEnv, is identical to the first, but changes the default parameters to load the showroom scene instead.

Finally, there is a demo class for the v2 version of MIMo using five-fingered hands and feet with two toes each in MIMoV2DummyEnv.

mimoEnv.envs.dummy.DEMO_XML

Path to the demo scene.

mimoEnv.envs.dummy.BENCHMARK_XML

Path to the benchmarking scene.

mimoEnv.envs.dummy.BENCHMARK_XML_V2

Path to the benchmarking scene using MIMo v2.

mimoEnv.envs.dummy.TEST_XML

Path to the benchmarking scene using MIMo v2.

class mimoEnv.envs.dummy.MIMoDummyEnv(model_path=BENCHMARK_XML, frame_skip=2, initial_qpos=None, render_mode=None, proprio_params=DEFAULT_PROPRIOCEPTION_PARAMS, touch_params=DEFAULT_TOUCH_PARAMS, vision_params=DEFAULT_VISION_PARAMS, vestibular_params=DEFAULT_VESTIBULAR_PARAMS, actuation_model=SpringDamperModel, goals_in_observation=False, done_active=True, show_sensors=False, print_space_sizes=False, **kwargs)

Bases: Generic[gymnasium.core.ObsType, gymnasium.core.ActType]

Dummy implementation for MIMoEnv.

This class is meant for testing and demonstrating parts of the base class. All abstract methods are implemented as dummy functions that return fixed values. No meaningful goal or reward is specified. The default parameters use the default sensor configurations in a bare scene consisting of MIMo and two objects on an infinite plane. For testing and validation there are two additional parameters compared to the base class:

Parameters
  • show_sensors – If True, plot the sensor point distribution for the touch system during initialization. Default False.

  • print_space_sizes – If True, the shape of the action space and all entries in the observation dictionary is printed during initialization. Default False.

Finally, there are two extra attributes:

steps

A step counter.

Type

int

show_sensors

If True, plot the sensor point distribution for the touch system during initialization.

Type

bool

touch_setup(touch_params)

Perform the setup and initialization of the touch system.

Uses the more complicated Trimesh implementation. Also plots the sensor points if show_sensors is True.

Parameters

touch_params (dict) – The parameter dictionary.

_obs_callback()

Simply increments the step counter.

reset_model()

Resets to the initial simulation state

is_success(achieved_goal, desired_goal)

Dummy function that always returns False.

Parameters
  • achieved_goal (object) – This parameter is ignored.

  • desired_goal (object) – This parameter is ignored.

Returns

False.

Return type

bool

is_failure(achieved_goal, desired_goal)

Dummy function that always returns False.

Parameters
  • achieved_goal (object) – This parameter is ignored.

  • desired_goal (object) – This parameter is ignored.

Returns

False.

Return type

bool

is_truncated()

Dummy function. Always returns False.

Returns

False.

Return type

bool

sample_goal()

A dummy function returning an empty array of shape (0,).

Returns

An empty size 0 array.

Return type

numpy.ndarray

get_achieved_goal()

Dummy function returning an empty array with the same shape as the goal.

Returns

An empty size 0 array.

Return type

numpy.ndarray

compute_reward(achieved_goal, desired_goal, info)

Dummy function that always returns a dummy value of 0.

Parameters
  • achieved_goal (object) – This parameter is ignored.

  • desired_goal (object) – This parameter is ignored.

  • info (dict) – This parameter is ignored.

Returns

0

Return type

float

_env_setup()

This function initializes all the sensory components of the model.

Calls the setup functions for all the sensory components.

_get_actuators()

Saves IDs of the actuators associated with MIMo in mimo_actuators.

_get_facial_expressions(emotion_textures)

Associates facial textures in the model with human-readable names for the associated emotions.

Parameters

emotion_textures (Dict[str, str]) – A dictionary with names for emotions as keys and the XML names of the associated facial textures as values.

_get_joints()

Saves the IDs of the joints associated with MIMo in mimo_joints.

_get_obs()

Returns the observation.

This function should return all simulation outputs relevant to whatever learning algorithm you wish to use. We always return proprioceptive information in the ‘observation’ entry, and this information always includes relative joint positions. Other sensory modalities get their own entries, if they are enabled. If goals_in_observation is set to True, the achieved and desired goal are also included.

Returns

A dictionary containing simulation outputs with separate entries for each sensor modality.

Return type

Dict

_initialize_simulation()

Initialize MuJoCo simulation data structures mjModel and mjData.

_is_done(achieved_goal, desired_goal, info)

This function should determine if we reached the end of an episode. Dummy implementation.

By default, this function always returns False. If done_active is set to True, instead returns True if either is_success() or is_failure() return True. The goal parameters are there to allow this class to be more easily overridden by subclasses, should this be required. They are ignored by default.

Parameters
  • achieved_goal (object) – The goal that was achieved during execution.

  • desired_goal (object) – The desired goal that we asked the agent to attempt to achieve.

  • info (dict) – An info dictionary with additional information.

Returns

  • terminated (bool) – Whether the current episode reached a success or failure state.

  • truncated (bool) – Whether the current episode entered some kind of invalid condition or “finished” due to some other constraint, such as a time limit.
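The termination rule described for _is_done() reduces to a small boolean expression. The sketch below takes the flags as plain arguments instead of calling is_success() and is_failure() on an environment. episode_terminated is a hypothetical name.

```python
def episode_terminated(done_active, is_success, is_failure):
    """Termination flag following the rule described above."""
    if not done_active:
        # With done_active disabled, episodes never end via success/failure.
        return False
    return bool(is_success or is_failure)
```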

_np_random: np.random.Generator | None = None
_reset_simulation()

Resets MuJoCo and actuation simulation data and samples a new goal.

_set_action(action)

Set the action for the next step.

Calls the actuation model’s function mimoActuation.actuation.ActuationModel.action(). What exactly happens depends on the specific implementation.

Parameters

action (numpy.ndarray) – A numpy array with control values.

_set_action_space()

Sets the action space attribute.

By default, the action space contains only MIMo’s actuators.

_set_initial_position(initial_qpos)

Sets the initial positions for joints in the environment.

The input should be a dictionary with joint names as keys and joint positions (in radians as floats) as values. This function then sets each listed joint to the corresponding position. Joints not contained in the dictionary are left unaltered.

Parameters

initial_qpos (dict[str, float]) – A dictionary with joint names as keys and joint positions (in radians as floats) as values.
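The "only listed joints change" behaviour can be illustrated with plain dictionaries. This sketch does not touch the simulation state. The joint names in the usage note are made up for illustration, rejecting unknown joint names is an assumption, and apply_initial_qpos is a hypothetical helper.

```python
def apply_initial_qpos(joint_positions, initial_qpos):
    """Return a copy of joint_positions with the listed joints overridden.

    Both arguments map joint names to positions in radians.
    """
    unknown = set(initial_qpos) - set(joint_positions)
    if unknown:
        # Assumption: unknown joint names are treated as an error here.
        raise KeyError(f"Unknown joints: {sorted(unknown)}")
    updated = dict(joint_positions)
    updated.update(initial_qpos)  # joints not listed keep their old value
    return updated
```

For instance, apply_initial_qpos({"elbow": 0.0, "knee": 0.1}, {"elbow": 0.5}) changes only the hypothetical "elbow" joint.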

_set_observation_space()

Sets the observation space attribute.

Calls _get_obs() and determines the space using the returned observations.

_single_mujoco_step()
_step_callback()

A custom callback that is called after stepping the simulation, but before collecting observations.

Useful to enforce additional constraints on the simulation state before observations are collected. Note that the sensory modalities do not update until _get_obs() is called, so they will not have updated to the current timestep.

_step_mujoco_simulation(ctrl, n_frames)

Step over the MuJoCo simulation.

_substep_callback()

A custom callback that is called after each simulation substep.

close()

Close all processes like rendering contexts

do_simulation(action, n_frames)

Step simulation forward for n_frames number of steps.

Parameters
  • action (np.ndarray) – The control input for the actuators.

  • n_frames (int) – The number of physics steps to perform.

property dt
get_body_com(body_name)

Return the cartesian position of a body frame

get_proprio_obs()

Collects and returns the outputs of the proprioceptive system.

Override this function if you want to apply some simple post-processing!

Returns

A numpy array containing the proprioceptive output.

Return type

numpy.ndarray

get_touch_obs()

Collects and returns the outputs of the touch system.

Override this function if you want to do some simple post-processing!

Returns

A numpy array containing the touch output.

Return type

numpy.ndarray

get_vestibular_obs()

Collects and returns the outputs of the vestibular system.

Override this function if you want to do some simple post-processing!

Returns

A numpy array with the vestibular data.

Return type

numpy.ndarray

get_vision_obs()

Collects and returns the outputs of the vision system.

Override this function if you want to do some simple post-processing!

Returns

A dictionary with one entry for each separate image. In the default implementation each eye renders one image, so each eye gets one entry.

Return type

dict[str, np.ndarray]

metadata: dict[str, Any] = {'render_modes': []}
property n_actuators

The number of actuators for MIMo.

Returns

The number of actuators for MIMo.

Return type

int

property np_random: numpy.random._generator.Generator

Returns the environment’s internal _np_random that if not set will initialise with a random seed.

Returns

Instances of np.random.Generator

proprio_setup(proprio_params)

Perform the setup and initialization of the proprioceptive system.

This should be overridden if you want to use another implementation!

Parameters

proprio_params (dict) – The parameter dictionary.

render()

Render a frame from the MuJoCo simulation as specified by the render_mode.

render_mode: str | None = None
reset(*, seed: Optional[int] = None, options: Optional[dict] = None)

Resets the environment to an initial internal state, returning an initial observation and info.

This method generates a new starting state, often with some randomness, to ensure that the agent explores the state space and learns a generalised policy about the environment. This randomness can be controlled with the seed parameter. If the environment already has a random number generator and reset() is called with seed=None, the RNG is not reset.

Therefore, reset() should (in the typical use case) be called with a seed right after initialization and then never again.

For Custom environments, the first line of reset() should be super().reset(seed=seed) which implements the seeding correctly.

Changed in version v0.25: The return_info parameter was removed and now info is expected to be returned.

Parameters
  • seed (optional int) – The seed that is used to initialize the environment’s PRNG (np_random). If the environment does not already have a PRNG and seed=None (the default option) is passed, a seed will be chosen from some source of entropy (e.g. timestamp or /dev/urandom). However, if the environment already has a PRNG and seed=None is passed, the PRNG will not be reset. If you pass an integer, the PRNG will be reset even if it already exists. Usually, you want to pass an integer right after the environment has been initialized and then never again. Please refer to the minimal example above to see this paradigm in action.

  • options (optional dict) – Additional information to specify how the environment is reset (optional, depending on the specific environment)

Returns
  • observation (ObsType) – Observation of the initial state. This will be an element of observation_space (typically a numpy array) and is analogous to the observation returned by step().

  • info (dictionary) – This dictionary contains auxiliary information complementing observation. It should be analogous to the info returned by step().
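
The seeding rule for reset() can be illustrated with a small, self-contained sketch. It is not the gymnasium implementation: the class SeededResetSketch is hypothetical, and it uses the standard library's random.Random in place of np.random.Generator so that the example runs without extra dependencies.

```python
import random


class SeededResetSketch:
    """Toy environment that follows the seeding rule described for reset()."""

    def __init__(self):
        self._rng = None  # no PRNG exists until the first reset()

    def reset(self, seed=None):
        # Create the PRNG if it is missing, or re-create it when a seed is
        # given. With seed=None and an existing PRNG, the PRNG is untouched.
        if seed is not None or self._rng is None:
            self._rng = random.Random(seed)
        # Stand-in for a randomized initial state.
        return self._rng.random()


env_a = SeededResetSketch()
first_a = env_a.reset(seed=42)   # seed once, right after construction
second_a = env_a.reset()         # later resets continue the same stream

env_b = SeededResetSketch()
first_b = env_b.reset(seed=42)
second_b = env_b.reset()
```

Both toy environments produce the same sequence of initial states, which is the reproducibility the "seed once, then never again" pattern is meant to give.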

reward_range = (-inf, inf)
set_state(qpos, qvel)

Set the joints position qpos and velocity qvel of the model. Override this method depending on the MuJoCo bindings used.

spec: EnvSpec | None = None
state_vector()

Return the position and velocity joint states of the model

step(action)

Run one timestep of the environment’s dynamics.

This function takes a simulation step with the given control inputs, collects the observations, computes the reward and finally determines if we are done with this episode or not. _get_obs() collects the observations, compute_reward() calculates the reward. _is_done() is called to determine if we have reached a terminal state, and _step_callback() can be used for extra functions each step, such as incrementing a step counter. Both the ‘terminated’ and ‘truncated’ return values are determined by _is_done().

Parameters

action (np.ndarray) – An action provided by the agent

Returns
  • observation (object) – This will be an element of the environment’s observation_space. This may, for instance, be a numpy array containing the positions and velocities of certain objects.

  • reward (float) – The amount of reward returned as a result of taking the action.

  • terminated (bool) – Whether a terminal state (success or failure as defined under the MDP of the task) is reached. In this case further step() calls could return undefined results.

  • truncated (bool) – Whether a truncation condition outside the scope of the MDP is satisfied. Typically a timelimit, but could also be used to indicate agent physically going out of bounds. Can be used to end the episode prematurely before a terminal state is reached.

  • info (dictionary) – info contains auxiliary diagnostic information (helpful for debugging, learning, and logging). This might, for instance, contain: metrics that describe the agent’s performance state, variables that are hidden from observations, or individual reward terms that are combined to produce the total reward.
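
The order of operations described for step() can be sketched with a stub class that only records which hook is called when. This is a simplified illustration, not the actual implementation: the class StepOrderSketch and its hook bodies are hypothetical, and the real method may perform additional work.

```python
class StepOrderSketch:
    """Records the order in which step() invokes its hooks."""

    def __init__(self):
        self.calls = []

    def do_simulation(self, action, n_frames):
        self.calls.append("do_simulation")

    def _step_callback(self):
        self.calls.append("_step_callback")

    def _get_obs(self):
        self.calls.append("_get_obs")
        return {"observation": [0.0]}

    def compute_reward(self, achieved_goal, desired_goal, info):
        self.calls.append("compute_reward")
        return 0.0

    def _is_done(self, achieved_goal, desired_goal, info):
        self.calls.append("_is_done")
        return False, False  # (terminated, truncated)

    def step(self, action):
        self.do_simulation(action, n_frames=1)  # advance the physics
        self._step_callback()                   # after stepping, before observing
        obs = self._get_obs()
        info = {}
        reward = self.compute_reward(None, None, info)
        terminated, truncated = self._is_done(None, None, info)
        return obs, reward, terminated, truncated, info


env = StepOrderSketch()
result = env.step(action=[0.0])
```

The returned tuple has the five entries listed under Returns, in the same order.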

swap_facial_expression(emotion)

Changes MIMo's facial texture.

Valid emotion names are in facial_expression, which links readable emotion names to their associated texture ids.

Parameters

emotion (str) – A valid emotion name.

property unwrapped: gymnasium.core.Env[gymnasium.core.ObsType, gymnasium.core.ActType]

Returns the base non-wrapped environment.

Returns

The base non-wrapped gymnasium.Env instance

Return type

Env

vestibular_setup(vestibular_params)

Perform the setup and initialization of the vestibular system.

This should be overridden if you want to use another implementation!

Parameters

vestibular_params (dict) – The parameter dictionary.

vision_setup(vision_params)

Perform the setup and initialization of the vision system.

This should be overridden if you want to use another implementation!

Parameters

vision_params (dict) – The parameter dictionary.

action_space: spaces.Space[ActType]
observation_space: spaces.Space[ObsType]
class mimoEnv.envs.dummy.MIMoV2DummyEnv(model_path=BENCHMARK_XML_V2, touch_params=DEFAULT_TOUCH_PARAMS_V2, **kwargs)

Bases: Generic[gymnasium.core.ObsType, gymnasium.core.ActType]

Same as MIMoDummyEnv, but using the full hand version of MIMo which has hands with five fingers and feet with two toes.

_env_setup()

This function initializes all the sensory components of the model.

Calls the setup functions for all the sensory components.

_get_actuators()

Saves IDs of the actuators associated with MIMo in mimo_actuators.

_get_facial_expressions(emotion_textures)

Associates facial textures in the model with human-readable names for the associated emotions.

Parameters

emotion_textures (Dict[str, str]) – A dictionary with names for emotions as keys and the XML names of the associated facial textures as values.

_get_joints()

Saves the IDs of the joints associated with MIMo in mimo_joints.

_get_obs()

Returns the observation.

This function should return all simulation outputs relevant to whatever learning algorithm you wish to use. We always return proprioceptive information in the ‘observation’ entry, and this information always includes relative joint positions. Other sensory modalities get their own entries, if they are enabled. If goals_in_observation is set to True, the achieved and desired goal are also included.

Returns

A dictionary containing simulation outputs with separate entries for each sensor modality.

Return type

Dict
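
The layout of the returned dictionary can be sketched as follows. Only the ‘observation’ entry is named in the description above; the other key names used here (touch, vision, vestibular, achieved_goal, desired_goal) and the helper build_observation are assumptions made for illustration, with plain lists standing in for numpy arrays.

```python
def build_observation(proprio, touch=None, vision=None, vestibular=None,
                      goals_in_observation=False, achieved_goal=None,
                      desired_goal=None):
    """Assemble an observation dictionary with one entry per enabled modality.

    Hypothetical helper: key names other than "observation" are assumed.
    """
    obs = {"observation": proprio}  # proprioception is always present
    optional = {"touch": touch, "vision": vision, "vestibular": vestibular}
    for key, value in optional.items():
        if value is not None:       # a disabled modality gets no entry
            obs[key] = value
    if goals_in_observation:
        obs["achieved_goal"] = achieved_goal
        obs["desired_goal"] = desired_goal
    return obs


proprio_only = build_observation([0.1, 0.2])
with_touch_and_goals = build_observation(
    [0.1, 0.2], touch=[0.0], goals_in_observation=True,
    achieved_goal=[], desired_goal=[],
)
```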

_initialize_simulation()

Initialize MuJoCo simulation data structures mjModel and mjData.

_is_done(achieved_goal, desired_goal, info)

This function should determine if we reached the end of an episode. Dummy implementation.

By default, this function always returns False. If done_active is set to True, it instead returns True if either is_success() or is_failure() returns True. The goal parameters are there to allow this class to be more easily overridden by subclasses, should this be required. They are ignored by default.

Parameters
  • achieved_goal (object) – The goal that was achieved during execution.

  • desired_goal (object) – The desired goal that we asked the agent to attempt to achieve.

  • info (dict) – An info dictionary with additional information.

Returns
  • terminated (bool) – Whether the current episode reached a success or failure state.

  • truncated (bool) – Whether the current episode entered some kind of invalid condition or “finished” due to some other constraint, such as a time limit.
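
The rule stated in the description (never done unless done_active is set, otherwise done on success or failure) can be written as a short truth-table sketch. The function is_done_sketch is hypothetical and covers only that rule; it returns a single flag and does not model the truncation value.

```python
def is_done_sketch(done_active, success, failure):
    """Return True only if done_active is set and the episode succeeded or failed."""
    if not done_active:
        return False
    return bool(success or failure)


# Enumerate all combinations of the three flags.
truth_table = {
    (active, success, failure): is_done_sketch(active, success, failure)
    for active in (False, True)
    for success in (False, True)
    for failure in (False, True)
}
```

In the dummy environments is_success() and is_failure() always return False, so this rule never ends an episode there regardless of done_active.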

_np_random: np.random.Generator | None = None
_obs_callback()

Simply increments the step counter.

_reset_simulation()

Resets MuJoCo and actuation simulation data and samples a new goal.

_set_action(action)

Set the action for the next step.

Calls the actuation model's function mimoActuation.actuation.ActuationModel.action(). What exactly happens depends on the specific implementation.

Parameters

action (numpy.ndarray) – A numpy array with control values.

_set_action_space()

Sets the action space attribute.

By default, the action space contains only MIMo's actuators.

_set_initial_position(initial_qpos)

Sets the initial positions for joints in the environment.

The input should be a dictionary with joint names as keys and joint positions (in radians as floats) as values. This function then sets each listed joint to the corresponding position. Joints not contained in the dictionary are left unaltered.

Parameters

initial_qpos (dict[str, float]) – A dictionary with joint names as keys and joint positions (in radians as floats) as values.

_set_observation_space()

Sets the observation space attribute.

Calls _get_obs() and determines the space using the returned observations.

_single_mujoco_step()
_step_callback()

A custom callback that is called after stepping the simulation, but before collecting observations.

Useful to enforce additional constraints on the simulation state before observations are collected. Note that the sensory modalities do not update until _get_obs() is called, so they will not have updated to the current timestep.

_step_mujoco_simulation(ctrl, n_frames)

Step over the MuJoCo simulation.

_substep_callback()

A custom callback that is called after each simulation substep.

close()

Close all processes like rendering contexts

compute_reward(achieved_goal, desired_goal, info)

Dummy function that always returns a dummy value of 0.

Parameters
  • achieved_goal (object) – This parameter is ignored.

  • desired_goal (object) – This parameter is ignored.

  • info (dict) – This parameter is ignored.

Returns

0

Return type

float

do_simulation(action, n_frames)

Step simulation forward for n_frames number of steps.

Parameters
  • action (np.ndarray) – The control input for the actuators.

  • n_frames (int) – The number of physics steps to perform.

property dt
get_achieved_goal()

Dummy function returning an empty array with the same shape as the goal.

Returns

An empty size 0 array.

Return type

numpy.ndarray

get_body_com(body_name)

Return the cartesian position of a body frame

get_proprio_obs()

Collects and returns the outputs of the proprioceptive system.

Override this function if you want to do some simple post-processing!

Returns

A numpy array containing the proprioceptive output.

Return type

numpy.ndarray

get_touch_obs()

Collects and returns the outputs of the touch system.

Override this function if you want to do some simple post-processing!

Returns

A numpy array containing the touch output.

Return type

numpy.ndarray

get_vestibular_obs()

Collects and returns the outputs of the vestibular system.

Override this function if you want to do some simple post-processing!

Returns

A numpy array with the vestibular data.

Return type

numpy.ndarray

get_vision_obs()

Collects and returns the outputs of the vision system.

Override this function if you want to do some simple post-processing!

Returns

A dictionary with one entry for each separate image. In the default implementation each eye renders one image, so each eye gets one entry.

Return type

dict[str, np.ndarray]

is_failure(achieved_goal, desired_goal)

Dummy function that always returns False.

Parameters
  • achieved_goal (object) – This parameter is ignored.

  • desired_goal (object) – This parameter is ignored.

Returns

False.

Return type

bool

is_success(achieved_goal, desired_goal)

Dummy function that always returns False.

Parameters
  • achieved_goal (object) – This parameter is ignored.

  • desired_goal (object) – This parameter is ignored.

Returns

False.

Return type

bool

is_truncated()

Dummy function. Always returns False.

Returns

False.

Return type

bool

metadata: dict[str, Any] = {'render_modes': []}
property n_actuators

The number of actuators for MIMo.

Returns

The number of actuators for MIMo.

Return type

int

property np_random: numpy.random._generator.Generator

Returns the environment’s internal _np_random that if not set will initialise with a random seed.

Returns

Instances of np.random.Generator

proprio_setup(proprio_params)

Perform the setup and initialization of the proprioceptive system.

This should be overridden if you want to use another implementation!

Parameters

proprio_params (dict) – The parameter dictionary.

render()

Render a frame from the MuJoCo simulation as specified by the render_mode.

render_mode: str | None = None
reset(*, seed: Optional[int] = None, options: Optional[dict] = None)

Resets the environment to an initial internal state, returning an initial observation and info.

This method generates a new starting state, often with some randomness, to ensure that the agent explores the state space and learns a generalised policy about the environment. This randomness can be controlled with the seed parameter. If the environment already has a random number generator and reset() is called with seed=None, the RNG is not reset.

Therefore, reset() should (in the typical use case) be called with a seed right after initialization and then never again.

For Custom environments, the first line of reset() should be super().reset(seed=seed) which implements the seeding correctly.

Changed in version v0.25: The return_info parameter was removed and now info is expected to be returned.

Parameters
  • seed (optional int) – The seed that is used to initialize the environment’s PRNG (np_random). If the environment does not already have a PRNG and seed=None (the default option) is passed, a seed will be chosen from some source of entropy (e.g. timestamp or /dev/urandom). However, if the environment already has a PRNG and seed=None is passed, the PRNG will not be reset. If you pass an integer, the PRNG will be reset even if it already exists. Usually, you want to pass an integer right after the environment has been initialized and then never again. Please refer to the minimal example above to see this paradigm in action.

  • options (optional dict) – Additional information to specify how the environment is reset (optional, depending on the specific environment)

Returns
  • observation (ObsType) – Observation of the initial state. This will be an element of observation_space (typically a numpy array) and is analogous to the observation returned by step().

  • info (dictionary) – This dictionary contains auxiliary information complementing observation. It should be analogous to the info returned by step().

reset_model()

Resets to the initial simulation state

reward_range = (-inf, inf)
sample_goal()

A dummy function returning an empty array of shape (0,).

Returns

An empty size 0 array.

Return type

numpy.ndarray

set_state(qpos, qvel)

Set the joints position qpos and velocity qvel of the model. Override this method depending on the MuJoCo bindings used.

spec: EnvSpec | None = None
state_vector()

Return the position and velocity joint states of the model

step(action)

Run one timestep of the environment’s dynamics.

This function takes a simulation step with the given control inputs, collects the observations, computes the reward and finally determines if we are done with this episode or not. _get_obs() collects the observations, compute_reward() calculates the reward. _is_done() is called to determine if we have reached a terminal state, and _step_callback() can be used for extra functions each step, such as incrementing a step counter. Both the ‘terminated’ and ‘truncated’ return values are determined by _is_done().

Parameters

action (np.ndarray) – An action provided by the agent

Returns
  • observation (object) – This will be an element of the environment’s observation_space. This may, for instance, be a numpy array containing the positions and velocities of certain objects.

  • reward (float) – The amount of reward returned as a result of taking the action.

  • terminated (bool) – Whether a terminal state (success or failure as defined under the MDP of the task) is reached. In this case further step() calls could return undefined results.

  • truncated (bool) – Whether a truncation condition outside the scope of the MDP is satisfied. Typically a timelimit, but could also be used to indicate agent physically going out of bounds. Can be used to end the episode prematurely before a terminal state is reached.

  • info (dictionary) – info contains auxiliary diagnostic information (helpful for debugging, learning, and logging). This might, for instance, contain: metrics that describe the agent’s performance state, variables that are hidden from observations, or individual reward terms that are combined to produce the total reward.

swap_facial_expression(emotion)

Changes MIMo's facial texture.

Valid emotion names are in facial_expression, which links readable emotion names to their associated texture ids.

Parameters

emotion (str) – A valid emotion name.

touch_setup(touch_params)

Perform the setup and initialization of the touch system.

Uses the more complicated Trimesh implementation. Also plots the sensor points if show_sensors is True.

Parameters

touch_params (dict) – The parameter dictionary.

property unwrapped: gymnasium.core.Env[gymnasium.core.ObsType, gymnasium.core.ActType]

Returns the base non-wrapped environment.

Returns

The base non-wrapped gymnasium.Env instance

Return type

Env

vestibular_setup(vestibular_params)

Perform the setup and initialization of the vestibular system.

This should be overridden if you want to use another implementation!

Parameters

vestibular_params (dict) – The parameter dictionary.

vision_setup(vision_params)

Perform the setup and initialization of the vision system.

This should be overridden if you want to use another implementation!

Parameters

vision_params (dict) – The parameter dictionary.

action_space: spaces.Space[ActType]
observation_space: spaces.Space[ObsType]
class mimoEnv.envs.dummy.MIMoMuscleDummyEnv(model_path=BENCHMARK_XML_V2, touch_params=DEFAULT_TOUCH_PARAMS_V2, actuation_model=MuscleModel, **kwargs)

Bases: Generic[gymnasium.core.ObsType, gymnasium.core.ActType]

Same as MIMoDummyEnv, but using the muscle actuation model. Uses the full hand version of MIMo by default.

_env_setup()

This function initializes all the sensory components of the model.

Calls the setup functions for all the sensory components.

_get_actuators()

Saves IDs of the actuators associated with MIMo in mimo_actuators.

_get_facial_expressions(emotion_textures)

Associates facial textures in the model with human-readable names for the associated emotions.

Parameters

emotion_textures (Dict[str, str]) – A dictionary with names for emotions as keys and the XML names of the associated facial textures as values.

_get_joints()

Saves the IDs of the joints associated with MIMo in mimo_joints.

_get_obs()

Returns the observation.

This function should return all simulation outputs relevant to whatever learning algorithm you wish to use. We always return proprioceptive information in the ‘observation’ entry, and this information always includes relative joint positions. Other sensory modalities get their own entries, if they are enabled. If goals_in_observation is set to True, the achieved and desired goal are also included.

Returns

A dictionary containing simulation outputs with separate entries for each sensor modality.

Return type

Dict

_initialize_simulation()

Initialize MuJoCo simulation data structures mjModel and mjData.

_is_done(achieved_goal, desired_goal, info)

This function should determine if we reached the end of an episode. Dummy implementation.

By default, this function always returns False. If done_active is set to True, it instead returns True if either is_success() or is_failure() returns True. The goal parameters are there to allow this class to be more easily overridden by subclasses, should this be required. They are ignored by default.

Parameters
  • achieved_goal (object) – The goal that was achieved during execution.

  • desired_goal (object) – The desired goal that we asked the agent to attempt to achieve.

  • info (dict) – An info dictionary with additional information.

Returns
  • terminated (bool) – Whether the current episode reached a success or failure state.

  • truncated (bool) – Whether the current episode entered some kind of invalid condition or “finished” due to some other constraint, such as a time limit.

_np_random: np.random.Generator | None = None
_obs_callback()

Simply increments the step counter.

_reset_simulation()

Resets MuJoCo and actuation simulation data and samples a new goal.

_set_action(action)

Set the action for the next step.

Calls the actuation model's function mimoActuation.actuation.ActuationModel.action(). What exactly happens depends on the specific implementation.

Parameters

action (numpy.ndarray) – A numpy array with control values.

_set_action_space()

Sets the action space attribute.

By default, the action space contains only MIMo's actuators.

_set_initial_position(initial_qpos)

Sets the initial positions for joints in the environment.

The input should be a dictionary with joint names as keys and joint positions (in radians as floats) as values. This function then sets each listed joint to the corresponding position. Joints not contained in the dictionary are left unaltered.

Parameters

initial_qpos (dict[str, float]) – A dictionary with joint names as keys and joint positions (in radians as floats) as values.

_set_observation_space()

Sets the observation space attribute.

Calls _get_obs() and determines the space using the returned observations.

_single_mujoco_step()
_step_callback()

A custom callback that is called after stepping the simulation, but before collecting observations.

Useful to enforce additional constraints on the simulation state before observations are collected. Note that the sensory modalities do not update until _get_obs() is called, so they will not have updated to the current timestep.

_step_mujoco_simulation(ctrl, n_frames)

Step over the MuJoCo simulation.

_substep_callback()

A custom callback that is called after each simulation substep.

close()

Close all processes like rendering contexts

compute_reward(achieved_goal, desired_goal, info)

Dummy function that always returns a dummy value of 0.

Parameters
  • achieved_goal (object) – This parameter is ignored.

  • desired_goal (object) – This parameter is ignored.

  • info (dict) – This parameter is ignored.

Returns

0

Return type

float

do_simulation(action, n_frames)

Step simulation forward for n_frames number of steps.

Parameters
  • action (np.ndarray) – The control input for the actuators.

  • n_frames (int) – The number of physics steps to perform.

property dt
get_achieved_goal()

Dummy function returning an empty array with the same shape as the goal.

Returns

An empty size 0 array.

Return type

numpy.ndarray

get_body_com(body_name)

Return the cartesian position of a body frame

get_proprio_obs()

Collects and returns the outputs of the proprioceptive system.

Override this function if you want to do some simple post-processing!

Returns

A numpy array containing the proprioceptive output.

Return type

numpy.ndarray

get_touch_obs()

Collects and returns the outputs of the touch system.

Override this function if you want to do some simple post-processing!

Returns

A numpy array containing the touch output.

Return type

numpy.ndarray

get_vestibular_obs()

Collects and returns the outputs of the vestibular system.

Override this function if you want to do some simple post-processing!

Returns

A numpy array with the vestibular data.

Return type

numpy.ndarray

get_vision_obs()

Collects and returns the outputs of the vision system.

Override this function if you want to do some simple post-processing!

Returns

A dictionary with one entry for each separate image. In the default implementation each eye renders one image, so each eye gets one entry.

Return type

dict[str, np.ndarray]

is_failure(achieved_goal, desired_goal)

Dummy function that always returns False.

Parameters
  • achieved_goal (object) – This parameter is ignored.

  • desired_goal (object) – This parameter is ignored.

Returns

False.

Return type

bool

is_success(achieved_goal, desired_goal)

Dummy function that always returns False.

Parameters
  • achieved_goal (object) – This parameter is ignored.

  • desired_goal (object) – This parameter is ignored.

Returns

False.

Return type

bool

is_truncated()

Dummy function. Always returns False.

Returns

False.

Return type

bool

metadata: dict[str, Any] = {'render_modes': []}
property n_actuators

The number of actuators for MIMo.

Returns

The number of actuators for MIMo.

Return type

int

property np_random: numpy.random._generator.Generator

Returns the environment’s internal _np_random that if not set will initialise with a random seed.

Returns

Instances of np.random.Generator

proprio_setup(proprio_params)

Perform the setup and initialization of the proprioceptive system.

This should be overridden if you want to use another implementation!

Parameters

proprio_params (dict) – The parameter dictionary.

render()

Render a frame from the MuJoCo simulation as specified by the render_mode.

render_mode: str | None = None
reset(*, seed: Optional[int] = None, options: Optional[dict] = None)

Resets the environment to an initial internal state, returning an initial observation and info.

This method generates a new starting state, often with some randomness, to ensure that the agent explores the state space and learns a generalised policy about the environment. This randomness can be controlled with the seed parameter. If the environment already has a random number generator and reset() is called with seed=None, the RNG is not reset.

Therefore, reset() should (in the typical use case) be called with a seed right after initialization and then never again.

For Custom environments, the first line of reset() should be super().reset(seed=seed) which implements the seeding correctly.

Changed in version v0.25: The return_info parameter was removed and now info is expected to be returned.

Parameters
  • seed (optional int) – The seed that is used to initialize the environment’s PRNG (np_random). If the environment does not already have a PRNG and seed=None (the default option) is passed, a seed will be chosen from some source of entropy (e.g. timestamp or /dev/urandom). However, if the environment already has a PRNG and seed=None is passed, the PRNG will not be reset. If you pass an integer, the PRNG will be reset even if it already exists. Usually, you want to pass an integer right after the environment has been initialized and then never again. Please refer to the minimal example above to see this paradigm in action.

  • options (optional dict) – Additional information to specify how the environment is reset (optional, depending on the specific environment)

Returns
  • observation (ObsType) – Observation of the initial state. This will be an element of observation_space (typically a numpy array) and is analogous to the observation returned by step().

  • info (dictionary) – This dictionary contains auxiliary information complementing observation. It should be analogous to the info returned by step().

reset_model()

Resets to the initial simulation state

reward_range = (-inf, inf)
sample_goal()

A dummy function returning an empty array of shape (0,).

Returns

An empty size 0 array.

Return type

numpy.ndarray

set_state(qpos, qvel)

Set the joints position qpos and velocity qvel of the model. Override this method depending on the MuJoCo bindings used.

spec: EnvSpec | None = None
state_vector()

Return the position and velocity joint states of the model

step(action)

Run one timestep of the environment’s dynamics.

This function takes a simulation step with the given control inputs, collects the observations, computes the reward and finally determines if we are done with this episode or not. _get_obs() collects the observations, compute_reward() calculates the reward. _is_done() is called to determine if we have reached a terminal state, and _step_callback() can be used for extra functions each step, such as incrementing a step counter. Both the ‘terminated’ and ‘truncated’ return values are determined by _is_done().

Parameters

action (np.ndarray) – An action provided by the agent

Returns
  • observation (object) – This will be an element of the environment’s observation_space. This may, for instance, be a numpy array containing the positions and velocities of certain objects.

  • reward (float) – The amount of reward returned as a result of taking the action.

  • terminated (bool) – Whether a terminal state (success or failure as defined under the MDP of the task) is reached. In this case further step() calls could return undefined results.

  • truncated (bool) – Whether a truncation condition outside the scope of the MDP is satisfied. Typically a timelimit, but could also be used to indicate agent physically going out of bounds. Can be used to end the episode prematurely before a terminal state is reached.

  • info (dictionary) – info contains auxiliary diagnostic information (helpful for debugging, learning, and logging). This might, for instance, contain: metrics that describe the agent’s performance state, variables that are hidden from observations, or individual reward terms that are combined to produce the total reward.

swap_facial_expression(emotion)

Changes MIMo's facial texture.

Valid emotion names are in facial_expression, which links readable emotion names to their associated texture ids.

Parameters

emotion (str) – A valid emotion name.

touch_setup(touch_params)

Perform the setup and initialization of the touch system.

Uses the more complicated Trimesh implementation. Also plots the sensor points if show_sensors is True.

Parameters

touch_params (dict) – The parameter dictionary.

property unwrapped: gymnasium.core.Env[gymnasium.core.ObsType, gymnasium.core.ActType]

Returns the base non-wrapped environment.

Returns

The base non-wrapped gymnasium.Env instance

Return type

Env

vestibular_setup(vestibular_params)

Perform the setup and initialization of the vestibular system.

This should be overridden if you want to use another implementation!

Parameters

vestibular_params (dict) – The parameter dictionary.

vision_setup(vision_params)

Perform the setup and initialization of the vision system.

This should be overridden if you want to use another implementation!

Parameters

vision_params (dict) – The parameter dictionary.
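
The setup methods above are meant to be overridden when a different sensor implementation is wanted. The sketch below illustrates that pattern only. BaseEnvStandIn is a hypothetical stand-in for the MIMo base class, the assumption that the constructor calls vision_setup() when parameters are supplied is ours, and the parameter dictionary is illustrative.

```python
class BaseEnvStandIn:
    """Hypothetical stand-in for the MIMo base environment class."""

    def __init__(self, vision_params=None):
        self.vision = None
        # Assumption: the setup hook only runs when parameters are provided.
        if vision_params is not None:
            self.vision_setup(vision_params)

    def vision_setup(self, vision_params):
        # Default behaviour in this sketch: record the default implementation.
        self.vision = {"implementation": "default", "params": vision_params}


class CustomVisionEnv(BaseEnvStandIn):
    """Subclass that swaps in another vision implementation."""

    def vision_setup(self, vision_params):
        # Everything else in the environment stays untouched; only the
        # object stored in self.vision changes.
        self.vision = {"implementation": "custom", "params": vision_params}


# Illustrative parameters, not the library defaults.
params = {"eye_left": {"width": 64, "height": 64}}
default_env = BaseEnvStandIn(vision_params=params)
custom_env = CustomVisionEnv(vision_params=params)
disabled_env = CustomVisionEnv(vision_params=None)
```

The same approach would apply to vestibular_setup(); only the overridden method name changes.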

action_space: spaces.Space[ActType]
observation_space: spaces.Space[ObsType]

Script

This module contains some functions to benchmark the performance of the simulation.

Classes and functions FunctionProfile, StatsProfile and get_stats_profile() from NAME HERE @ LINK

mimoEnv.benchmark.run(env, max_steps)

Runs an environment for a number of steps while taking random actions.

Parameters
  • env (gym.Env) – The environment.

  • max_steps (int) – The number of time steps that we run the environment for.

mimoEnv.benchmark.benchmark(configurations, output_file)

Benchmarks multiple configurations for MIMo.

We use cProfile as the profiler. Multiple runs with different configurations are performed and the runtime measurements saved to the specified output file. The profile for each run is also saved to a file named after the configuration. Configurations consist of an environment name and initialization parameters for that environment. Each configuration is run for the specified simulation time and the real time required for that is measured. MIMo takes random actions throughout. Measurements include the total runtime, the simulation time, the number of simulation steps, the time spent in environment initialization, the time spent on the physics simulation, and the time spent in each of the sensor modalities: touch, vision, proprioception and vestibular.

Parameters
  • configurations (List[Tuple[str, str, Dict, int]]) – A list of tuples storing configurations to be benchmarked. Each tuple has four entries: An arbitrary name for the entry, the name of the gym environment that will be run, a dictionary with parameters for the environment, and finally the duration of the run in simulation seconds. Note that the parameter dictionary can be empty if you wish to use the default parameters for the environment.

  • output_file (str) – Runtime results are written to this file.
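
As an illustration of the configuration format and of profiling with cProfile, consider the sketch below. It does not call benchmark() itself: stand_in_run() is a hypothetical placeholder for creating and stepping an environment, and the configuration names and the touch_params entry are assumptions chosen for the example.

```python
import cProfile
import pstats

# Each entry: (configuration name, gym environment name, init parameters,
# duration in simulation seconds). An empty dict means default parameters.
configurations = [
    ("default", "MIMoBench-v0", {}, 10),
    ("no_touch", "MIMoBench-v0", {"touch_params": None}, 10),  # assumed parameter
]


def stand_in_run(sim_seconds):
    """Hypothetical placeholder workload instead of a real simulation run."""
    total = 0
    for i in range(sim_seconds * 1000):
        total += i * i
    return total


def profile_configurations(configs):
    """Profile one run per configuration and return total runtimes in seconds."""
    results = {}
    for name, env_name, params, sim_seconds in configs:
        profiler = cProfile.Profile()
        profiler.enable()
        stand_in_run(sim_seconds)
        profiler.disable()
        stats = pstats.Stats(profiler)
        results[name] = stats.total_tt  # total time over all profiled calls
    return results


runtimes = profile_configurations(configurations)
```

A real benchmark would replace stand_in_run() with the environment loop and break the total down by function, which is what the per-modality measurements described above require.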

mimoEnv.benchmark.load_benchmark_file(file_name) → Dict

Loads a benchmark file in the format as produced by benchmark() into a dictionary.

Parameters

file_name (str) – The input benchmark file.

Returns

The dictionary with loaded benchmark data.

Return type

Dict[str, float]

mimoEnv.benchmark.make_stacked_bar_chart(data, labels: List[str], colors: Dict[str, str], ylabel, figsize=(6, 5), legend_loc='upper left')

Makes a stacked bar chart.

Parameters
  • data – A dictionary of dictionaries with data for each stacked bar. High level dictionary stores the label for each bar as keys and the associated data dictionary as values. Low level dictionary contains the data for each stack with component labels as keys.

  • labels – A list with the labels for each stack component. This also selects which components are plotted at all. The colors parameter must have an entry for every label.

  • colors – A dictionary with colors for each stack component.

  • ylabel – The y-axis label.

  • figsize – A tuple with the figure size.

  • legend_loc – Location of the legend.

Returns

A tuple (fig, ax) with the plotted chart.
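
The nested data layout and the role of labels can be made concrete with a small sketch. This is not the library's implementation: stack_segments() is a hypothetical helper that computes where each stack component starts, which is the information a plotting call such as matplotlib's Axes.bar (via its bottom argument) needs. The numbers are illustrative, not measured results.

```python
def stack_segments(data, labels):
    """Return {bar: [(label, bottom, height), ...]} in stacking order.

    Only components listed in labels are included, mirroring how the
    labels parameter selects which components are plotted.
    """
    layout = {}
    for bar_name, components in data.items():
        bottom = 0.0
        segments = []
        for label in labels:
            height = components[label]
            segments.append((label, bottom, height))
            bottom += height  # the next component starts on top of this one
        layout[bar_name] = segments
    return layout


# High level keys are bar labels, low level keys are stack components.
data = {
    "run_a": {"physics": 2.0, "touch": 1.0, "vision": 0.5},
    "run_b": {"physics": 2.0, "touch": 3.0, "vision": 0.5},
}
labels = ["physics", "touch"]  # "vision" is present in data but not plotted
colors = {"physics": "tab:blue", "touch": "tab:orange"}  # one color per label

layout = stack_segments(data, labels)
```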

mimoEnv.benchmark.plot_benchmarks(file_name, list_of_runs, output_file, label_list=None, color_dict=None, figsize=(6, 5))

Create benchmark plots.

Loads data from a benchmark file and creates a stacked bar chart from the loaded data. Which runs are plotted can be selected with list_of_runs.

Parameters
  • file_name (str) – The file containing the benchmark data.

  • list_of_runs (List[str]) – A list with the configuration names that will be plotted side by side.

  • output_file (str) – Output image file.

  • label_list (List[str]) – A list of the runtime components that will be plotted.

  • color_dict (Dict[str, str]) – A dictionary with the colors for each component listed above.

mimoEnv.benchmark.run_paper_benchmarks()

Performs the same benchmarks as used in the paper.

mimoEnv.benchmark.make_paper_plot(file_name, output_file)

Creates the sensor benchmarking plot from the paper.

Parameters
  • file_name (str) – Input benchmark file.

  • output_file (str) – Output image file.

Demo showroom

This scenario uses the same dummy class as the benchmarking script, but replaces the basic scene XML with a more elaborate one consisting of a square room with a number of toys. In this scenario MIMo takes no actions at all.

Script

Simple script to view the showroom. We perform no training and MIMo takes no actions.

mimoEnv.showroom.main()

Creates the environment and takes 200 time steps. MIMo takes no actions. The environment is rendered to an interactive window.