Sample environments
This section describes the code used for the experiments and demos from our paper MIMo: A Multi-Modal Infant Model for Studying Cognitive Development in Humans and AIs. The four learning illustration environments (reach, standup, self-body and catch) each involve an environment and a training script using RL algorithms from Stable Baselines3. The catch environment is based on the full hand version of MIMo, while the others use the mitten hand. There is also a simple benchmarking scenario in which MIMo takes random actions. Finally, there is a demo environment in a simple room with some toys, with all sensory modalities enabled using the default configurations.
All of the environments register with gym under the names MIMoReach-v0,
MIMoStandup-v0, MIMoSelfBody-v0, MIMoCatch-v0, MIMoBench-v0 and MIMoShowroom-v0.
Reach Environment
This module contains a simple reaching experiment in which MIMo tries to touch a hovering ball.
The scene consists of MIMo and a hovering ball located within reach of MIMo’s right arm. The task is for MIMo to touch the ball. MIMo is fixed in position and can only move his right arm. His head automatically tracks the location of the ball, i.e. the visual search for the ball is assumed. Sensory input consists of the full proprioceptive inputs. All other modalities are disabled.
The ball hovers stationary. An episode is completed successfully if MIMo touches the ball, knocking it out of position. There are no failure states. The position of the ball is slightly randomized each trial.
Reward shaping is employed, with a negative reward based on the distance between MIMo’s hand and the ball. A large fixed reward is given when he touches the ball.
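The shaped reward described above can be sketched in a few lines. This is only an illustration, not the environment's actual code: the function name and its inputs are hypothetical, the distance term is assumed to be the plain Euclidean distance without scaling, and the contact bonus of 100 (the value given in the compute_reward() documentation) is assumed to be added on top of the distance term.

```python
import math

# Fixed bonus on contact, as stated in the compute_reward() documentation.
CONTACT_REWARD = 100.0


def reach_reward(hand_pos, ball_pos, in_contact):
    """Hypothetical sketch of the reach reward.

    hand_pos, ball_pos: (x, y, z) positions of the hand and the ball.
    in_contact: True if the hand currently touches the ball.
    """
    # Shaping term: the further the hand is from the ball, the lower the reward.
    reward = -math.dist(hand_pos, ball_pos)
    # Assumption: the bonus is added to the shaping term rather than replacing it.
    if in_contact:
        reward += CONTACT_REWARD
    return reward
```

Because the shaping term is never positive, the agent only collects a positive return by actually touching the ball.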
The class with the environment is MIMoReachEnv while the path to the scene XML is defined
in REACH_XML.
- mimoEnv.envs.reach.REACH_XML
Path to the reach scene.
- class mimoEnv.envs.reach.MIMoReachEnv(model_path=REACH_XML, proprio_params=DEFAULT_PROPRIOCEPTION_PARAMS, touch_params=None, vision_params=None, vestibular_params=None, actuation_model=SpringDamperModel, goals_in_observation=False, done_active=True, **kwargs)
Bases: Generic[gymnasium.core.ObsType, gymnasium.core.ActType]
MIMo reaches for an object.
Attributes and parameters are the same as in the base class, but the default arguments are adapted for the scenario.
Due to the goal condition we do not use the goal attribute or the interfaces associated with it. Instead, the reward and success conditions are computed directly from the model state, while sample_goal() and get_achieved_goal() are dummy functions.
- compute_reward(achieved_goal, desired_goal, info)
Computes the reward.
A negative reward is given based on the distance between MIMo’s fingers and the ball. If contact is made, a fixed positive reward of 100 is granted. The achieved and desired goal parameters are ignored.
- is_success(achieved_goal, desired_goal)
Determines the goal states.
- is_failure(achieved_goal, desired_goal)
Dummy function. Always returns False.
- sample_goal()
Dummy function. Returns an empty array.
- Returns
An empty array.
- Return type
numpy.ndarray
- get_achieved_goal()
Dummy function. Returns an empty array.
- Returns
An empty array.
- Return type
numpy.ndarray
- reset_model()
Resets the simulation.
We reset the simulation and then slightly move both MIMo’s arm and the ball randomly. The randomization is limited such that MIMo can always reach the ball.
- Returns
Observations after reset.
- Return type
Dict
- _step_callback()
Adjusts the head and eye positions to track the target.
Manually computes the joint positions required for the head and eyes to look at the target objects.
- action_space: spaces.Space[ActType]
- observation_space: spaces.Space[ObsType]
Standup Environment
This module contains a simple experiment in which MIMo tries to stand up.
The scene consists of MIMo and some railings representing a crib. MIMo starts sitting on the ground with his hands on the railings. The task is to stand up. MIMo’s feet and hands are welded to the ground and railings, respectively. He can move all joints in his arms, legs and torso. His head is fixed. Sensory input consists of proprioceptive and vestibular inputs, using the default configurations for both.
MIMo’s initial position is determined by slightly randomizing all joint positions from a standing position and then letting the simulation settle. This leads to MIMo sagging into a slightly random crouching or sitting position each episode. All episodes have a fixed length; there are no goal or failure states.
Reward shaping is employed, such that MIMo is penalised for using muscle inputs and large inputs in particular. Additionally, he is rewarded each step for the current height of his head.
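As a rough sketch of this reward structure: the head height is rewarded and the squared control signal is penalised, so large inputs cost disproportionately more than small ones. The function name and the penalty_weight parameter are hypothetical and chosen only for illustration; the environment's actual weighting is not stated here.

```python
def standup_reward(head_height, ctrl, penalty_weight=0.01):
    """Hypothetical sketch of the standup reward.

    head_height: current height of MIMo's head.
    ctrl: sequence of control inputs for the current step.
    penalty_weight: illustrative scaling of the control cost (an assumption).
    """
    # Quadratic cost: doubling an input quadruples its penalty.
    quad_ctrl_cost = penalty_weight * sum(u * u for u in ctrl)
    return head_height - quad_ctrl_cost
```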
The class with the environment is MIMoStandupEnv while the path to the scene XML is
defined in STANDUP_XML.
- mimoEnv.envs.standup.STANDUP_XML
Path to the stand up scene.
- mimoEnv.envs.standup.SITTING_POSITION
Initial position of MIMo. Specifies initial values for all joints. We grabbed these values by posing MIMo using the MuJoCo simulate executable and the positional actuator file. We need these not just for the initial position but also resetting the position each step.
- class mimoEnv.envs.standup.MIMoStandupEnv(model_path=STANDUP_XML, initial_qpos=SITTING_POSITION, frame_skip=2, proprio_params=DEFAULT_PROPRIOCEPTION_PARAMS, touch_params=None, vision_params=None, vestibular_params=DEFAULT_VESTIBULAR_PARAMS, actuation_model=SpringDamperModel)
Bases: Generic[gymnasium.core.ObsType, gymnasium.core.ActType]
MIMo stands up using crib railings as an aid.
Attributes and parameters are the same as in the base class, but the default arguments are adapted for the scenario. Specifically, we have done_active and goals_in_observation as False and touch and vision sensors disabled.
Even though we define a success condition in _is_success(), it is disabled since done_active is set to False. The purpose of this is to enable extra information for the logging features of Stable Baselines.
- init_crouch_position
The initial position.
- Type
numpy.ndarray
- compute_reward(achieved_goal, desired_goal, info)
Computes the reward.
The reward consists of the current height of MIMo’s head with a penalty of the square of the control signal.
- Parameters
achieved_goal (float) – The achieved head height.
desired_goal (float) – This parameter is ignored.
info (dict) – This parameter is ignored.
- Returns
The reward as described above.
- Return type
- is_success(achieved_goal, desired_goal)
Did we reach our goal height.
- reset_model()
Resets the simulation.
Return the simulation to the XML state, then slightly randomize all joint positions. Afterwards we let the simulation settle for a fixed number of steps. This leads to MIMo settling into a slightly random sitting or crouching position.
- Returns
Observations after reset.
- Return type
Dict
- is_failure(achieved_goal, desired_goal)
Dummy function. Always returns False.
- sample_goal()
Returns the goal height.
We use a fixed goal height of 0.5.
- Returns
0.5
- Return type
- action_space: spaces.Space[ActType]
- observation_space: spaces.Space[ObsType]
Self-body Environment
This module contains a simple experiment where MIMo is tasked with touching parts of his own body.
The scene is empty except for MIMo, who is sitting on the ground. The task is for MIMo to touch a randomized target body part with his right arm. MIMo is fixed in the initial sitting position and can only move his right arm. Sensory inputs consist of touch and proprioception. Proprioception uses the default settings, but touch excludes several body parts and uses a lowered resolution to improve runtime. The body part can be any of the geoms constituting MIMo.
MIMo’s initial position is constant in all episodes. The target body part is randomized. An episode is completed successfully if MIMo touches the target body part with his right arm.
The reward structure consists of a large fixed reward for touching the right body part, a shaping reward for touching another body part, depending on the distance between the contact and the target body part, and a penalty for each time step.
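The three reward cases can be sketched as follows. The numeric values (500 for the target, -1 per step without contact) are those given in the compute_reward() documentation of this class; the function name, its arguments and the use of a plain Euclidean distance are assumptions made for illustration only.

```python
import math


def selfbody_reward(touched_geom, target_geom, contact_pos=None, target_pos=None):
    """Hypothetical sketch of the self-body reward.

    touched_geom: index of the geom touched this step, or None if nothing was touched.
    target_geom: index of the target geom.
    contact_pos, target_pos: (x, y, z) positions of the contact and the target body.
    """
    if touched_geom is None:
        return -1.0  # time penalty for a step without any touch
    if touched_geom == target_geom:
        return 500.0  # large fixed reward for touching the target
    # Shaping reward: touching near the target is better than touching far away.
    return -math.dist(contact_pos, target_pos)
```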
The class with the environment is MIMoSelfBodyEnv while the path to the scene XML is
defined in SELFBODY_XML.
- mimoEnv.envs.selfbody.TOUCH_PARAMS
List of possible target bodies.
- mimoEnv.envs.selfbody.SITTING_POSITION
Initial position of MIMo. Specifies initial values for all joints. We grabbed these values by posing MIMo using the MuJoCo simulate executable and the positional actuator file. We need these not just for the initial position but also resetting the position (excluding the right arm) each step.
- mimoEnv.envs.selfbody.SELFBODY_XML
Path to the scene for this experiment.
- class mimoEnv.envs.selfbody.MIMoSelfBodyEnv(model_path=SELFBODY_XML, initial_qpos=SITTING_POSITION, frame_skip=1, proprio_params=DEFAULT_PROPRIOCEPTION_PARAMS, touch_params=TOUCH_PARAMS, vision_params=None, vestibular_params=None, actuation_model=SpringDamperModel, goals_in_observation=True, done_active=True, **kwargs)
Bases: Generic[gymnasium.core.ObsType, gymnasium.core.ActType]
MIMo learns about his own body.
MIMo is tasked with touching a given part of his body using his right arm. Attributes and parameters are mostly identical to the base class, but there are two changes. The constructor takes two fewer arguments, goals_in_observation and done_active, which are both permanently set to True. Finally, there are two extra attributes for handling the goal state. The goal attribute stores the target geom in a one-hot encoding, while target_geom and target_body store the geom and its associated body as an index. For more information on geoms and bodies please see the MuJoCo documentation.
- init_sitting_qpos
The initial position.
- Type
numpy.ndarray
- sample_goal()
Samples a new goal and returns it.
The goal consists of a target geom that we try to touch, returned as a one-hot encoding. We also populate target_geom and target_body, which are used by other functions.
- Returns
The target geom in a one hot encoding.
- Return type
numpy.ndarray
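A minimal sketch of such a one-hot goal, using plain Python lists instead of numpy arrays to stay dependency-free. The function name, the candidate list and the total geom count are hypothetical; how the environment selects candidate geoms is not shown here.

```python
import random


def sample_one_hot_goal(candidate_geoms, n_geoms, rng=random):
    """Hypothetical sketch: pick a target geom and encode it as a one-hot vector.

    candidate_geoms: geom indices that may be chosen as targets.
    n_geoms: total number of geoms, i.e. the length of the encoding.
    rng: random module or random.Random instance used for sampling.
    """
    target_geom = rng.choice(candidate_geoms)
    goal = [0.0] * n_geoms
    goal[target_geom] = 1.0  # exactly one entry is set
    return goal, target_geom
```

Returning the index alongside the encoding mirrors the idea of keeping target_geom around, so that later checks can compare indices instead of arrays.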
- is_success(achieved_goal, desired_goal)
We have succeeded when we have a touch sensation on the goal body.
We ignore the goal attribute in this for performance reasons and determine the success condition using target_geom instead. This allows us to save a number of array operations each step.
- compute_reward(achieved_goal, desired_goal, info)
Computes the reward each step.
Three different rewards can be returned:
If we touched the target geom, the reward is 500.
If we touched a geom, but not the target, the reward is the negative of the distance between the touch contact and the target body.
Otherwise the reward is -1.
- reset_model()
Reset to the initial sitting position.
- Returns
Observations after reset.
- Return type
Dict
- is_failure(achieved_goal, desired_goal)
Dummy function that always returns False.
- action_space: spaces.Space[ActType]
- observation_space: spaces.Space[ObsType]
- get_achieved_goal()
Dummy function that returns an empty array.
- Returns
An empty array.
- Return type
numpy.ndarray
Catch Environment
This module contains a simple experiment in which MIMo tries to catch a falling ball.
The scene consists of MIMo with his right arm outstretched and his palm open. A ball is located just above MIMo’s palm. The task is for him to catch the falling ball. MIMo is fixed in position and can only move his right hand. Sensory input consists of the full proprioceptive inputs and touch input.
An episode is completed successfully if MIMo holds onto the ball continuously for 1 second. An episode fails when the ball drops some distance below MIMo’s hand or is bounced into the distance.
There is a small negative reward for each step without touching the ball, a larger positive reward for each step in contact with the ball and then a large fixed reward on success.
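The "held continuously for 1 second" condition can be tracked with a fixed-size list indexed by step modulo its length, which is how the in_contact_past attribute of the environment class is described. The sketch below illustrates that idea only; the class name and methods are hypothetical, and the number of steps corresponding to one second depends on the simulation time step, which is not assumed here.

```python
class ContactHistory:
    """Hypothetical sketch of a modulo-indexed contact history."""

    def __init__(self, steps_for_success):
        # One slot per step in the required holding window.
        self.steps_for_success = steps_for_success
        self.in_contact_past = [False] * steps_for_success

    def record(self, step, in_contact):
        # The slot for step i is i % steps_for_success, so old entries are overwritten.
        self.in_contact_past[step % self.steps_for_success] = in_contact

    def held_continuously(self):
        # Success only if every step in the window was in contact.
        return all(self.in_contact_past)
```

A single missed contact clears success until the whole window has been refilled, which matches the requirement that the hold be continuous.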
- mimoEnv.envs.catch.CATCH_XML
Path to the catch scene.
- mimoEnv.envs.catch.TOUCH_PARAMS
Touch parameters for the catch environment. Only the right arm is equipped with sensors.
- mimoEnv.envs.catch.CATCH_CAMERA_CONFIG
Camera configuration so it looks straight at the hand.
- class mimoEnv.envs.catch.MIMoCatchEnv(model_path=CATCH_XML, initial_qpos=None, frame_skip=2, proprio_params=DEFAULT_PROPRIOCEPTION_PARAMS, touch_params=TOUCH_PARAMS, vision_params=None, vestibular_params=None, actuation_model=MuscleModel, goals_in_observation=False, done_active=True, action_penalty=True, jitter=False, position_inaccurate=False, default_camera_config=CATCH_CAMERA_CONFIG, **kwargs)
Bases: Generic[gymnasium.core.ObsType, gymnasium.core.ActType]
MIMo tries to catch a falling ball.
MIMo is tasked with catching a falling ball and holding onto it for one second. MIMo’s head and eyes automatically track the ball. The position of the ball is slightly randomized each episode. The constructor takes three additional arguments over the base environment.
- Parameters
action_penalty (bool) – If True, an action penalty based on the cost function of the actuation model is applied to the reward. Default True.
jitter (bool) – If True, the input actions are multiplied with a perturbation array which is randomized every 10-50 time steps. Default False.
position_inaccurate (bool) – If True, the position tracked by the head is offset by a small random distance from the true position of the ball. Default False.
- action_penalty
If True, an action penalty based on the cost function of the actuation model is applied to the reward. Default True.
- Type
- jitter
If True, the input actions are multiplied with a perturbation array which is randomized every 10-50 time steps. Default False.
- Type
- use_position_inaccuracy
If True, the position tracked by the head is offset by a small random distance from the true position of the ball. Default False.
- Type
- position_limits
Maximum distances away from the default ball position for the randomization.
- Type
np.ndarray
- position_inaccuracy_limits
Maximum distances for the head tracking offset.
- Type
np.ndarray
- position_offset
The actual inaccuracy of the head tracking. This is randomized each episode.
- Type
np.ndarray
- jitter_array
Control inputs are multiplied by this array before being passed to MuJoCo. This is randomized every so often.
- Type
np.ndarray
- jitter_period
The number of steps the current jitter array is used for before being randomized again.
- Type
- in_contact_past
A list storing which past steps we were in contact for. This list works by modulo, i.e. to determine if MIMo held the ball on step i, do in_contact_past[i % steps_in_contact_for_success].
- Type
List[bool]
- compute_reward(achieved_goal, desired_goal, info)
Computes the reward.
MIMo is rewarded for each time step in contact with the target. Completing an episode successfully awards +100, while failing leads to a -100 penalty. Additionally, there is an action penalty based on the cost function of the actuation model.
- do_simulation(action, n_frames)
Implementation that adds jitter to the actions.
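The jitter mechanism (actions multiplied by a perturbation array that is re-randomized every 10-50 steps) could look roughly like the sketch below. This is not the environment's implementation: the class name, the scale parameter and the assumption that the perturbation factors are drawn uniformly around 1 are all hypothetical.

```python
import random


class ActionJitter:
    """Hypothetical sketch of multiplicative action jitter."""

    def __init__(self, n_actuators, scale=0.1, seed=None):
        self.rng = random.Random(seed)
        self.n_actuators = n_actuators
        self.scale = scale  # assumed half-width of the perturbation around 1.0
        self._resample()

    def _resample(self):
        # New perturbation factors and a new period of 10 to 50 steps (inclusive).
        self.jitter_array = [
            1.0 + self.rng.uniform(-self.scale, self.scale)
            for _ in range(self.n_actuators)
        ]
        self.jitter_period = self.rng.randint(10, 50)
        self.steps_since_resample = 0

    def apply(self, action):
        # Re-randomize once the current array has been used for jitter_period steps.
        if self.steps_since_resample >= self.jitter_period:
            self._resample()
        self.steps_since_resample += 1
        return [a * j for a, j in zip(action, self.jitter_array)]
```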
- _get_obs()
Adds the size of the ball to the observations.
- Returns
The altered observation dictionary.
- Return type
Dict
- is_success(achieved_goal, desired_goal)
Returns true if MIMo touches the object continuously for 1 second.
- is_failure(achieved_goal, desired_goal)
Returns True if the ball drops below MIMo’s hand.
- sample_goal()
Dummy function. Returns an empty array.
- Returns
An empty array.
- Return type
numpy.ndarray
- get_achieved_goal()
Dummy function. Returns an empty array.
- Returns
An empty array.
- Return type
numpy.ndarray
- reset_model()
Resets the simulation.
We reset the simulation and then slightly move both MIMo’s arm and the ball randomly. The randomization is limited such that MIMo can always reach the ball.
- Returns
Always returns True.
- Return type
- _step_callback()
Checks if MIMo is touching the ball and performs head tracking.
- _in_contact()
Check if MIMo is currently touching the target ball.
This function performs the actual contact check and is called during step_callback().
- Returns
True if MIMo is currently touching the ball, False otherwise.
- Return type
- action_space: spaces.Space[ActType]
- observation_space: spaces.Space[ObsType]
- body_contact_reward()
Reward function that provides higher rewards the more geoms are touching the target.
- Returns
The reward component as described above.
- Return type
- _currently_in_contact()
Check if MIMo is currently touching the ball.
Unlike _in_contact(), this function does not perform the check itself, instead checking the array of past contacts for the current time step. The output of this function will not be accurate if called before _in_contact()!
- Returns
True if MIMo is currently touching the ball, False otherwise.
- Return type
Training script
There is also a training script for all the sample environments.
Training script for the demonstration experiments.
This script allows simple training and testing of RL algorithms in the demo environments with a command line interface. A selection of RL algorithms from the Stable Baselines3 library can be selected. Interactive rendering is disabled during training to speed up computation, but enabled during testing, so the behaviour of the model can be observed directly.
Trained models are saved into the “models/<scenario>” directory, i.e. if you train a reach model and name it “my_model”, it will be saved under “models/reach/my_model”.
To train a given algorithm for some number of time steps:
python illustrations.py --env=reach --train_for=200000 --test_for=1000 --algorithm=PPO --save_model=<model_suffix>
To review a trained model:
python illustrations.py --env=reach --test_for=1000 --load_model=<your_model_suffix>
The available algorithms are PPO, SAC, TD3, DDPG, A2C.
- mimoEnv.illustrations.test(env, save_dir, test_for=1000, model=None, render_video=False)
Testing function to view the behaviour of a model.
- Parameters
env (MIMoEnv) – The environment on which the model should be tested. This does not have to be the same training environment, but action and observation spaces must match.
save_dir (str) – The directory in which any rendered videos will be saved.
test_for (int) – The number of timesteps the testing runs in total. This will be broken into multiple episodes if necessary.
model – The Stable Baselines model object. If None we take random actions instead. Default None.
render_video (bool) – If True, all episodes during testing will be recorded and saved as videos in save_dir.
- mimoEnv.illustrations.main()
CLI for the demonstration environments.
Command line interface that can train and load models for the demonstration scenarios. Possible parameters are:
--env: The demonstration environment to use. Must be one of reach, standup, selfbody, catch.
--train_for: The number of time steps to train. No training takes place if this is 0. Default 0.
--test_for: The number of time steps to test. Testing renders the environment to an interactive window, so the trained behaviour can be observed. Default 1000.
--save_every: The number of time steps between model saves. This can be larger than the total training time, in which case we save once when training completes. Default 100000.
--algorithm: The algorithm to train. This argument must be provided if you train. Must be one of PPO, SAC, TD3, DDPG, A2C, HER.
--load_model: The path to the model to load.
--save_model: The directory name where the trained model will be saved. An input of “my_model” will lead to the model being saved under “models/<env>/my_model”.
--use_muscles: This flag switches between actuation models. By default, the spring-damper model is used. If this flag is set, the muscle model is used instead.
--render_video: If this flag is set, each testing episode is recorded and saved as a video in the same directory as the models.
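For orientation, the options listed above could be declared with argparse roughly as follows. This is a sketch reconstructed from the option descriptions, not the script's actual source: the function names are hypothetical, and making --env required as well as the None defaults for --algorithm, --load_model and --save_model are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical argparse declaration matching the documented options."""
    parser = argparse.ArgumentParser(description="Sketch of the demo environment CLI")
    parser.add_argument("--env", required=True,
                        choices=["reach", "standup", "selfbody", "catch"])
    parser.add_argument("--train_for", type=int, default=0)
    parser.add_argument("--test_for", type=int, default=1000)
    parser.add_argument("--save_every", type=int, default=100000)
    parser.add_argument("--algorithm", default=None,
                        choices=["PPO", "SAC", "TD3", "DDPG", "A2C", "HER"])
    parser.add_argument("--load_model", default=None)
    parser.add_argument("--save_model", default=None)
    parser.add_argument("--use_muscles", action="store_true")
    parser.add_argument("--render_video", action="store_true")
    return parser


def model_save_dir(env_name, save_model):
    # Documented layout: models/<env>/<save_model>
    return f"models/{env_name}/{save_model}"
```

Parsing the documented training command line with this sketch yields train_for=200000 and a save directory of models/reach/<model_suffix>.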
Benchmarking
This script and the demo script use the same dummy class, but with different scene XMLs. For benchmarking the scene consisted of MIMo with all sensory modalities enabled with varying configurations and a couple of objects lying on the ground. In the benchmarking script we take random actions after each step.
Environments
This module defines a dummy implementation for MIMo, to allow easy testing of modules.
The main class is MIMoDummyEnv which implements all methods from the base class as dummy
functions that return fixed values. This allows for testing the model without the full gym bureaucracy.
The second class MIMoShowroomEnv is identical to the first, but changes the default
parameters to load the showroom scene instead.
Finally, there is a demo class for the v2 version of MIMo using five-fingered hands and feet with two toes each in
MIMoV2DummyEnv.
- mimoEnv.envs.dummy.DEMO_XML
Path to the demo scene.
- mimoEnv.envs.dummy.BENCHMARK_XML
Path to the benchmarking scene.
- mimoEnv.envs.dummy.BENCHMARK_XML_V2
Path to the benchmarking scene using MIMo v2.
- mimoEnv.envs.dummy.TEST_XML
Path to the benchmarking scene using MIMo v2.
- class mimoEnv.envs.dummy.MIMoDummyEnv(model_path=BENCHMARK_XML, frame_skip=2, initial_qpos=None, render_mode=None, proprio_params=DEFAULT_PROPRIOCEPTION_PARAMS, touch_params=DEFAULT_TOUCH_PARAMS, vision_params=DEFAULT_VISION_PARAMS, vestibular_params=DEFAULT_VESTIBULAR_PARAMS, actuation_model=SpringDamperModel, goals_in_observation=False, done_active=True, show_sensors=False, print_space_sizes=False, **kwargs)
Bases: Generic[gymnasium.core.ObsType, gymnasium.core.ActType]
Dummy implementation for MIMoEnv.
This class is meant for testing and demonstrating parts of the base class. All abstract methods are implemented as dummy functions that return fixed values. No meaningful goal or reward is specified. The default parameters use the default sensor configurations in a bare scene consisting of MIMo and two objects on an infinite plane. For testing and validation there are two additional parameters compared to the base class:
- Parameters
show_sensors – If True, plot the sensor point distribution for the touch system during initialization. Default False.
print_space_sizes – If True, the shape of the action space and all entries in the observation dictionary is printed during initialization. Default False.
Finally, there are two extra attributes:
- show_sensors
If True, plot the sensor point distribution for the touch system during initialization.
- Type
- touch_setup(touch_params)
Perform the setup and initialization of the touch system.
Uses the more complicated Trimesh implementation. Also plots the sensor points if show_sensors is True.
- Parameters
touch_params (dict) – The parameter dictionary.
- _obs_callback()
Simply increments the step counter.
- reset_model()
Resets to the initial simulation state
- is_success(achieved_goal, desired_goal)
Dummy function that always returns False.
- is_failure(achieved_goal, desired_goal)
Dummy function that always returns False.
- sample_goal()
A dummy function returning an empty array of shape (0,).
- Returns
An empty size 0 array.
- Return type
numpy.ndarray
- get_achieved_goal()
Dummy function returning an empty array with the same shape as the goal.
- Returns
An empty size 0 array.
- Return type
numpy.ndarray
- compute_reward(achieved_goal, desired_goal, info)
Dummy function that always returns a dummy value of 0.
- _env_setup()
This function initializes all the sensory components of the model.
Calls the setup functions for all the sensory components.
- _get_actuators()
Saves IDs of the actuators associated with MIMo in mimo_actuators.
- _get_facial_expressions(emotion_textures)
Associates facial textures in the model with human-readable names for the associated emotions.
- _get_joints()
Saves the IDs of the joints associated with MIMo in mimo_joints.
- _get_obs()
Returns the observation.
This function should return all simulation outputs relevant to whatever learning algorithm you wish to use. We always return proprioceptive information in the ‘observation’ entry, and this information always includes relative joint positions. Other sensory modalities get their own entries, if they are enabled. If goals_in_observation is set to True, the achieved and desired goal are also included.
- Returns
A dictionary containing simulation outputs with separate entries for each sensor modality.
- Return type
Dict
- _initialize_simulation()
Initialize MuJoCo simulation data structures mjModel and mjData.
- _is_done(achieved_goal, desired_goal, info)
This function should determine if we reached the end of an episode. Dummy implementation.
By default, this function always returns False. If done_active is set to True, it instead returns True if either is_success() or is_failure() returns True. The goal parameters are there to allow this function to be more easily overridden by subclasses, should this be required. They are ignored by default.
- Returns
terminated (bool): Whether the current episode reached a success or failure state.
truncated (bool): Whether the current episode entered some kind of invalid condition or “finished” due to some other constraint, such as a time limit.
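The termination rule described for _is_done() reduces to a small boolean expression. The sketch below covers only the terminated value and takes the success and failure results as plain booleans; the function name and signature are hypothetical.

```python
def is_done(done_active, success, failure):
    """Hypothetical sketch of the documented termination rule.

    done_active: whether success/failure states may end the episode.
    success, failure: results of is_success() and is_failure() for this step.
    """
    if not done_active:
        # Episodes never terminate early; success is then only useful for logging.
        return False
    return success or failure
```

With done_active set to False, as in the standup environment, a reached success state leaves the episode running until its fixed length is exhausted.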
- _reset_simulation()
Resets MuJoCo and actuation simulation data and samples a new goal.
- _set_action(action)
Set the action for the next step.
Calls the actuation model’s function mimoActuation.actuation.ActuationModel.action(). What exactly happens depends on the specific implementation.
- Parameters
action (numpy.ndarray) – A numpy array with control values.
- _set_action_space()
Sets the action space attribute.
By default, the actuation space contains only MIMo’s actuators.
- _set_initial_position(initial_qpos)
Sets the initial positions for joints in the environment.
The input should be a dictionary with joint names as keys and joint positions (in radians as floats) as values. This function then sets each listed joint to the corresponding position. Joints not contained in the dictionary are left unaltered.
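The update semantics (listed joints are overwritten, all others are left alone) can be illustrated with a plain dictionary standing in for the simulation state. The function name, the joint names and the error handling for unknown joints are hypothetical; the real method writes into the MuJoCo data structures instead.

```python
def set_initial_position(joint_positions, initial_qpos):
    """Hypothetical sketch using a dict as a stand-in for the simulation state.

    joint_positions: current joint positions, keyed by joint name (radians).
    initial_qpos: joint positions to apply, keyed by joint name (radians).
    """
    for joint_name, position in initial_qpos.items():
        if joint_name not in joint_positions:
            # Assumption: an unknown joint name is treated as an error.
            raise KeyError(f"Unknown joint: {joint_name}")
        joint_positions[joint_name] = float(position)
    return joint_positions
```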
- _set_observation_space()
Sets the observation space attribute.
Calls _get_obs() and determines the space using the returned observations.
- _single_mujoco_step()
- _step_callback()
A custom callback that is called after stepping the simulation, but before collecting observations.
Useful to enforce additional constraints on the simulation state before observations are collected. Note that the sensory modalities do not update until _get_obs() is called, so they will not have updated to the current timestep.
- _step_mujoco_simulation(ctrl, n_frames)
Step over the MuJoCo simulation.
- _substep_callback()
A custom callback that is called after each simulation substep.
- close()
Close all processes like rendering contexts
- do_simulation(action, n_frames)
Step simulation forward for n_frames number of steps.
- Parameters
action (np.ndarray) – The control input for the actuators.
n_frames (int) – The number of physics steps to perform.
- property dt
- get_body_com(body_name)
Return the cartesian position of a body frame
- get_proprio_obs()
Collects and returns the outputs of the proprioceptive system.
Override this function if you want to apply some simple post-processing!
- Returns
A numpy array containing the proprioceptive output.
- Return type
numpy.ndarray
- get_touch_obs()
Collects and returns the outputs of the touch system.
Override this function if you want to apply some simple post-processing!
- Returns
A numpy array containing the touch output.
- Return type
numpy.ndarray
- get_vestibular_obs()
Collects and returns the outputs of the vestibular system.
Override this function if you want to apply some simple post-processing!
- Returns
A numpy array with the vestibular data.
- Return type
numpy.ndarray
- get_vision_obs()
Collects and returns the outputs of the vision system.
Override this function if you want to apply some simple post-processing!
- property n_actuators
The number of actuators for MIMo.
- Returns
The number of actuators for MIMo.
- Return type
- property np_random: numpy.random._generator.Generator
Returns the environment’s internal _np_random that if not set will initialise with a random seed.
- Returns
Instances of np.random.Generator
- proprio_setup(proprio_params)
Perform the setup and initialization of the proprioceptive system.
This should be overridden if you want to use another implementation!
- Parameters
proprio_params (dict) – The parameter dictionary.
- render()
Render a frame from the MuJoCo simulation as specified by the render_mode.
- reset(*, seed: Optional[int] = None, options: Optional[dict] = None)
Resets the environment to an initial internal state, returning an initial observation and info.
This method generates a new starting state often with some randomness to ensure that the agent explores the state space and learns a generalised policy about the environment. This randomness can be controlled with the seed parameter otherwise if the environment already has a random number generator and reset() is called with seed=None, the RNG is not reset.
Therefore, reset() should (in the typical use case) be called with a seed right after initialization and then never again.
For Custom environments, the first line of reset() should be super().reset(seed=seed) which implements the seeding correctly.
Changed in version v0.25: The return_info parameter was removed and now info is expected to be returned.
- Parameters
seed (optional int) – The seed that is used to initialize the environment’s PRNG (np_random). If the environment does not already have a PRNG and seed=None (the default option) is passed, a seed will be chosen from some source of entropy (e.g. timestamp or /dev/urandom). However, if the environment already has a PRNG and seed=None is passed, the PRNG will not be reset. If you pass an integer, the PRNG will be reset even if it already exists. Usually, you want to pass an integer right after the environment has been initialized and then never again. Please refer to the minimal example above to see this paradigm in action.
options (optional dict) – Additional information to specify how the environment is reset (optional, depending on the specific environment)
- Returns
observation (ObsType): Observation of the initial state. This will be an element of observation_space (typically a numpy array) and is analogous to the observation returned by step().
info (dictionary): This dictionary contains auxiliary information complementing observation. It should be analogous to the info returned by step().
- reward_range = (-inf, inf)
- set_state(qpos, qvel)
Set the joints position qpos and velocity qvel of the model. Override this method depending on the MuJoCo bindings used.
- state_vector()
Return the position and velocity joint states of the model
- step(action)
Run one timestep of the environment’s dynamics.
This function takes a simulation step with the given control inputs, collects the observations, computes the reward and finally determines if we are done with this episode or not.
_get_obs() collects the observations, compute_reward() calculates the reward. _is_done() is called to determine if we have reached a terminal state and _step_callback() can be used for extra functions each step, such as incrementing a step counter. Both the ‘terminated’ and ‘truncated’ return values are determined by _is_done().
- Parameters
action (np.ndarray) – An action provided by the agent
- Returns
observation (object): This will be an element of the environment’s observation_space. This may, for instance, be a numpy array containing the positions and velocities of certain objects.
reward (float): The amount of reward returned as a result of taking the action.
terminated (bool): Whether a terminal state (success or failure as defined under the MDP of the task) is reached. In this case further step() calls could return undefined results.
truncated (bool): Whether a truncation condition outside the scope of the MDP is satisfied. Typically a timelimit, but could also be used to indicate agent physically going out of bounds. Can be used to end the episode prematurely before a terminal state is reached.
info (dictionary): info contains auxiliary diagnostic information (helpful for debugging, learning, and logging). This might, for instance, contain: metrics that describe the agent’s performance state, variables that are hidden from observations, or individual reward terms that are combined to produce the total reward.
- swap_facial_expression(emotion)
Changes MIMo's facial texture.
Valid emotion names are in facial_expression, which links readable emotion names to their associated texture ids.
- Parameters
emotion (str) – A valid emotion name.
- property unwrapped: gymnasium.core.Env[gymnasium.core.ObsType, gymnasium.core.ActType]
Returns the base non-wrapped environment.
- Returns
The base non-wrapped gymnasium.Env instance
- Return type
Env
- vestibular_setup(vestibular_params)
Perform the setup and initialization of the vestibular system.
This should be overridden if you want to use another implementation!
- Parameters
vestibular_params (dict) – The parameter dictionary.
- vision_setup(vision_params)
Perform the setup and initialization of the vision system.
This should be overridden if you want to use another implementation!
- Parameters
vision_params (dict) – The parameter dictionary.
- action_space: spaces.Space[ActType]
- observation_space: spaces.Space[ObsType]
- class mimoEnv.envs.dummy.MIMoV2DummyEnv(model_path=BENCHMARK_XML_V2, touch_params=DEFAULT_TOUCH_PARAMS_V2, **kwargs)
Bases: Generic[gymnasium.core.ObsType, gymnasium.core.ActType]
Same as MIMoDummyEnv, but using the full hand version of MIMo, which has hands with five fingers and feet with two toes.
- _env_setup()
This function initializes all the sensory components of the model.
Calls the setup functions for all the sensory components.
- _get_actuators()
Saves IDs of the actuators associated with MIMo in
mimo_actuators.
- _get_facial_expressions(emotion_textures)
Associates facial textures in the model with human-readable names for the associated emotions.
- _get_joints()
Saves the IDs of the joints associated with MIMo in mimo_joints.
- _get_obs()
Returns the observation.
This function should return all simulation outputs relevant to whatever learning algorithm you wish to use. We always return proprioceptive information in the ‘observation’ entry, and this information always includes relative joint positions. Other sensory modalities get their own entries, if they are enabled. If goals_in_observation is set to True, the achieved and desired goal are also included.
- Returns
A dictionary containing simulation outputs with separate entries for each sensor modality.
- Return type
Dict
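As a rough illustration of the dictionary layout described for _get_obs(), the sketch below assembles an observation from per-modality outputs. Only the ‘observation’ entry is named in the text. The other keys (touch, vision, vestibular, achieved_goal, desired_goal) and the helper build_obs are assumptions made for this example and may differ from MIMo's actual implementation.

```python
from typing import Dict, List, Optional


def build_obs(proprio: List[float],
              touch: Optional[List[float]] = None,
              vision: Optional[dict] = None,
              vestibular: Optional[List[float]] = None,
              goals_in_observation: bool = False,
              achieved_goal: Optional[List[float]] = None,
              desired_goal: Optional[List[float]] = None) -> Dict[str, object]:
    """Hypothetical helper mirroring the layout described for _get_obs()."""
    # Proprioception is always present and lives under 'observation'.
    obs: Dict[str, object] = {"observation": list(proprio)}
    # Every other modality gets its own entry, but only when it is enabled.
    # The key names used here are assumptions.
    for key, value in (("touch", touch), ("vision", vision), ("vestibular", vestibular)):
        if value is not None:
            obs[key] = value
    # Goals are only included on request.
    if goals_in_observation:
        obs["achieved_goal"] = list(achieved_goal or [])
        obs["desired_goal"] = list(desired_goal or [])
    return obs


obs = build_obs([0.1, -0.2], vestibular=[0.0, 0.0, 9.81])
print(sorted(obs))
```

Disabled modalities simply have no entry, so downstream code can test for a key instead of checking for placeholder values.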
- _initialize_simulation()
Initialize MuJoCo simulation data structures mjModel and mjData.
- _is_done(achieved_goal, desired_goal, info)
This function should determine if we reached the end of an episode. Dummy implementation.
By default, this function always returns False. If done_active is set to True, it instead returns True if either is_success() or is_failure() returns True. The goal parameters are there to allow this class to be more easily overridden by subclasses, should this be required. They are ignored by default.
- Parameters
- Returns
- terminated (bool): Whether the current episode reached a success or failure state.
- truncated (bool): Whether the current episode entered some kind of invalid condition or “finished” due to some other constraint, such as a time limit.
- Return type
terminated (bool)
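The default termination rule described for _is_done() can be sketched as a small pure function. This is only an illustration of the described behaviour. The standalone function is_done and its boolean arguments are hypothetical, and the truncation flag is left out.

```python
def is_done(done_active: bool, success: bool, failure: bool) -> bool:
    """Sketch of the default termination rule described for _is_done()."""
    if not done_active:
        # Without done_active, episodes never terminate on their own.
        return False
    # With done_active, either a success or a failure ends the episode.
    return success or failure


print(is_done(False, True, False))
print(is_done(True, True, False))
```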
- _obs_callback()
Simply increments the step counter.
- _reset_simulation()
Resets MuJoCo and actuation simulation data and samples a new goal.
- _set_action(action)
Set the action for the next step.
Calls the actuation model's function mimoActuation.actuation.ActuationModel.action(). What exactly happens depends on the specific implementation.
- Parameters
action (numpy.ndarray) – A numpy array with control values.
- _set_action_space()
Sets the action space attribute.
By default, the actuation space contains only MIMo's actuators.
- _set_initial_position(initial_qpos)
Sets the initial positions for joints in the environment.
The input should be a dictionary with joint names as keys and joint positions (in radians as floats) as values. This function then sets each listed joint to the corresponding position. Joints not contained in the dictionary are left unaltered.
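A minimal sketch of the dictionary-based update described for _set_initial_position(), using plain Python lists in place of the simulation state. The joint names, the joint_index lookup and the function set_initial_position are hypothetical stand-ins, not MIMo's API.

```python
from typing import Dict, List


def set_initial_position(qpos: List[float],
                         joint_index: Dict[str, int],
                         initial_qpos: Dict[str, float]) -> List[float]:
    """Return a copy of qpos with the listed joints overwritten (radians)."""
    new_qpos = list(qpos)
    for joint_name, angle in initial_qpos.items():
        # A KeyError here signals a joint name that the model does not have.
        new_qpos[joint_index[joint_name]] = float(angle)
    return new_qpos


# Hypothetical joint names and layout, for illustration only.
joint_index = {"joint_a": 0, "joint_b": 1, "joint_c": 2}
qpos = [0.0, 0.0, 0.0]
updated = set_initial_position(qpos, joint_index, {"joint_b": 0.5})
print(updated)
```

Joints that are not listed keep their previous values, matching the behaviour described above.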
- _set_observation_space()
Sets the observation space attribute.
Calls _get_obs() and determines the space using the returned observations.
- _single_mujoco_step()
- _step_callback()
A custom callback that is called after stepping the simulation, but before collecting observations.
Useful to enforce additional constraints on the simulation state before observations are collected. Note that the sensory modalities do not update until get_obs is called, so they will not have updated to the current timestep.
- _step_mujoco_simulation(ctrl, n_frames)
Step over the MuJoCo simulation.
- _substep_callback()
A custom callback that is called after each simulation substep.
- close()
Close all processes like rendering contexts
- compute_reward(achieved_goal, desired_goal, info)
Dummy function that always returns a dummy value of 0.
- do_simulation(action, n_frames)
Step simulation forward for n_frames number of steps.
- Parameters
action (np.ndarray) – The control input for the actuators.
n_frames (int) – The number of physics steps to perform.
- property dt
- get_achieved_goal()
Dummy function returning an empty array with the same shape as the goal.
- Returns
An empty size 0 array.
- Return type
numpy.ndarray
- get_body_com(body_name)
Return the cartesian position of a body frame
- get_proprio_obs()
Collects and returns the outputs of the proprioceptive system.
Override this function if you want to make some simple post-processing!
- Returns
A numpy array containing the proprioceptive output.
- Return type
numpy.ndarray
- get_touch_obs()
Collects and returns the outputs of the touch system.
Override this function if you want to make some simple post-processing!
- Returns
A numpy array containing the touch output.
- Return type
numpy.ndarray
- get_vestibular_obs()
Collects and returns the outputs of the vestibular system.
Override this function if you want to make some simple post-processing!
- Returns
A numpy array with the vestibular data.
- Return type
numpy.ndarray
- get_vision_obs()
Collects and returns the outputs of the vision system.
Override this function if you want to make some simple post-processing!
- is_failure(achieved_goal, desired_goal)
Dummy function that always returns False.
- is_success(achieved_goal, desired_goal)
Dummy function that always returns False.
- property n_actuators
The number of actuators for MIMo.
- Returns
The number of actuators for MIMo.
- Return type
- property np_random: numpy.random._generator.Generator
Returns the environment’s internal _np_random that if not set will initialise with a random seed.
- Returns
Instances of np.random.Generator
- proprio_setup(proprio_params)
Perform the setup and initialization of the proprioceptive system.
This should be overridden if you want to use another implementation!
- Parameters
proprio_params (dict) – The parameter dictionary.
- render()
Render a frame from the MuJoCo simulation as specified by the render_mode.
- reset(*, seed: Optional[int] = None, options: Optional[dict] = None)
Resets the environment to an initial internal state, returning an initial observation and info.
This method generates a new starting state, often with some randomness, to ensure that the agent explores the state space and learns a generalised policy about the environment. This randomness can be controlled with the seed parameter; otherwise, if the environment already has a random number generator and reset() is called with seed=None, the RNG is not reset.
Therefore, reset() should (in the typical use case) be called with a seed right after initialization and then never again.
For custom environments, the first line of reset() should be super().reset(seed=seed), which implements the seeding correctly.
Changed in version v0.25: The return_info parameter was removed and now info is expected to be returned.
- Parameters
seed (optional int) – The seed that is used to initialize the environment’s PRNG (np_random). If the environment does not already have a PRNG and seed=None (the default option) is passed, a seed will be chosen from some source of entropy (e.g. timestamp or /dev/urandom). However, if the environment already has a PRNG and seed=None is passed, the PRNG will not be reset. If you pass an integer, the PRNG will be reset even if it already exists. Usually, you want to pass an integer right after the environment has been initialized and then never again. Please refer to the minimal example above to see this paradigm in action.
options (optional dict) – Additional information to specify how the environment is reset (optional, depending on the specific environment)
- Returns
- observation (ObsType): Observation of the initial state. This will be an element of observation_space (typically a numpy array) and is analogous to the observation returned by step().
- info (dictionary): This dictionary contains auxiliary information complementing observation. It should be analogous to the info returned by step().
- Return type
observation (ObsType)
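The seeding rule described for reset() (seed once after construction, then call reset() without a seed) can be illustrated with a toy class. ToyEnv is a hypothetical stand-in written with the standard library only. It mimics the described rule and is not a gymnasium or MIMo class.

```python
import random
from typing import Optional


class ToyEnv:
    """Toy stand-in (not a gymnasium class) that mimics the seeding rule."""

    def __init__(self) -> None:
        self._rng: Optional[random.Random] = None

    def reset(self, *, seed: Optional[int] = None):
        if seed is not None or self._rng is None:
            # An explicit seed always re-creates the generator. Without one,
            # a generator is only created if none exists yet.
            self._rng = random.Random(seed)
        observation = self._rng.random()
        return observation, {}


env = ToyEnv()
first, _ = env.reset(seed=42)   # seed once, right after construction
second, _ = env.reset()         # later resets continue the same stream

replay = ToyEnv()
replay_first, _ = replay.reset(seed=42)
replay_second, _ = replay.reset()
print(first == replay_first, second == replay_second)
```

Because the second reset does not reseed, both environments continue along the same random stream and the whole sequence of episodes is reproducible from a single seed.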
- reset_model()
Resets to the initial simulation state
- reward_range = (-inf, inf)
- sample_goal()
A dummy function returning an empty array of shape (0,).
- Returns
An empty size 0 array.
- Return type
numpy.ndarray
- set_state(qpos, qvel)
Set the joints position qpos and velocity qvel of the model. Override this method depending on the MuJoCo bindings used.
- state_vector()
Return the position and velocity joint states of the model
- step(action)
Run one timestep of the environment’s dynamics.
This function takes a simulation step with the given control inputs, collects the observations, computes the reward and finally determines if we are done with this episode or not.
_get_obs() collects the observations, compute_reward() calculates the reward. _is_done() is called to determine if we have reached a terminal state and _step_callback() can be used for extra functions each step, such as incrementing a step counter. Both the ‘terminated’ and ‘truncated’ return values are determined by _is_done().
- Parameters
action (np.ndarray) – An action provided by the agent
- Returns
- observation (object): this will be an element of the environment’s observation_space. This may, for instance, be a numpy array containing the positions and velocities of certain objects.
- reward (float): The amount of reward returned as a result of taking the action.
- terminated (bool): whether a terminal state (success or failure as defined under the MDP of the task) is reached. In this case further step() calls could return undefined results.
- truncated (bool): whether a truncation condition outside the scope of the MDP is satisfied. Typically a timelimit, but could also be used to indicate agent physically going out of bounds. Can be used to end the episode prematurely before a terminal state is reached.
- info (dictionary): info contains auxiliary diagnostic information (helpful for debugging, learning, and logging). This might, for instance, contain: metrics that describe the agent’s performance state, variables that are hidden from observations, or individual reward terms that are combined to produce the total reward.
- Return type
observation (object)
- swap_facial_expression(emotion)
Changes MIMo's facial texture.
Valid emotion names are in facial_expression, which links readable emotion names to their associated texture ids.
- Parameters
emotion (str) – A valid emotion name.
- touch_setup(touch_params)
Perform the setup and initialization of the touch system.
Uses the more complicated Trimesh implementation. Also plots the sensor points if show_sensors is True.
- Parameters
touch_params (dict) – The parameter dictionary.
- property unwrapped: gymnasium.core.Env[gymnasium.core.ObsType, gymnasium.core.ActType]
Returns the base non-wrapped environment.
- Returns
The base non-wrapped gymnasium.Env instance
- Return type
Env
- vestibular_setup(vestibular_params)
Perform the setup and initialization of the vestibular system.
This should be overridden if you want to use another implementation!
- Parameters
vestibular_params (dict) – The parameter dictionary.
- vision_setup(vision_params)
Perform the setup and initialization of the vision system.
This should be overridden if you want to use another implementation!
- Parameters
vision_params (dict) – The parameter dictionary.
- action_space: spaces.Space[ActType]
- observation_space: spaces.Space[ObsType]
- class mimoEnv.envs.dummy.MIMoMuscleDummyEnv(model_path=BENCHMARK_XML_V2, touch_params=DEFAULT_TOUCH_PARAMS_V2, actuation_model=MuscleModel, **kwargs)
Bases: Generic[gymnasium.core.ObsType, gymnasium.core.ActType]
Same as MIMoDummyEnv, but using the muscle actuation model. Uses the full hand version of MIMo by default.
- _env_setup()
This function initializes all the sensory components of the model.
Calls the setup functions for all the sensory components.
- _get_actuators()
Saves IDs of the actuators associated with MIMo in
mimo_actuators.
- _get_facial_expressions(emotion_textures)
Associates facial textures in the model with human-readable names for the associated emotions.
- _get_joints()
Saves the IDs of the joints associated with MIMo in mimo_joints.
- _get_obs()
Returns the observation.
This function should return all simulation outputs relevant to whatever learning algorithm you wish to use. We always return proprioceptive information in the ‘observation’ entry, and this information always includes relative joint positions. Other sensory modalities get their own entries, if they are enabled. If goals_in_observation is set to True, the achieved and desired goal are also included.
- Returns
A dictionary containing simulation outputs with separate entries for each sensor modality.
- Return type
Dict
- _initialize_simulation()
Initialize MuJoCo simulation data structures mjModel and mjData.
- _is_done(achieved_goal, desired_goal, info)
This function should determine if we reached the end of an episode. Dummy implementation.
By default, this function always returns False. If done_active is set to True, it instead returns True if either is_success() or is_failure() returns True. The goal parameters are there to allow this class to be more easily overridden by subclasses, should this be required. They are ignored by default.
- Parameters
- Returns
- terminated (bool): Whether the current episode reached a success or failure state.
- truncated (bool): Whether the current episode entered some kind of invalid condition or “finished” due to some other constraint, such as a time limit.
- Return type
terminated (bool)
- _obs_callback()
Simply increments the step counter.
- _reset_simulation()
Resets MuJoCo and actuation simulation data and samples a new goal.
- _set_action(action)
Set the action for the next step.
Calls the actuation model's function mimoActuation.actuation.ActuationModel.action(). What exactly happens depends on the specific implementation.
- Parameters
action (numpy.ndarray) – A numpy array with control values.
- _set_action_space()
Sets the action space attribute.
By default, the actuation space contains only MIMo's actuators.
- _set_initial_position(initial_qpos)
Sets the initial positions for joints in the environment.
The input should be a dictionary with joint names as keys and joint positions (in radians as floats) as values. This function then sets each listed joint to the corresponding position. Joints not contained in the dictionary are left unaltered.
- _set_observation_space()
Sets the observation space attribute.
Calls _get_obs() and determines the space using the returned observations.
- _single_mujoco_step()
- _step_callback()
A custom callback that is called after stepping the simulation, but before collecting observations.
Useful to enforce additional constraints on the simulation state before observations are collected. Note that the sensory modalities do not update until get_obs is called, so they will not have updated to the current timestep.
- _step_mujoco_simulation(ctrl, n_frames)
Step over the MuJoCo simulation.
- _substep_callback()
A custom callback that is called after each simulation substep.
- close()
Close all processes like rendering contexts
- compute_reward(achieved_goal, desired_goal, info)
Dummy function that always returns a dummy value of 0.
- do_simulation(action, n_frames)
Step simulation forward for n_frames number of steps.
- Parameters
action (np.ndarray) – The control input for the actuators.
n_frames (int) – The number of physics steps to perform.
- property dt
- get_achieved_goal()
Dummy function returning an empty array with the same shape as the goal.
- Returns
An empty size 0 array.
- Return type
numpy.ndarray
- get_body_com(body_name)
Return the cartesian position of a body frame
- get_proprio_obs()
Collects and returns the outputs of the proprioceptive system.
Override this function if you want to make some simple post-processing!
- Returns
A numpy array containing the proprioceptive output.
- Return type
numpy.ndarray
- get_touch_obs()
Collects and returns the outputs of the touch system.
Override this function if you want to make some simple post-processing!
- Returns
A numpy array containing the touch output.
- Return type
numpy.ndarray
- get_vestibular_obs()
Collects and returns the outputs of the vestibular system.
Override this function if you want to make some simple post-processing!
- Returns
A numpy array with the vestibular data.
- Return type
numpy.ndarray
- get_vision_obs()
Collects and returns the outputs of the vision system.
Override this function if you want to make some simple post-processing!
- is_failure(achieved_goal, desired_goal)
Dummy function that always returns False.
- is_success(achieved_goal, desired_goal)
Dummy function that always returns False.
- property n_actuators
The number of actuators for MIMo.
- Returns
The number of actuators for MIMo.
- Return type
- property np_random: numpy.random._generator.Generator
Returns the environment’s internal _np_random that if not set will initialise with a random seed.
- Returns
Instances of np.random.Generator
- proprio_setup(proprio_params)
Perform the setup and initialization of the proprioceptive system.
This should be overridden if you want to use another implementation!
- Parameters
proprio_params (dict) – The parameter dictionary.
- render()
Render a frame from the MuJoCo simulation as specified by the render_mode.
- reset(*, seed: Optional[int] = None, options: Optional[dict] = None)
Resets the environment to an initial internal state, returning an initial observation and info.
This method generates a new starting state, often with some randomness, to ensure that the agent explores the state space and learns a generalised policy about the environment. This randomness can be controlled with the seed parameter; otherwise, if the environment already has a random number generator and reset() is called with seed=None, the RNG is not reset.
Therefore, reset() should (in the typical use case) be called with a seed right after initialization and then never again.
For custom environments, the first line of reset() should be super().reset(seed=seed), which implements the seeding correctly.
Changed in version v0.25: The return_info parameter was removed and now info is expected to be returned.
- Parameters
seed (optional int) – The seed that is used to initialize the environment’s PRNG (np_random). If the environment does not already have a PRNG and seed=None (the default option) is passed, a seed will be chosen from some source of entropy (e.g. timestamp or /dev/urandom). However, if the environment already has a PRNG and seed=None is passed, the PRNG will not be reset. If you pass an integer, the PRNG will be reset even if it already exists. Usually, you want to pass an integer right after the environment has been initialized and then never again. Please refer to the minimal example above to see this paradigm in action.
options (optional dict) – Additional information to specify how the environment is reset (optional, depending on the specific environment)
- Returns
- observation (ObsType): Observation of the initial state. This will be an element of observation_space (typically a numpy array) and is analogous to the observation returned by step().
- info (dictionary): This dictionary contains auxiliary information complementing observation. It should be analogous to the info returned by step().
- Return type
observation (ObsType)
- reset_model()
Resets to the initial simulation state
- reward_range = (-inf, inf)
- sample_goal()
A dummy function returning an empty array of shape (0,).
- Returns
An empty size 0 array.
- Return type
numpy.ndarray
- set_state(qpos, qvel)
Set the joints position qpos and velocity qvel of the model. Override this method depending on the MuJoCo bindings used.
- state_vector()
Return the position and velocity joint states of the model
- step(action)
Run one timestep of the environment’s dynamics.
This function takes a simulation step with the given control inputs, collects the observations, computes the reward and finally determines if we are done with this episode or not.
_get_obs() collects the observations, compute_reward() calculates the reward. _is_done() is called to determine if we have reached a terminal state and _step_callback() can be used for extra functions each step, such as incrementing a step counter. Both the ‘terminated’ and ‘truncated’ return values are determined by _is_done().
- Parameters
action (np.ndarray) – An action provided by the agent
- Returns
- observation (object): this will be an element of the environment’s observation_space. This may, for instance, be a numpy array containing the positions and velocities of certain objects.
- reward (float): The amount of reward returned as a result of taking the action.
- terminated (bool): whether a terminal state (success or failure as defined under the MDP of the task) is reached. In this case further step() calls could return undefined results.
- truncated (bool): whether a truncation condition outside the scope of the MDP is satisfied. Typically a timelimit, but could also be used to indicate agent physically going out of bounds. Can be used to end the episode prematurely before a terminal state is reached.
- info (dictionary): info contains auxiliary diagnostic information (helpful for debugging, learning, and logging). This might, for instance, contain: metrics that describe the agent’s performance state, variables that are hidden from observations, or individual reward terms that are combined to produce the total reward.
- Return type
observation (object)
- swap_facial_expression(emotion)
Changes MIMo's facial texture.
Valid emotion names are in facial_expression, which links readable emotion names to their associated texture ids.
- Parameters
emotion (str) – A valid emotion name.
- touch_setup(touch_params)
Perform the setup and initialization of the touch system.
Uses the more complicated Trimesh implementation. Also plots the sensor points if show_sensors is True.
- Parameters
touch_params (dict) – The parameter dictionary.
- property unwrapped: gymnasium.core.Env[gymnasium.core.ObsType, gymnasium.core.ActType]
Returns the base non-wrapped environment.
- Returns
The base non-wrapped gymnasium.Env instance
- Return type
Env
- vestibular_setup(vestibular_params)
Perform the setup and initialization of the vestibular system.
This should be overridden if you want to use another implementation!
- Parameters
vestibular_params (dict) – The parameter dictionary.
- vision_setup(vision_params)
Perform the setup and initialization of the vision system.
This should be overridden if you want to use another implementation!
- Parameters
vision_params (dict) – The parameter dictionary.
- action_space: spaces.Space[ActType]
- observation_space: spaces.Space[ObsType]
Script
This module contains some functions to benchmark the performance of the simulation.
Classes and functions FunctionProfile, StatsProfile and get_stats_profile() from
NAME HERE @ LINK
- mimoEnv.benchmark.run(env, max_steps)
Runs an environment for a number of steps while taking random actions.
- Parameters
env (gym.Env) – The environment.
max_steps (int) – The number of time steps that we run the environment for.
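A loop of the kind described for run() might look roughly like the sketch below. It uses a stub environment so that it runs without MIMo installed. StubEnv, sample_action and run_random are hypothetical names, and resetting when an episode ends is an assumption about the behaviour rather than something stated above.

```python
import random
from typing import List, Tuple


class StubEnv:
    """Minimal stand-in with the reset()/step() shape of a gym environment."""

    n_actuators = 3

    def reset(self, *, seed=None) -> Tuple[List[float], dict]:
        self.steps = 0
        return [0.0] * self.n_actuators, {}

    def sample_action(self) -> List[float]:
        # Hypothetical replacement for sampling from the action space.
        return [random.uniform(-1.0, 1.0) for _ in range(self.n_actuators)]

    def step(self, action):
        self.steps += 1
        terminated = truncated = False
        return list(action), 0.0, terminated, truncated, {}


def run_random(env, max_steps: int) -> int:
    """Take random actions for max_steps steps, resetting when an episode ends."""
    env.reset()
    taken = 0
    for _ in range(max_steps):
        _, _, terminated, truncated, _ = env.step(env.sample_action())
        taken += 1
        if terminated or truncated:
            env.reset()
    return taken


print(run_random(StubEnv(), 100))
```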
- mimoEnv.benchmark.benchmark(configurations, output_file)
Benchmarks multiple configurations for MIMo.
We use cProfile as the profiler. Multiple runs with different configurations are performed and the runtime measurements saved to the specified output file. The profile for each run is also saved to a file named after the configuration. Configurations consist of an environment name and initialization parameters for that environment. Each configuration is run for the specified simulation time and the real time required for that is measured. MIMo takes random actions throughout. Measurements include the total runtime, the simulation time, the number of simulation steps, the time spent in environment initialization, the time spent on the physics simulation, and the time spent in each of the sensor modalities: touch, vision, proprioception and vestibular.
- Parameters
configurations (List[Tuple[str, str, Dict, int]]) – A list of tuples storing configurations to be benchmarked. Each tuple has four entries: An arbitrary name for the entry, the name of the gym environment that will be run, a dictionary with parameters for the environment, and finally the duration of the run in simulation seconds. Note that the parameter dictionary can be empty if you wish to use the default parameters for the environment.
output_file (str) – Runtime results are written to this file.
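The four-entry configuration tuples and the use of cProfile can be sketched as follows. This is not the code of benchmark(). The parameter dictionaries are illustrative, fake_workload stands in for creating and stepping an environment, and profile_configurations is a hypothetical helper that only collects total runtimes instead of writing files.

```python
import cProfile
import pstats
from typing import Dict, List, Tuple

# (label, gym environment name, init parameters, duration in simulation seconds)
Configuration = Tuple[str, str, Dict, int]

configurations: List[Configuration] = [
    ("defaults", "MIMoBench-v0", {}, 10),
    # Illustrative parameter dictionary; an empty one selects the defaults.
    ("no_vision", "MIMoBench-v0", {"vision_params": None}, 10),
]


def fake_workload(duration: int) -> int:
    """Placeholder for creating an environment and stepping it."""
    return sum(i * i for i in range(duration * 1000))


def profile_configurations(configs: List[Configuration]) -> Dict[str, float]:
    """Profile each configuration separately and collect total runtimes."""
    runtimes = {}
    for label, env_name, params, duration in configs:
        profiler = cProfile.Profile()
        profiler.enable()
        fake_workload(duration)
        profiler.disable()
        # total_tt is the total time recorded by the profiler, in seconds.
        runtimes[label] = pstats.Stats(profiler).total_tt
    return runtimes


runtimes = profile_configurations(configurations)
print(sorted(runtimes))
```

Using one profiler per configuration keeps the measurements of different runs from mixing, which is what allows a separate profile to be saved for each configuration.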
- mimoEnv.benchmark.load_benchmark_file(file_name) → Dict
Loads a benchmark file in the format produced by benchmark() into a dictionary.
- mimoEnv.benchmark.make_stacked_bar_chart(data, labels: List[str], colors: Dict[str, str], ylabel, figsize=(6, 5), legend_loc='upper left')
Makes a stacked bar chart.
- Parameters
data – A dictionary of dictionaries with data for each stacked bar. High level dictionary stores the label for each bar as keys and the associated data dictionary as values. Low level dictionary contains the data for each stack with component labels as keys.
labels – A list with the labels for each stack component. This also selects which components are plotted at all. The colors parameter must have an entry for every label.
colors – A dictionary with colors for each stack component.
ylabel – The y-axis label.
figsize – A tuple with the figure size.
legend_loc – Location of the legend.
- Returns
A tuple (fig, ax) with the plotted chart
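To make the nested data layout concrete, the sketch below builds a small example and computes where each stack segment would start. The numbers are made up and stack_segments is a hypothetical helper. Plotting itself is left out, so this only illustrates how data, labels and colors relate to each other.

```python
from typing import Dict, List, Tuple

# Outer keys label the bars, inner keys label the stack components.
data: Dict[str, Dict[str, float]] = {
    "run_a": {"physics": 2.0, "touch": 1.0, "vision": 3.0},
    "run_b": {"physics": 2.5, "touch": 0.5, "vision": 0.0},
}
# Only listed components are drawn, so 'vision' is left out of the chart.
labels: List[str] = ["physics", "touch"]
# Every label needs a color entry.
colors: Dict[str, str] = {"physics": "tab:blue", "touch": "tab:orange"}


def stack_segments(data: Dict[str, Dict[str, float]],
                   labels: List[str]) -> Dict[str, List[Tuple[str, float, float]]]:
    """Return (label, bottom, height) per bar, in stacking order."""
    segments = {}
    for bar, components in data.items():
        bottom = 0.0
        segments[bar] = []
        for label in labels:
            height = components[label]
            segments[bar].append((label, bottom, height))
            # The next component starts where this one ends.
            bottom += height
    return segments


segments = stack_segments(data, labels)
print(segments["run_a"])
```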
- mimoEnv.benchmark.plot_benchmarks(file_name, list_of_runs, output_file, label_list=None, color_dict=None, figsize=(6, 5))
Create benchmark plots.
Loads data from a benchmark file and creates a stacked bar chart from the loaded data. Which runs are plotted can be selected with list_of_runs.
- Parameters
file_name (str) – The file containing the benchmark data.
list_of_runs (List[str]) – A list with the configuration names that will be plotted side by side.
output_file (str) – Output image file.
label_list (List[str]) – A list of the runtime components that will be plotted.
color_dict (Dict[str, str]) – A dictionary with the colors for each component listed above.
- mimoEnv.benchmark.run_paper_benchmarks()
Performs the same benchmarks as used in the paper.
Demo showroom
This scenario uses the same dummy class as the benchmarking script, but replaces the basic scene XML with a more elaborate one consisting of a square room with a number of toys. In this scenario MIMo takes no actions at all.
Script
Simple script to view the showroom. We perform no training and MIMo takes no actions.
- mimoEnv.showroom.main()
Creates the environment and takes 200 time steps. MIMo takes no actions. The environment is rendered to an interactive window.