Transferring Dexterous Manipulation from
GPU Simulation to a Remote Real-World Trifinger

1University of Toronto & Vector Institute   2NVIDIA   3ETH Zurich   4Snap   5MPI Tubingen


We present a system for learning a challenging dexterous manipulation task: moving a cube to an arbitrary 6-DoF pose using only three fingers, with training done in NVIDIA's IsaacGym simulator. We show empirical benefits, both in simulation and in sim-to-real transfer, of using keypoints rather than a position+quaternion representation of the 6-DoF object pose, both in the policy observations and in the reward calculation used to train a model-free reinforcement learning agent. By combining domain randomization strategies with the keypoint representation of the manipulated object's pose, we achieve a success rate of 83% on a remote Trifinger system maintained by the organizers of the Real Robot Challenge. To assist further research in learning in-hand manipulation, we make the codebase of our system, along with trained checkpoints that come with billions of steps of experience, available below.
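To make the keypoint idea concrete, the following is a minimal sketch of how a position+quaternion pose could be turned into keypoints and compared against a goal pose. It is an illustration, not the released implementation: the choice of the eight cube corners as keypoints, the 0.065 m side length, the (w, x, y, z) quaternion ordering, and the function names `quat_rotate`, `cube_keypoints`, and `mean_keypoint_distance` are all assumptions made for this example.

```python
import math
from itertools import product


def quat_rotate(q, v):
    """Rotate 3-vector v by unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    vx, vy, vz = v
    # t = 2 * (q_vec x v)
    tx = 2.0 * (y * vz - z * vy)
    ty = 2.0 * (z * vx - x * vz)
    tz = 2.0 * (x * vy - y * vx)
    # v' = v + w * t + (q_vec x t)
    return (
        vx + w * tx + (y * tz - z * ty),
        vy + w * ty + (z * tx - x * tz),
        vz + w * tz + (x * ty - y * tx),
    )


def cube_keypoints(position, quaternion, side=0.065):
    """Return the 8 cube corners in the world frame (assumed keypoint choice)."""
    half = side / 2.0
    keypoints = []
    for corner in product((-half, half), repeat=3):
        rotated = quat_rotate(quaternion, corner)
        keypoints.append(tuple(p + r for p, r in zip(position, rotated)))
    return keypoints


def mean_keypoint_distance(kp_a, kp_b):
    """Average Euclidean distance between corresponding keypoints."""
    return sum(math.dist(a, b) for a, b in zip(kp_a, kp_b)) / len(kp_a)


# Example: current pose vs. a goal rotated 90 degrees about z and shifted in x.
identity = (1.0, 0.0, 0.0, 0.0)
quarter_turn_z = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
current = cube_keypoints((0.0, 0.0, 0.0325), identity)
goal = cube_keypoints((0.05, 0.0, 0.0325), quarter_turn_z)
error = mean_keypoint_distance(current, goal)
```

A single scalar such as `error` captures position and orientation mismatch together, in units of distance, which is one reason a keypoint representation can be convenient for both observations and rewards.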

Training takes place entirely in simulation, using desktop-scale compute.

We then transfer to the real robot, achieving an 83% success rate on 6-DoF (position + orientation) reposing.



Representative sample of real robot videos

(10 FPS, real-time)

Zero-shot transfer

... to other scales

0.6x scale

1.2x scale

... to other objects


Arbitrary cuboid

Sim2Real across the Pond

Our system trains on desktop-level compute in Canada, running 16,384 environments in parallel on a single NVIDIA Tesla V100 GPU. Inference is then conducted remotely on a Trifinger robot located across the Atlantic in Germany, using the uploaded actor weights.

A picture of the robot farm used. The policy may be run on any one of a number of different Trifinger robots, meaning it needs to be robust to a range of parameters. The infrastructure on which we perform Sim2Real transfer is provided courtesy of the organisers of the Real Robot Challenge.
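Robustness to a range of robot parameters is what domain randomization targets during training. The snippet below is a rough sketch of the general idea only: each simulated environment draws its own physical parameters, so the policy never sees a single fixed robot. The parameter names and ranges in `HYPOTHETICAL_RANGES` and the helper `sample_env_params` are invented for illustration and are not taken from the released code.

```python
import random

# Hypothetical scale ranges; the actual randomized quantities and
# their bounds are not listed on this page.
HYPOTHETICAL_RANGES = {
    "object_mass_scale": (0.7, 1.3),
    "friction_scale": (0.7, 1.3),
    "actuator_gain_scale": (0.8, 1.2),
}


def sample_env_params(num_envs, ranges=HYPOTHETICAL_RANGES, seed=0):
    """Draw one independent set of parameter scales per environment."""
    rng = random.Random(seed)
    return [
        {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}
        for _ in range(num_envs)
    ]


# Example: four environments, each with its own physics scales.
params = sample_env_params(4)
```

In a GPU simulator the same sampling would be done in batch for all environments at once; the per-environment list here is only meant to show the structure.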


This work was led by the University of Toronto in collaboration with NVIDIA, Vector Institute, MPI, ETH and Snap. We would like to thank Vector Institute for computing support, as well as the CIFAR AI Chair for research support to Animesh Garg.

Website template from DiSECt paper.