Stacking our way to more general robots

Building our path toward more versatile robots


Introducing RGB-Stacking, a new benchmark for vision-based robotic manipulation
For a person, picking up a stick and balancing it on a log, or piling a pebble on a stone, may seem like simple and quite similar tasks. But most robots struggle to handle more than one such task at a time. Manipulating a stick requires a different set of behaviours than stacking stones, let alone stacking plates on top of one another or assembling furniture. Before we can teach robots to perform these kinds of activities, they first need to learn how to interact with a far wider variety of objects. As part of its ambition to create more useful and generalizable robots, DeepMind is investigating how to help robots understand the interactions of objects with different geometries.

In a paper to be presented at CoRL 2021 (the Conference on Robot Learning), and already available as a preprint on OpenReview, we introduce RGB-Stacking as a new benchmark for vision-based robotic manipulation. To succeed at this benchmark, a robot must learn how to grasp different objects and balance them on top of one another. What sets our work apart from prior studies is the breadth of objects used and the large number of empirical evaluations performed to validate our findings. Our results show that complex multi-object manipulation can be learned using a combination of simulation and real-world data, and they provide a strong baseline for the open problem of generalizing to novel objects. To support other researchers, we are open-sourcing our simulated environment, and releasing the designs for building our real-robot RGB-Stacking environment, the RGB-object models, and instructions for 3D printing them. We are also publicly releasing a collection of libraries and tools that we use in our broader robotics research.

Our goal with RGB-Stacking is to use reinforcement learning to train a robotic arm to stack objects of varying shapes. RGB refers to the arrangement of three objects, one each of red, green, and blue, placed in a basket beneath a parallel gripper attached to a robot arm. The objective is straightforward: stack the red object on top of the blue object within 20 seconds, while the green object serves as an obstacle and distraction. Training on multiple object sets ensures that the agent acquires generalized skills. We deliberately vary the properties that determine how the agent can grasp and stack each object, known as its grasp and stack affordances. This design forces the agent to exhibit behaviours that go beyond a simple pick-and-place strategy.
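As a rough illustration of the task's objective, the episode-level success condition can be sketched as a geometric check on the red and blue objects' centroids. This is a minimal sketch with hypothetical function and threshold names; the benchmark's actual success criterion and tolerances are defined in the released environment, not here.

```python
import numpy as np

def is_stacked(red_pos, blue_pos, xy_tol=0.03, z_gap=0.05):
    """Hypothetical success check: the red object rests on the blue one.

    Positions are (x, y, z) centroids in metres. The tolerances are
    illustrative, not the benchmark's actual values.
    """
    red, blue = np.asarray(red_pos), np.asarray(blue_pos)
    # Horizontal alignment: centroids must be close in the x-y plane.
    xy_aligned = np.linalg.norm(red[:2] - blue[:2]) <= xy_tol
    # Vertical relation: red sits above blue, within a plausible resting gap.
    above = 0.0 < (red[2] - blue[2]) <= z_gap
    return bool(xy_aligned and above)
```

In the real benchmark the check must also confirm the gripper has released the object, which is why timed episodes end with the arm withdrawing.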

Each triplet poses a different challenge for the agent. For Triplet 1, the top object must be grasped precisely; for Triplet 2, the top object must often be used as a tool to flip the bottom object before stacking; for Triplet 3, the top object must be balanced; for Triplet 4, stacking must be precise (i.e., the object centroids must align); and for Triplet 5, the top object can easily roll off if it is not placed gently. In assessing the difficulty of this task, we found that our hand-coded scripted baseline achieved a 51% success rate at stacking.


Our RGB-Stacking benchmark includes two task versions with different levels of difficulty. In “Skill Mastery”, our goal is to train a single agent that is skilled at stacking a predefined set of five triplets. In “Skill Generalization”, the same triplets are used for evaluation, but the agent is trained on a large set of training objects, over a million possible triplets in all. To test for generalization, these training objects exclude the family of objects from which the test triplets were chosen. In both versions, we split our learning pipeline into three stages:
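The Skill Generalization split above amounts to sampling training triplets only from object families that are disjoint from the held-out test family. A minimal sketch of that sampling logic, with a hypothetical catalogue and function names (the real benchmark's object families and sampling code live in the released environment):

```python
import random

# Hypothetical object catalogue: family id -> object names in that family.
CATALOGUE = {
    "family_a": ["a1", "a2", "a3"],
    "family_b": ["b1", "b2"],
    "family_c": ["c1", "c2", "c3"],
    "test_family": ["t1", "t2"],  # held out: only used at evaluation time
}

def sample_training_triplet(catalogue, held_out, rng=random):
    """Sample three distinct objects (red, green, blue roles) from
    families not in `held_out`, mirroring the Skill Generalization split."""
    pool = [obj for fam, objs in catalogue.items()
            if fam not in held_out
            for obj in objs]
    return tuple(rng.sample(pool, 3))

triplet = sample_training_triplet(CATALOGUE, held_out={"test_family"})
```

Excluding whole families, rather than individual objects, is what makes the evaluation a genuine test of generalization: the agent never sees any shape from the test family during training.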

  • First, we train in simulation using an off-the-shelf reinforcement learning algorithm, Maximum a Posteriori Policy Optimization (MPO). At this stage we use the simulator’s state, which makes training fast: object positions are given to the agent directly, so it does not need to learn to locate objects in images. Because this information is unavailable in the real world, the resulting policy cannot be run directly on the real robot.
  • Next, we train a new policy in simulation that uses only realistic observations: images and the robot’s proprioceptive state. The state-based policy acts as a teacher, correcting the learning agent’s actions, and the new policy is distilled from it. To improve transfer to real-world images and dynamics, we use a domain-randomised simulation.
  • Finally, we use this policy to collect data on real robots, then train an improved policy offline from that data, up-weighting good transitions according to a learnt Q function, as in Critic Regularised Regression (CRR). Rather than putting the real robots through a time-consuming online training process, this lets us make use of the data that is passively collected over the course of the project.
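The offline step above hinges on how CRR weights each logged transition by the critic's opinion of it. A minimal sketch of that weighting, assuming precomputed critic values; the function name and the particular agent configuration are illustrative, not DeepMind's released code:

```python
import numpy as np

def crr_weight(q_value, value_baseline, beta=1.0, mode="binary"):
    """CRR-style weight for one logged transition (sketch).

    A transition is up-weighted when the learnt critic thinks its
    action was better than average: advantage = Q(s, a) - V(s).
    `mode` reflects the two weighting variants in the CRR paper;
    `beta` is a temperature for the exponential variant.
    """
    advantage = q_value - value_baseline
    if mode == "binary":
        # Keep only better-than-average actions (weight 0 or 1).
        return float(advantage > 0.0)
    # Exponential variant, clipped for numerical stability.
    return float(np.exp(np.clip(advantage / beta, None, 20.0)))
```

The behaviour-cloning loss on each transition is scaled by this weight, so the offline policy imitates only the good actions in the passively collected robot data instead of copying everything the data-gathering policy did.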

Decoupling our learning pipeline in this way is important for two key reasons. First, it makes the problem tractable at all, since learning from scratch directly on the robots would take far too long. Second, it speeds up our research: different team members can work on different parts of the pipeline before those parts are combined for an overall improvement.

Our agents exhibit novel stacking strategies across the five triplets. For Skill Mastery, our best results came from a vision-based agent, which averaged 79% success in simulation (Stage 2), 68% zero-shot on real robots (Stage 2), and 82% after the one-step policy improvement using real data (Stage 3). For Skill Generalization, a final agent trained with the same pipeline succeeded 54% of the time on real robots (Stage 3). Closing this gap between skill mastery and generalization remains an open problem.


While much work in recent years has applied learning algorithms to challenging real-robot manipulation problems at scale, most of it has focused on single tasks such as grasping, pushing, and the like. The RGB-Stacking approach we describe in our paper, together with our robotics resources now available on GitHub, yields surprising stacking strategies and mastery of stacking on this subset of objects. Still, this is only a first step toward greater possibilities, and the problem of generalization remains unsolved. We hope that this new benchmark, along with the environment, designs, and tools we have released, will lead to new ideas and methods that make manipulation ever easier and robots ever more capable, as researchers continue working toward the open problem of true generalization in robotics.
