Differentiable GPU-Parallelized
Task and Motion Planning

Abstract

Planning long-horizon robot manipulation requires making discrete decisions about which objects to interact with and continuous decisions about how to interact with them. A robot planner must select grasps, placements, and motions that are feasible and safe. This class of problems falls under Task and Motion Planning (TAMP) and poses significant computational challenges in terms of algorithm runtime and solution quality, particularly when the solution space is highly constrained. To address these challenges, we propose a new bilevel TAMP algorithm that leverages GPU parallelism to efficiently explore thousands of candidate continuous solutions simultaneously. Our approach uses GPU parallelism to sample an initial batch of solution seeds for a plan skeleton and to apply differentiable optimization on this batch to satisfy plan constraints and minimize solution cost. We demonstrate that our algorithm can effectively solve highly constrained problems with non-convex constraints in just seconds, substantially outperforming serial TAMP approaches, and validate our approach on multiple real-world robots.

Method

cuTAMP Overview. cuTAMP frames TAMP as a backtracking bilevel search over plan skeletons. Each skeleton $\pi$ induces a continuous Constraint Satisfaction Problem that defines the structure of a particle (parameters) and cost functions (constraints and plan costs). These particles are optimized in parallel by evaluating their costs with differentiable cost functions, allowing gradient-based optimizers to iteratively update them towards satisfying solutions.

Solving Highly-Constrained Problems

In the Tetris domain, the robot's objective is to pack 5 blocks with non-convex shapes somewhere in a tight goal region. This problem requires reasoning about spatial arrangements, as the shapes will only fit if they are arranged in particular configuration modes.

We optimize for placement poses, parametrized as continuous 4-DOF actions with positions $(x, y, z)$ and yaw angles $\theta$, along with their associated 7-DOF robot joint positions. Each particle is 90-dimensional (ten 7-DOF arm configurations and five 4-DOF placement poses). We sample a fixed set of grasps and use an off-the-shelf motion planner (cuRobo) to solve for the full trajectories.

Optimization Visualization

We visualize the placement poses corresponding to the best particle as optimization progress.

TAMP Plan Execution

We show the full execution of the TAMP plan for the best particle. cuTAMP optimizes for a solution in just seconds.

Optimizing Goal Costs

cuTAMP supports optimizing plan costs. We consider placing four objects into a large goal region and set a goal cost to minimize the distance between the objects in the final state. Our results demonstrate the clear benefit of combining sampling with differentiable optimization for substantially reducing the goal cost.

Minimizing Distance between Objects

We compare the final state after executing the best particle. (a) cuTAMP achieves significantly lower cost compared to (b) parallelized sampling which performs no optimization.

Minimizing $y$-position of objects

We minimize the $y$-position of three cubes on a real Kinova Gen3 robot.

Efficiently Searching over Plan Skeletons

To efficiently guide the discrete search over plan skeletons and avoid optimizing unsolvable ones, we exploit parallelized constraint sampling from particle initialization to derive a plan feasibility heuristic. cuTAMP prioritizes skeletons for refinement based on their estimated feasibility, focusing our computational effort on those most likely to admit solutions.

In Stick Button, the robot must press the red, green, and blue buttons using either its fingers or a stick as a tool. Due to the kinematic limitations of the Franka, the robot must use the stick to press the blue and green buttons, as they are out of direct reach. This results in a large number of plan skeletons that are infeasible or have extraneous actions. The best configuration of cuTAMP solves this problem in under 1.5 seconds.

Cross-Embodiment Generalization

The blue button is beyond the reach of the Franka, requiring it to use the stick as a tool. In contrast, the UR5 can directly reach the button.

Stack Red Block 🟥 on Blue Block 🟦

The Mustard Bottle and Canister are obstructing the placement. cuTAMP reasons to move these objects out of the way first, before stacking the blocks.

Pack the Fruit on the White Cube

The strawberry 🍓 is obstructed by Lego blocks on all four sides, requiring at least two blocks to be moved out of the way. This task requires more planning time as there are many more skeletons to be considered (there are 5698 skeletons in total).

Citation

@article{shen2024cutamp,
    title={Differentiable GPU-Parallelized Task and Motion Planning},
    author={Shen, William and Garrett, Caelan and Kumar, Nishanth and Goyal, Ankit and Hermans, Tucker and Kaelbling, Leslie Pack and Lozano-P{\'e}rez, Tom{\'a}s and Ramos, Fabio},
    journal={arXiv preprint arXiv:2411.11833},
    year={2024}
}

Copied to clipboard!

Website source on GitHub

Updated April 2025