IMR-LLM: Industrial Multi-Robot Task Planning
and Program Generation using Large Language Models

ICRA 2026
IMR-LLM teaser

IMR-LLM enables LLM-driven multi-robot coordination in industrial manufacturing. Given a complex manufacturing task, our system constructs a disjunctive graph encoding task dependencies, applies deterministic scheduling to produce a high-level task plan, and generates executable low-level robot programs via a process-tree-guided LLM pipeline.

Abstract

In modern industrial production, multiple robots often collaborate to complete complex manufacturing tasks. Large language models (LLMs), with their strong reasoning capabilities, have shown potential in coordinating robots for simple household and manipulation tasks. However, in industrial scenarios, stricter sequential constraints and more complex dependencies within tasks present new challenges for LLMs. To address this, we propose IMR-LLM, a novel LLM-driven Industrial Multi-Robot task planning and program generation framework. Specifically, we utilize LLMs to assist in constructing disjunctive graphs and employ deterministic solving methods to obtain a feasible and efficient high-level task plan. Based on this, we use a process tree to guide LLMs to generate executable low-level programs. Additionally, we create IMR-Bench, a challenging benchmark that encompasses multi-robot industrial tasks across three levels of complexity. Experimental results indicate that our method significantly surpasses existing methods across all evaluation metrics.

Key Contributions

IMR-LLM Framework

A multi-robot task planning and program generation framework for industrial production lines. It integrates LLMs with heuristic algorithms to construct and solve disjunctive graphs, and uses a process tree to guide program generation.

IMR-Bench

A benchmark for evaluating multi-robot systems on industrial tasks, comprising manufacturing tasks at three levels of complexity and metrics covering operation correctness, scheduling efficiency, executability, goal completion rate, and success rate.

Simulation & Real-World Evaluation

The framework is implemented and tested in both simulated and real-world settings across a wide range of tasks.

Method

We introduce IMR-LLM, an LLM-driven framework for automated task planning and program generation in industrial multi-robot systems.

IMR-LLM pipeline overview

An overview of our method. Given an instruction I, an industrial scene S, and program examples E, our method performs task planning to decompose operations, assign robots, and schedule operations using a disjunctive graph and a heuristic solver. This is followed by program generation that translates the plan into executable Python code under the guidance of an operation process tree.

1. Task Planning

Decompose operations, assign robots, and schedule operations using a disjunctive graph and a heuristic solver.
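
As a rough illustration of the scheduling step, the sketch below treats operations as nodes of a disjunctive graph: precedence constraints are fixed (conjunctive) arcs, and operations assigned to the same robot are linked by disjunctive edges whose direction the scheduler must choose. It orients them with a simple greedy dispatch rule. The function name `greedy_schedule`, the operation names, the durations, and the dispatch rule itself are assumptions made for this example; the page does not specify which heuristic solver IMR-LLM uses.

```python
from collections import defaultdict


def greedy_schedule(ops, precedence):
    """Greedy list scheduling over a disjunctive graph (illustrative only).

    ops:        {operation: (robot, duration)}
    precedence: [(before, after), ...] conjunctive arcs (fixed ordering)
    Operations on the same robot are linked by disjunctive edges; the
    dispatcher orients each one by deciding which operation runs first.
    Returns {operation: (start, end)}.
    """
    preds = defaultdict(list)
    for before, after in precedence:
        preds[after].append(before)

    robot_free = defaultdict(int)  # time at which each robot becomes idle
    plan = {}
    remaining = set(ops)

    def earliest_start(op):
        robot, _ = ops[op]
        deps_end = max((plan[p][1] for p in preds[op]), default=0)
        return max(deps_end, robot_free[robot])

    while remaining:
        # Ready = every conjunctive predecessor is already scheduled.
        ready = [o for o in remaining if all(p in plan for p in preds[o])]
        if not ready:
            raise ValueError("precedence constraints contain a cycle")
        # Dispatch rule (an assumption): earliest start, ties broken by name.
        op = min(ready, key=lambda o: (earliest_start(o), o))
        start = earliest_start(op)
        robot, duration = ops[op]
        plan[op] = (start, start + duration)
        robot_free[robot] = start + duration
        remaining.remove(op)
    return plan


# Hypothetical four-operation task on two robots.
ops = {
    "pick_A": ("robot1", 3),
    "pick_B": ("robot2", 2),
    "weld": ("robot1", 4),
    "inspect": ("robot2", 1),
}
precedence = [("pick_A", "weld"), ("pick_B", "weld"), ("weld", "inspect")]
plan = greedy_schedule(ops, precedence)
makespan = max(end for _, end in plan.values())
```

In this toy instance the two pick operations run in parallel on different robots, while `weld` has to wait for both of them and for `robot1` to become free. A deterministic solver of this kind always returns the same plan for the same graph, which is the property the pipeline relies on when it leaves only graph construction to the LLM.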

2. Program Generation

Translate the plan into executable Python code under the guidance of an operation process tree.
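
One way to picture tree-guided generation is to walk an operation's process tree, collect its leaf steps in order, and hand that ordered list to the LLM together with the program examples. The sketch below does only that. The `Node` class, the step labels such as `torch_on()`, the prompt wording, and the injected `llm` callable are all hypothetical; the actual tree structure and prompts used by IMR-LLM are not described on this page.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """A process-tree node; a node without children is an atomic step."""
    label: str
    children: List["Node"] = field(default_factory=list)


def leaf_steps(node: Node) -> List[str]:
    """Return leaf labels in depth-first, left-to-right order."""
    if not node.children:
        return [node.label]
    steps: List[str] = []
    for child in node.children:
        steps.extend(leaf_steps(child))
    return steps


def build_prompt(operation: Node, robot: str, examples: str) -> str:
    """Turn one operation's process tree into an ordered prompt."""
    numbered = "\n".join(
        f"{i}. {step}" for i, step in enumerate(leaf_steps(operation), 1)
    )
    return (
        f"Robot: {robot}\n"
        f"Operation: {operation.label}\n"
        f"Steps to implement, in order:\n{numbered}\n"
        f"Reference programs:\n{examples}\n"
        "Write one Python function that performs these steps."
    )


def generate_program(operation: Node, robot: str, examples: str,
                     llm: Callable[[str], str]) -> str:
    """`llm` is any text-in, text-out callable (hypothetical interface)."""
    return llm(build_prompt(operation, robot, examples))


# Hypothetical process tree for a welding operation.
weld_tree = Node("weld_seam", [
    Node("approach", [Node("move_to(pre_weld_pose)")]),
    Node("weld", [Node("torch_on()"), Node("follow_path(seam)"),
                  Node("torch_off()")]),
    Node("retreat", [Node("move_to(home)")]),
])
prompt = build_prompt(weld_tree, "robot1", "<program examples E>")
```

Passing the model an explicit, ordered step list constrains what it has to invent: it fills in the code for each step rather than deciding the procedure itself.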

3. Execution

Deploy the high-level plan and low-level programs to enable collaborative execution by multiple robots in industrial settings.
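
A minimal sketch of collaborative execution, assuming each robot runs its scheduled operations in its own thread and blocks on cross-robot dependencies: the `execute` function, the plan layout, and the no-op actions are assumptions for illustration, not the deployment mechanism of the actual system.

```python
import threading


def execute(plan, precedence, actions):
    """Run each robot's operation queue in its own thread.

    plan:       {robot: [operation, ...]} in scheduled order
    precedence: [(before, after), ...]
    actions:    {operation: zero-argument callable}
    Returns the completion log as a list of (robot, operation).
    A plan that contradicts the precedence constraints would deadlock,
    so it should come from a scheduler that respects them.
    """
    finished = {op: threading.Event() for ops in plan.values() for op in ops}
    preds = {op: [] for op in finished}
    for before, after in precedence:
        preds[after].append(before)

    log = []
    lock = threading.Lock()

    def worker(robot, ops):
        for op in ops:
            for p in preds[op]:
                finished[p].wait()  # block until the dependency completes
            actions[op]()
            with lock:
                log.append((robot, op))
            finished[op].set()  # release operations waiting on this one

    threads = [
        threading.Thread(target=worker, args=(robot, ops))
        for robot, ops in plan.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


# Hypothetical plan: two robots, one cross-robot handover around "weld".
robot_plan = {"robot1": ["pick_A", "weld"], "robot2": ["pick_B", "inspect"]}
precedence = [("pick_A", "weld"), ("pick_B", "weld"), ("weld", "inspect")]
actions = {op: (lambda: None) for ops in robot_plan.values() for op in ops}
log = execute(robot_plan, precedence, actions)
```

Because an operation is logged before its completion event is set, every predecessor appears in `log` ahead of the operations that depend on it, regardless of how the threads interleave.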

Benchmark

We introduce IMR-Bench, built upon the SpeedBot KunWu platform, comprising scenes and tasks collected from real industrial environments by production line design experts.

23 Industrial Scenes, with 1–7 robots each
50 Manufacturing Tasks, reflecting real industrial needs
3 Difficulty Levels
Single Robot Task
  • 🤖 1 robot
  • ⚙️ Up to 5 operations
  • ▶️ Sequential execution
Simple Multi-Robot Task
  • 🤖 Up to 3 robots
  • ⚙️ Up to 10 operations
  • ▶️ Parallel or sequential
Complex Multi-Robot Task
  • 🤖 Up to 7 robots
  • ⚙️ Up to 24 operations
  • ▶️ Mixed parallel & sequential
IMR-Bench dataset overview

An overview of our dataset. Our tasks consist of (a) various scenes and (b) various machines and robots equipped with different end-effectors. (c) A pie chart showing the distribution of task types on the left and a bar chart showing the average number of operations per task type on the right.

Results

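IMR-LLM matches or exceeds every baseline on all metrics on IMR-Bench, and the gap is largest on complex multi-robot tasks, where it reaches a success rate of 0.68 (GPT-4o) against 0.24 for the strongest baseline, LiP-O.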

Methods            Single Robot Task               Simple Multi-Robot Task         Complex Multi-Robot Task
                   OC↑   SE↑   Exe↑  GCR↑  SR↑     OC↑   SE↑   Exe↑  GCR↑  SR↑     OC↑   SE↑   Exe↑  GCR↑  SR↑
SMART-LLM          0.83  0.70  0.50  0.50  0.50    0.67  0.46  0.46  0.32  0.20    0.50  0.04  0.00  0.03  0.00
LaMMA-S            0.80  0.80  0.70  0.70  0.70    0.71  0.67  0.40  0.45  0.33    0.56  0.26  0.20  0.33  0.16
LaMMA-O            0.80  0.80  0.80  0.80  0.80    0.71  0.67  0.53  0.55  0.46    0.56  0.26  0.28  0.37  0.20
LiP-O              1.00  1.00  0.90  0.90  0.90    0.93  0.80  0.73  0.74  0.73    0.63  0.28  0.36  0.42  0.24
Ours (GPT-4o)      1.00  1.00  0.90  0.98  0.90    1.00  1.00  0.87  0.94  0.87    0.88  0.75  0.76  0.79  0.68
Ours (Qwen3-32B)   1.00  1.00  1.00  1.00  1.00    1.00  0.93  0.87  0.90  0.87    0.85  0.71  0.76  0.79  0.60

OC: Operation Correctness  |  SE: Scheduling Efficiency  |  Exe: Executability  |  GCR: Goal Completion Rate  |  SR: Success Rate
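
To make the size of the improvement concrete, the short script below recomputes the margin of IMR-LLM (GPT-4o) over the best baseline on each metric, using the Complex Multi-Robot Task columns of the results table. The values are copied from the table; the helper name `margin_over_best_baseline` is introduced only for this example.

```python
METRICS = ("OC", "SE", "Exe", "GCR", "SR")

# Complex Multi-Robot Task columns, copied from the results table.
complex_task = {
    "SMART-LLM": (0.50, 0.04, 0.00, 0.03, 0.00),
    "LaMMA-S": (0.56, 0.26, 0.20, 0.33, 0.16),
    "LaMMA-O": (0.56, 0.26, 0.28, 0.37, 0.20),
    "LiP-O": (0.63, 0.28, 0.36, 0.42, 0.24),
    "Ours (GPT-4o)": (0.88, 0.75, 0.76, 0.79, 0.68),
}
BASELINES = ("SMART-LLM", "LaMMA-S", "LaMMA-O", "LiP-O")


def margin_over_best_baseline(rows, ours, baselines):
    """Per-metric difference between `ours` and the best baseline score."""
    return {
        metric: round(rows[ours][i] - max(rows[b][i] for b in baselines), 2)
        for i, metric in enumerate(METRICS)
    }


margins = margin_over_best_baseline(complex_task, "Ours (GPT-4o)", BASELINES)
```

On this difficulty level the margin is at least 0.25 on every metric, with the largest gain in scheduling efficiency (SE).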

Demonstrations in Simulated Environment

Representative Application in Real World

Citation

@article{su2026imrllm,
  title     = {IMR-LLM: Industrial Multi-Robot Task Planning and
               Program Generation using Large Language Models},
  author    = {Su, Xiangyu and Xu, Juzhan and van Kaick, Oliver
               and Xu, Kai and Hu, Ruizhen},
  journal   = {arXiv preprint arXiv:2603.02669},
  year      = {2026}
}