neuralCAD-Edit is a benchmark for 3D CAD editing, designed to assess how well AI models follow editing requests provided by users. The dataset consists of 192 multimodal editing requests (including video, text, and drawings) and 384 edits collected from ten consenting expert CAD designers specifically for this benchmark. Input CAD models are sourced from the Fusion Gallery Dataset and span a range of single-body and assembly models, with and without parametric design histories.
Expert CAD users requested edits to CAD models in a number of different modality combinations. Each requested edit was then carried out by the original requestor and one other CAD expert.
Professional CAD engineers don't describe edits by typing in a text box. They interact with models, point at specific faces and edges, produce hand-drawn markup, and talk through the changes they want to see. neuralCAD-Edit is the first CAD benchmark that captures these natural ways of communicating. We recorded consenting expert designers making requests to edit 3D CAD models in Autodesk Fusion.
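To make the dataset composition concrete, here is a rough sketch of how a single benchmark record could be organized. The field names below are illustrative assumptions, not the released schema.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class EditRequest:
    """One neuralCAD-Edit entry (illustrative field names, not the released schema)."""
    request_id: str
    difficulty: str               # "easy" | "medium" | "hard"
    modalities: List[str]         # e.g. ["interactive", "static_drawings", "text"]
    input_model: str              # path to the source CAD model (from the Fusion Gallery Dataset)
    request_video: Optional[str]  # screen/audio recording of the requestor making the request
    request_text: Optional[str]   # typed or transcribed instructions, if any
    groundtruth_edit: str         # edited model produced by the original requestor
    baseline_edit: str            # edited model produced by a second CAD expert
```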
| Request Modality | Editing request | Human Groundtruth Edit (Requestor) | Human Baseline Edit (Other Expert) |
|---|---|---|---|
| Interactive + Static Drawings (Hard) | (image) | (image) | (image) |
| Interactive + Temporary Drawings (Easy) | (image) | (image) | (image) |
| Interactive (Hard) | (image) | (image) | (image) |
| Text (Medium) | (image) | (image) | (image) |
Example requests and edits. Requestors asked for easy, medium, and hard edits that they expected to take 2, 5, and 10 minutes to complete, respectively. Screenshots and the commands used for each edit were logged.
We found that including drawings in requests allowed requestors to communicate larger changes and resulted in higher-quality edits.
Each request was carried out by the original requestor and one additional CAD expert, providing both a ground-truth model for computing automatic metrics and a human baseline of CAD editing performance. We ran GPT-5.2, Gemini 3 Pro, and Claude Sonnet 4.5 on the full set of editing requests, allowing each model to inspect and refine its outputs up to 10 times.
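As a rough illustration of this inspect-and-refine protocol, here is a minimal sketch of what such a harness might look like. The callables (`propose_edit`, `apply_edit`, `render`) are hypothetical stand-ins, not part of any released neuralCAD-Edit API.

```python
# Sketch of the iterative editing harness described above: the model may
# inspect renders of its current output and refine it, up to 10 rounds.
MAX_ROUNDS = 10

def run_editing_request(request, initial_model, propose_edit, apply_edit, render):
    """Run one editing request with up to MAX_ROUNDS inspect-and-refine steps."""
    current = initial_model
    feedback = None  # renders of the model's latest attempt, None on the first round
    for _ in range(MAX_ROUNDS):
        edit, done = propose_edit(request, feedback)  # model sees request + its own output
        if done:                                      # model accepts its previous attempt
            break
        current = apply_edit(current, edit)           # execute the proposed CAD operations
        feedback = render(current)                    # screenshots fed back for inspection
    return current
```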
| Initial model | Human Groundtruth (requestor) | Human Baseline (other expert) | Claude Sonnet 4.5 | Gemini 3 Pro | GPT-5.2 |
|---|---|---|---|---|---|
| (render) | (render) | (render) | (render) | (render) | — |
| (render) | (render) | (render) | — | (render) | (render) |
| (render) | (render) | (render) | (render) | (render) | (render) |
| (render) | (render) | (render) | — | — | (render) |
Renders of model outputs. "Initial model" shows the starting state before editing; each row corresponds to one request, with human and AI edits shown side by side. Empty cells (—) indicate that no valid BREP file was generated for that model on that request.
We evaluated model outputs with feature-based metrics, 3D volumetric metrics, VLM evaluations, and human evaluations. Human evaluations revealed a striking gap between even the best AI model (GPT-5.2) and the human baseline. While VLM evaluations and automatic metrics provided a rough sense of model performance, they did not correlate strongly with ratings from CAD experts, highlighting the necessity of human evaluations until better metrics are developed. We hope this benchmark gives the community a clear target to aim for as models improve.
Chamfer distance, Voxel IoU, DINO similarity, and Validity are automatic metrics; Instruction, Quality, and Acceptance come from the human evaluation.

| | Chamfer-dist ↓ | Voxel-IoU ↑ | DINO-sim ↑ | Validity ↑ | Instruction ↑ | Quality ↑ | Acceptance ↑ |
|---|---|---|---|---|---|---|---|
| Human Groundtruth (requestor) | — | — | — | — | 0.74 | 0.66 | 0.82 |
| Human Baseline (other expert) | 22 | 0.76 | 0.93 | 1.00 | 0.74 | 0.66 | 0.78 |
| GPT-5.2 | 50 | 0.57 | 0.66 | 0.99 | 0.48 | 0.39 | 0.25 |
| Gemini 3 Pro | 110 | 0.30 | 0.36 | 0.58 | 0.27 | 0.16 | 0.10 |
| Claude Sonnet 4.5 | 54 | 0.18 | 0.25 | 0.42 | 0.22 | 0.10 | 0.05 |
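For a rough sense of what the distance and volumetric metrics involve, here is a minimal sketch using trimesh and scipy. The sampling density, voxel resolution, normalization, and file names are assumptions; the released metric code may differ.

```python
import numpy as np
import trimesh
from scipy.spatial import cKDTree

def chamfer_distance(mesh_a, mesh_b, n_points=10_000):
    """Symmetric Chamfer distance between surface point samples of two meshes."""
    pts_a, _ = trimesh.sample.sample_surface(mesh_a, n_points)
    pts_b, _ = trimesh.sample.sample_surface(mesh_b, n_points)
    d_ab, _ = cKDTree(pts_b).query(pts_a)  # nearest-neighbor distances a -> b
    d_ba, _ = cKDTree(pts_a).query(pts_b)  # nearest-neighbor distances b -> a
    return d_ab.mean() + d_ba.mean()

def voxel_iou(mesh_a, mesh_b, resolution=48):
    """Occupancy IoU on a shared grid spanning both bounding boxes."""
    lo = np.minimum(mesh_a.bounds[0], mesh_b.bounds[0])
    hi = np.maximum(mesh_a.bounds[1], mesh_b.bounds[1])
    axes = [np.linspace(lo[i], hi[i], resolution) for i in range(3)]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    occ_a = mesh_a.contains(grid)  # inside/outside test (meshes should be watertight)
    occ_b = mesh_b.contains(grid)
    union = np.logical_or(occ_a, occ_b).sum()
    return np.logical_and(occ_a, occ_b).sum() / max(union, 1)

# Hypothetical file names, for illustration only.
gt = trimesh.load("groundtruth_edit.stl")
out = trimesh.load("model_output.stl")
print("Chamfer:", chamfer_distance(gt, out), "Voxel IoU:", voxel_iou(gt, out))
```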
We provide code to compute the automatic metrics. If you would like to have your model added to the leaderboard, please send us your model outputs and we will gladly coordinate human evaluations.
We will be keeping this leaderboard up to date as models and harnesses/tooling improve.
Access the paper, code, and dataset for neuralCAD-Edit.
Citation coming soon.
For questions about the benchmark or dataset, please reach out: