CADFS: A Big CAD Program Dataset and Framework for Computer-Aided Design with Large Language Models

CVPR 2026

Abstract

We introduce CADFS, a data-centric framework that enables large vision-language models (VLMs) to generate complex CAD design histories. Existing generative CAD systems are restricted to sketch-extrude operations due to simplified representations and limited datasets. We address this by introducing a FeatureScript-based representation and constructing a dataset of 450k real-world CAD models spanning 15 modeling operations. We obtain the dataset via a new pipeline that reconstructs clean, executable FeatureScript programs and provides multimodal annotations. Fine-tuning a VLM on this representation yields state-of-the-art results in text-conditioned CAD generation and image-based reconstruction, producing more accurate, diverse, and feature-rich designs than prior frameworks. Ablations show that each individual component of our framework, i.e., the FeatureScript representation, the extended operation set, and representation-aligned textual descriptions, significantly improves performance. Our framework substantially broadens the complexity and realism achievable in generative CAD.

Method

CADFS treats generative CAD as direct generation of FeatureScript, the native language of the Onshape platform. This choice is central to the method: instead of reducing models to simplified sketch-extrude tokens, FeatureScript preserves the full design history, including higher-level operations such as revolve, sweep, loft, fillet, chamfer, shell, boolean edits, and patterns. Because the representation remains executable and close to real engineering workflows, it provides a stronger target for learning than synthetic or heavily simplified CAD encodings.

Starting from Onshape's internal model representation, we reconstruct clean and compact FeatureScript programs by extracting the sequence of modeling operations, making implicit parameters explicit, standardizing units and numeric precision, replacing placeholder queries with meaningful references, normalizing random identifiers, simplifying operation definitions, and removing redundant construction steps. Each recovered program is executed and checked against the source model, so only designs whose code reproduces the original are kept in the dataset.
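Several of the cleanup steps above (normalizing random identifiers, standardizing units, fixing numeric precision) and the final keep-or-discard check can be illustrated with a minimal Python sketch. It is not the CADFS pipeline. The identifier pattern (quoted strings of 12 or more alphanumeric characters), the inch-to-millimeter rule, the four-digit precision, and the use of measured properties such as volume for the reproduction check are all assumptions made for illustration, and the helpers `normalize_ids`, `inches_to_mm`, `round_numbers`, and `reproduces` are hypothetical names.

```python
import math
import re

# ASSUMPTION: random feature identifiers appear as quoted strings of 12+
# alphanumeric characters, e.g. "FgF5PlwQjOS4yRm". Real ids may differ.
RANDOM_ID = re.compile(r'"([A-Za-z0-9_]{12,})"')
# Inch-valued literals written as "<number> * inch".
INCH = re.compile(r'(\d+(?:\.\d+)?)\s*\*\s*inch\b')
# Decimal literals that are not part of an identifier or another number.
DECIMAL = re.compile(r'(?<![\w.])(\d+\.\d+)')


def normalize_ids(code: str) -> str:
    """Map each distinct random id to a stable name (f1, f2, ...) in order of appearance."""
    mapping: dict[str, str] = {}

    def rename(m: re.Match) -> str:
        raw = m.group(1)
        if raw not in mapping:
            mapping[raw] = f"f{len(mapping) + 1}"
        return f'"{mapping[raw]}"'

    return RANDOM_ID.sub(rename, code)


def inches_to_mm(code: str) -> str:
    """Standardize units: rewrite inch literals as millimeter literals (fixed-point, no exponent)."""
    return INCH.sub(lambda m: f"{float(m.group(1)) * 25.4:.10f} * millimeter", code)


def round_numbers(code: str, digits: int = 4) -> str:
    """Round decimal literals to `digits` places and drop trailing zeros."""
    def fmt(m: re.Match) -> str:
        text = f"{float(m.group(1)):.{digits}f}"
        return text.rstrip("0").rstrip(".")

    return DECIMAL.sub(fmt, code)


def reproduces(source: dict[str, float], rebuilt: dict[str, float], rel_tol: float = 1e-4) -> bool:
    """ASSUMPTION: keep a program only if every measured property of the rebuilt
    model (e.g. volume, surface area) matches the source model within rel_tol."""
    return source.keys() == rebuilt.keys() and all(
        math.isclose(source[k], rebuilt[k], rel_tol=rel_tol) for k in source
    )


raw = 'extrude(context, id + "FgF5PlwQjOS4yRm", { "depth" : 0.1 * inch, "angle" : 45.000000001 * degree });'
clean = round_numbers(inches_to_mm(normalize_ids(raw)))
print(clean)
# → extrude(context, id + "f1", { "depth" : 2.54 * millimeter, "angle" : 45 * degree });
```

Rounding runs after unit conversion on purpose: multiplying by 25.4 can introduce long floating-point tails, and a single fixed precision applied last keeps the emitted literals short and consistent.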

Overview of the CADFS method: FeatureScript reconstruction, language annotation, and LLM fine-tuning.

After reconstructing the programs, we add language supervision with a two-stage annotation pipeline. One LLM first writes a structured description of the construction process from the FeatureScript code, and a second LLM reviews that draft against the code and the documentation to correct terminology, verify the order of operations, and resolve ambiguous references to geometric entities. This produces descriptions that align closely with the actual modeling logic and give the training data an explicit link between natural language and executable CAD procedures.
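The draft-then-review structure of the annotation pipeline can be sketched in a few lines of Python. This is a hypothetical outline, not the authors' implementation: each LLM is modeled as a plain function from prompt to text so that any client could be wrapped to fit, and the prompt wording, the `Annotation` record, and the `annotate` helper are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable

# HYPOTHETICAL interface: any function mapping a prompt to a completion.
# A real LLM client would need to be wrapped to fit this signature.
LLM = Callable[[str], str]

# ASSUMPTION: illustrative prompt wording, not the prompts used for CADFS.
DRAFT_PROMPT = (
    "Describe, step by step, how this FeatureScript program builds the model. "
    "Cover every operation in the order it is applied.\n\nCODE:\n{code}"
)
REVIEW_PROMPT = (
    "Review the draft against the code and the documentation excerpt. Correct "
    "terminology, verify the order of operations, and resolve ambiguous references "
    "to geometric entities. Return only the corrected description.\n\n"
    "CODE:\n{code}\n\nDOCS:\n{docs}\n\nDRAFT:\n{draft}"
)


@dataclass
class Annotation:
    code: str
    draft: str         # stage-1 output, kept for inspection
    description: str   # stage-2 output, used as the text supervision


def annotate(code: str, docs: str, writer: LLM, reviewer: LLM) -> Annotation:
    """Two-stage annotation: one model drafts, a second model reviews and corrects."""
    # str.format only parses the template, so braces inside `code` are passed through safely.
    draft = writer(DRAFT_PROMPT.format(code=code))
    final = reviewer(REVIEW_PROMPT.format(code=code, docs=docs, draft=draft))
    return Annotation(code=code, draft=draft, description=final.strip())
```

Passing the code to the reviewer again, rather than only the draft, is what lets the second stage check the description against the actual modeling logic instead of merely polishing its wording.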

The example below shows why FeatureScript is an effective target representation for this task. The code does not merely list primitive shapes: it records how sketches are created, how solids are produced from them, which edges or faces are referenced later, and how subsequent refinement and reuse operations modify the model.

FeatureScript example showing how a CAD model's design history is expressed as executable code.

In this example, the model is assembled through a sequence of interpretable operations: profiles are drawn with spline, arc, and text primitives; solids are created with revolve and extrude; specific edges are identified through structured queries; those entities are refined with fillets; parts are replicated with a circular pattern; and a loft operation builds a smooth support structure. Because FeatureScript can refer to geometry through its origin, role, type, and local topology, it supports precise downstream edits and makes the design history understandable both to humans and to language models.
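To make the idea of referring to geometry by origin, type, role, and local topology more concrete, here is a toy Python model of a structured query. It is a hypothetical illustration loosely inspired by FeatureScript's query functions (for example `qCreatedBy`), not FeatureScript itself: the `Entity` record, the role labels such as `END_CAP`, and the `query` helper are invented for this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Entity:
    name: str
    kind: str                           # "FACE" or "EDGE"
    origin: str                         # id of the feature that created the entity
    role: Optional[str] = None          # HYPOTHETICAL label, e.g. "END_CAP" or "SIDE"
    neighbors: frozenset = frozenset()  # names of entities sharing a boundary


def query(entities: Iterable[Entity], *, kind: Optional[str] = None,
          origin: Optional[str] = None, role: Optional[str] = None,
          adjacent_to: Optional[str] = None) -> list[str]:
    """Return the sorted names of entities that satisfy every given filter (None = any)."""
    hits = []
    for e in entities:
        if kind is not None and e.kind != kind:
            continue
        if origin is not None and e.origin != origin:
            continue
        if role is not None and e.role != role:
            continue
        if adjacent_to is not None and adjacent_to not in e.neighbors:
            continue
        hits.append(e.name)
    return sorted(hits)


# A cylinder from "extrude1" plus a text emboss ("extrude2") that adds an edge on its top face.
model = [
    Entity("top", "FACE", "extrude1", "END_CAP", frozenset({"top_rim", "logo_edge"})),
    Entity("bottom", "FACE", "extrude1", "START_CAP", frozenset({"bottom_rim"})),
    Entity("side", "FACE", "extrude1", "SIDE", frozenset({"top_rim", "bottom_rim"})),
    Entity("top_rim", "EDGE", "extrude1", neighbors=frozenset({"top", "side"})),
    Entity("bottom_rim", "EDGE", "extrude1", neighbors=frozenset({"bottom", "side"})),
    Entity("logo_edge", "EDGE", "extrude2", neighbors=frozenset({"top"})),
]

# Fillet target: edges created by extrude1 that bound its end-cap face.
cap = query(model, kind="FACE", origin="extrude1", role="END_CAP")[0]
fillet_edges = query(model, kind="EDGE", origin="extrude1", adjacent_to=cap)
print(cap, fillet_edges)  # top ['top_rim']
```

Dropping the `origin` filter from the second query would also return `logo_edge`, an edge added later by a different feature. Selecting entities by how they were created, rather than by a fixed index, is what keeps a reference such as a fillet target meaningful when other operations change the model.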

Acknowledgments

We are grateful to Onshape for providing public access to a vast library of CAD designs.

BibTeX

@inproceedings{pyatov2026cadfs,
    title      = {{{CADFS}}: A Big {{CAD}} Program Dataset and Framework for Computer-Aided Design with Large Language Models},
    shorttitle = {{{CADFS}}},
    booktitle  = {2026 {{IEEE}}/{{CVF Conference}} on {{Computer Vision}} and {{Pattern Recognition}} ({{CVPR}})},
    author     = {Vladislav Pyatov and Gleb Bobrovskikh and Saveliy Galochkin and Nikita Boldyrev and Oleg Voynov and Alexander Filippov and Gonzalo Ferrer and Peter Wonka and Evgeny Burnaev},
    year       = 2026,
    month      = jun,
    langid     = {english}
}