Autoregressive models can generate high-quality 3D meshes by sequentially producing vertices and faces, but their token-by-token decoding results in slow inference, limiting practical use in interactive and large-scale applications. We present FlashMesh, a fast and high-fidelity mesh generation framework that rethinks autoregressive decoding through a predict-correct-verify paradigm. The key insight is that mesh tokens exhibit strong structural and geometric correlations that enable confident multi-token speculation. FlashMesh leverages this by introducing a speculative decoding scheme tailored to the commonly used hourglass transformer architecture, enabling parallel prediction across face, point, and coordinate levels. Extensive experiments show that FlashMesh achieves up to a 2× speedup over standard autoregressive models while also improving generation fidelity. Our results demonstrate that structural priors in mesh data can be systematically harnessed to accelerate and enhance autoregressive generation.
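To make the idea of multi-token speculation concrete, here is a minimal sketch of generic greedy draft-then-verify decoding. It is not FlashMesh's hourglass-specific scheme, which the abstract does not spell out. The names `speculative_decode`, `greedy_decode`, `toy_target`, and `toy_draft` are hypothetical, and the "models" are toy functions over integer tokens rather than mesh tokens.

```python
from typing import Callable, List

Token = int
NextFn = Callable[[List[Token]], Token]


def greedy_decode(target_next: NextFn, prompt: List[Token], max_new_tokens: int) -> List[Token]:
    """Baseline: one target-model call per generated token."""
    seq = list(prompt)
    for _ in range(max_new_tokens):
        seq.append(target_next(seq))
    return seq


def speculative_decode(
    target_next: NextFn,
    draft_next: NextFn,
    prompt: List[Token],
    max_new_tokens: int,
    k: int = 4,
) -> List[Token]:
    """Generic greedy speculation (illustrative assumption, not the paper's algorithm)."""
    seq = list(prompt)
    limit = len(prompt) + max_new_tokens
    while len(seq) < limit:
        # Predict: cheaply guess up to k tokens ahead.
        draft: List[Token] = []
        for _ in range(min(k, limit - len(seq))):
            draft.append(draft_next(seq + draft))

        # Verify: compare each guess with the target model's choice.
        # A real system would score all k positions in one parallel pass;
        # this toy version calls the target sequentially for clarity.
        accepted: List[Token] = []
        for tok in draft:
            expected = target_next(seq + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # Correct: keep the target's token and discard the rest of the draft.
                accepted.append(expected)
                break
        seq.extend(accepted)
    return seq


def toy_target(seq: List[Token]) -> Token:
    """Toy 'target model': counts upward modulo 9."""
    return (seq[-1] + 1) % 9


def toy_draft(seq: List[Token]) -> Token:
    """Toy 'draft model': agrees with the target except at every fifth position."""
    return 0 if len(seq) % 5 == 0 else toy_target(seq)


if __name__ == "__main__":
    out = speculative_decode(toy_target, toy_draft, [0], max_new_tokens=20, k=4)
    print(out == greedy_decode(toy_target, [0], 20))
```

In this greedy setting the speculative output is identical to plain greedy decoding of the target; any speedup comes only from verifying several drafted tokens per target pass, which the toy code does not measure.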
[Figure: four input examples with mesh face counts of 9307, 8440, 8662, and 8029.]
@misc{shen2025flashmeshfasterbetterautoregressive,
title={FlashMesh: Faster and Better Autoregressive Mesh Synthesis via Structured Speculation},
author={Tingrui Shen and Yiheng Zhang and Chen Tang and Chuan Ping and Zixing Zhao and Le Wan and Yuwang Wang and Ronggang Wang and Shengfeng He},
year={2025},
eprint={2511.15618},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2511.15618},
}