SATO: Strips as Tokens

Strips as Tokens: Artist Mesh Generation with Native UV Segmentation

Rui Xu¹,²*, Dafei Qin¹,²*, Kaichun Qiao³,², Qiujie Dong⁴, Huaijin Pi¹, Qixuan Zhang³,²‡, Longwen Zhang³,², Lan Xu³†, Jingyi Yu³, Wenping Wang⁵, Taku Komura¹†

¹The University of Hong Kong, ²Deemos Technology Co., Ltd., ³ShanghaiTech University, ⁴Shandong University, ⁵Texas A&M University

(* Equal contribution. ‡ Project lead. † Corresponding authors.)

Conditionally Accepted by SIGGRAPH 2026

Abstract

Recent advances in autoregressive transformers have demonstrated remarkable potential for generating artist-quality meshes. However, the token ordering strategies employed by existing methods typically fall short of professional artist standards: coordinate-based sorting yields inefficiently long sequences, while patch-based heuristics disrupt the continuous edge flow and structural regularity essential for high-quality modeling. To address these limitations, we propose Strips as Tokens (SATO), a novel framework with a token ordering strategy inspired by triangle strips. By constructing the sequence as a connected chain of faces that explicitly encodes UV boundaries, our method naturally preserves the organized edge flow and semantic layout characteristic of artist-created meshes. A key advantage of this formulation is its unified representation: the same token sequence can be decoded into either a triangle or a quadrilateral mesh. This flexibility enables joint training on both data types, where large-scale triangle data provides fundamental structural priors and high-quality quad data enhances the geometric regularity of the outputs. Extensive experiments demonstrate that SATO consistently outperforms prior methods in terms of geometric quality, structural coherence, and UV segmentation.

Method

The Pipeline of SATO. SATO uses a strip-based tokenizer to encode and decode both triangle and quad meshes as a unified discrete sequence. A learnable point-cloud encoder embeds the input point cloud, and the core Hourglass Transformer, conditioned on these features through cross-attention, autoregressively generates token sequences that are decoded into triangle or quad meshes with native UV segmentation.

Given a set of 3D points as conditioning input, our goal is to generate an artist-style mesh with organized UV segmentation. Our core contributions are a serialization scheme that embeds macro-structural semantic cues, such as UV island boundaries, into the token stream, and a stride-aware decoding protocol that allows the same model to generate both triangle and quadrilateral meshes.
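To give some intuition for how one sequence can yield either face type, the sketch below decodes a single strip of vertex indices with the classic triangle-strip and quad-strip rules from graphics APIs. It is an illustration only, not SATO's tokenizer: the function name `decode_strip`, the `stride` parameter, and the mapping of stride 1 to triangles and stride 2 to quads are assumptions made for this example, and UV boundary tokens are left out entirely.

```python
from typing import List, Tuple

Face = Tuple[int, ...]


def decode_strip(strip: List[int], stride: int) -> List[Face]:
    """Decode a strip of vertex indices into faces (hypothetical helper).

    stride=1: classic triangle strip, one new triangle per vertex.
    stride=2: classic quad strip, one new quad per vertex pair.
    """
    if stride not in (1, 2):
        raise ValueError("stride must be 1 (triangles) or 2 (quads)")

    faces: List[Face] = []
    if stride == 1:
        for i in range(len(strip) - 2):
            a, b, c = strip[i], strip[i + 1], strip[i + 2]
            # Swap the first two vertices of every other triangle so that
            # all faces keep the same orientation.
            faces.append((a, b, c) if i % 2 == 0 else (b, a, c))
    else:
        # A trailing unpaired vertex (odd-length strip) is ignored.
        for i in range(0, len(strip) - 3, 2):
            a, b, c, d = strip[i:i + 4]
            # Vertices arrive in zigzag order; reorder them so the tuple
            # walks around the quad perimeter.
            faces.append((a, b, d, c))
    return faces


strip = [0, 1, 2, 3, 4, 5]
triangles = decode_strip(strip, stride=1)
quads = decode_strip(strip, stride=2)
print(triangles)
print(quads)
```

In this toy setting the six-vertex strip gives four triangles or two quads, and each quad covers the same area as two consecutive triangles, which is the property that makes a shared strip ordering attractive for both mesh types.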

Mesh Gallery

Generation results of SATO across diverse shapes, demonstrating strong generative diversity in both mesh geometry and UV segmentation. From bottom to top, the gallery shows triangle mesh generation, shape generation with UV segmentation, and quadrilateral mesh generation. SATO supports all three tasks within a single framework and achieves compelling results on each of them.

UV Segmentation Gallery

Gallery of UV unwrapping results using our generated UV segmentation. Artists can readily apply textures to the resulting UV layout: each component of the input shape is cleanly and consistently separated into well-defined islands, enabling targeted texture painting without inadvertently affecting other parts.