
[2509.14476] AToken: A Unified Tokenizer for Vision - arXiv.org
Sep 17, 2025 · We present AToken, the first unified visual tokenizer that achieves both high-fidelity reconstruction and semantic understanding across images, videos, and 3D assets. Unlike existing …
GitHub - apple/ml-atoken
Oct 22, 2025 · AToken is a unified vision tokenizer that handles multiple modalities (images, videos, and 3D) for both understanding and reconstruction through a single framework. It provides both …
AToken: A Unified Tokenizer for Vision - Semantic Scholar
Sep 17, 2025 · The first unified visual tokenizer that achieves both high-fidelity reconstruction and semantic understanding across images, videos, and 3D assets is presented, and a pure transformer …
AToken - A Unified Tokenizer for Vision
Sep 23, 2025 · ATOKEN's standard patchification is applied, and features are aggregated back into the voxel space. Pure Transformer Architecture: ATOKEN employs a unified transformer architecture for …
AToken: Unified Visual Tokenizer
Sep 17, 2025 · AToken: A Unified Tokenizer for Vision. Motivation and Problem Statement: The fragmentation of visual tokenization across modalities and tasks has impeded the development of …
AToken: A Unified Tokenizer for Vision - Apple Machine ...
Jul 11, 2025 · We present AToken, the first unified visual tokenizer that achieves both high-fidelity reconstruction and semantic understanding across images, videos, and 3D assets. Unlike existing …
ATOKEN: A Unified Tokenizer for Vision (September 2025)
Date: September 2025. Summary: ATOKEN, a unified visual tokenizer, achieves high-fidelity reconstruction and semantic understanding across images, videos, and 3D assets. It encodes …