This is a repo to track the latest autoregressive visual generation papers.
- Neural Discrete Representation Learning Paper, NeurIPS 2017
- Generating Diverse High-Fidelity Images with VQ-VAE-2 Paper, NeurIPS 2019
- Taming Transformers for High-Resolution Image Synthesis Paper, CVPR 2021
- Autoregressive Image Generation using Residual Quantization Paper, CVPR 2022
- * BEIT V2: Masked Image Modeling with Vector-Quantized Visual Tokenizers (for understanding) Paper, Arxiv 2022
- Vector-quantized Image Modeling with Improved VQGAN Paper, ICLR 2022
- MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation Paper, NeurIPS 2022
- * PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers (for understanding) Paper, AAAI 2023
- * All in Tokens: Unifying Output Space of Visual Tasks via Soft Token (for understanding) Paper, CVPR 2023
- Regularized Vector Quantization for Tokenized Image Synthesis Paper, CVPR 2023
- Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization Paper, CVPR 2023
- Not All Image Regions Matter: Masked Vector Quantization for Autoregressive Image Generation Paper, CVPR 2023
- SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs Paper, NeurIPS 2023
- HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes Paper, TMLR 2024
- Finite Scalar Quantization: VQ-VAE Made Simple Paper, ICLR 2024
- Planting a SEED of Vision in Large Language Model Paper, ICLR 2024
- Language Model Beats Diffusion - Tokenizer is Key to Visual Generation Paper, ICLR 2024
- Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis Paper, CVPR 2024
- Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction Paper, NeurIPS 2024
- An Image is Worth 32 Tokens for Reconstruction and Generation Paper, NeurIPS 2024
- Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99% Paper, Arxiv 2024
- Quantised Global Autoencoder: A Holistic Approach to Representing Visual Data Paper, Arxiv 2024
- VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation Paper, Arxiv 2024
- Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation Paper, Arxiv 2024
- MaskBit: Embedding-free Image Generation via Bit Tokens Paper, Arxiv 2024
- ImageFolder: Autoregressive Image Generation with Folded Tokens Paper, Arxiv 2024
- Conditional Image Generation with PixelCNN Decoders Paper, NeurIPS 2016
- DiVAE : Photorealistic Images Synthesis with Denoising Diffusion Decoder Paper
- Vector Quantized Diffusion Model for Text-to-Image Synthesis Paper
- MaskGIT: Masked Generative Image Transformer Paper
- BEIT: BERT Pre-Training of Image Transformers Paper
- BEIT V2: Masked Image Modeling with Vector-Quantized Visual Tokenizers Paper
- MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis Paper
- Sequential Modeling Enables Scalable Learning for Large Vision Models Paper, Arxiv 2023
- 4M: Massively Multimodal Masked Modeling Paper, NeurIPS 2023
- Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation Paper, Arxiv 2024
- ControlVAR: Exploring Controllable Visual Autoregressive Modeling Paper, Arxiv 2024
- Autoregressive Image Generation without Vector Quantization Paper, Arxiv 2024
- MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis Paper, Arxiv 2024
- ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation Paper, Arxiv 2024
- VAR-CLIP: Text-to-Image Generator with Visual Auto-Regressive Modeling Paper, Arxiv 2024
- Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining Paper, Arxiv 2024
- Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model Paper, Arxiv 2024
- Scalable Autoregressive Image Generation with Mamba Paper, Arxiv 2024
- Show-o: One Single Transformer to Unify Multimodal Understanding and Generation Paper, Arxiv 2024
- DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation Paper, Arxiv 2024