
ConvRot: Rotation-Based Plug-and-Play 4-bit Quantization for Diffusion Transformers

This repository contains the official implementation of ConvRot, proposed in the paper ConvRot: Rotation-Based Plug-and-Play 4-bit Quantization for Diffusion Transformers (arXiv:2512.03673).

ConvRot is a plug-and-play, convolution-like, group-wise rotation-based quantization method designed for Diffusion Transformers. It enables W4A4 inference without retraining, achieving 4× memory savings and 2× faster inference while preserving visual quality.


🔍 Overview

We propose ConvRot, a rotation-based quantization approach that:

  • Leverages the Regular Hadamard Transform (RHT) to suppress both row-wise and column-wise outliers
  • Reduces rotation complexity from quadratic to linear by rotating channels group-wise (see the sketch after this list)
  • Is plug-and-play, requiring no retraining or calibration
  • Preserves high-fidelity visual generation under 4-bit quantization
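
As a rough illustration of the group-wise rotation, the sketch below applies a small Hadamard rotation to each channel group. This is a minimal sketch, not the ConvRot implementation: the function name and shapes are hypothetical, and it uses the standard (Sylvester) Hadamard matrix from SciPy as a stand-in for the Regular Hadamard Transform.

```python
# Minimal sketch of group-wise rotation; not the ConvRot implementation.
import torch
from scipy.linalg import hadamard  # standard (Sylvester) Hadamard matrix

def groupwise_rotate(x: torch.Tensor, group_size: int = 16) -> torch.Tensor:
    """Rotate the last dimension of `x` in independent channel groups.

    A full d x d rotation costs O(d^2) per token; rotating g channels at a
    time with a fixed g x g orthogonal matrix costs O(d * g), i.e. linear
    in d for a fixed group size.
    """
    d = x.shape[-1]
    assert d % group_size == 0, "hidden dim must be divisible by group size"
    # Orthonormal Hadamard rotation (stand-in for the Regular Hadamard
    # Transform, which additionally balances row and column sums).
    H = torch.tensor(hadamard(group_size), dtype=x.dtype) / group_size ** 0.5
    groups = x.reshape(*x.shape[:-1], d // group_size, group_size)
    return (groups @ H).reshape_as(x)
```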

Building on ConvRot, we further design ConvLinear4bit, a unified module that integrates:

  • Rotation
  • Quantization
  • GEMM
  • Dequantization

into a single layer, enabling efficient W4A4 inference for Diffusion Transformers.
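
To make the data flow concrete, here is a hedged pure-PyTorch emulation of that pipeline, reusing `groupwise_rotate` from the sketch above. The helper names are hypothetical; the actual ConvLinear4bit fuses these steps into a single layer rather than running them one by one.

```python
import torch

def quantize_sym_4bit(t: torch.Tensor):
    """Symmetric 4-bit quantization, emulated with int8 storage."""
    scale = t.abs().amax() / 7.0          # int4 symmetric range [-8, 7]
    q = torch.clamp(torch.round(t / scale), -8, 7).to(torch.int8)
    return q, scale

def convlinear4bit_forward(x, w, group_size=16):
    """Numerical emulation of rotate -> quantize -> GEMM -> dequantize."""
    xr = groupwise_rotate(x, group_size)   # online activation rotation
    wr = groupwise_rotate(w, group_size)   # weights can be rotated offline
    qx, sx = quantize_sym_4bit(xr)         # A4
    qw, sw = quantize_sym_4bit(wr)         # W4
    # Integer GEMM (runs on CPU here; real kernels use INT4 tensor cores).
    acc = qx.to(torch.int32) @ qw.to(torch.int32).T
    return acc.to(x.dtype) * (sx * sw)     # dequantize the accumulator
```

Because each per-group rotation matrix is orthogonal, rotating both the activations and the weights leaves the product `x @ w.T` mathematically unchanged before quantization; the rotation only reshapes the value distribution so that 4-bit quantization loses less information.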


This codebase is built on top of QuaRot.
ConvRot-related code is located in QuaRot/convrot and QuaRot/e2e (coming soon).


🚀 Quick Start

The following steps mirror the usage in QuaRot.

Installation

```bash
cd QuaRot
pip install -e .   # or: pip install .
```

Quantization

```bash
python e2e/quant/regular-256-mix.py
```

Inference

```bash
python e2e/inference/regular-256-mix.py
```

📄 Citation

If you find this work useful, please consider citing:

```bibtex
@article{huang2025convrot,
  title={ConvRot: Rotation-Based Plug-and-Play 4-bit Quantization for Diffusion Transformers},
  author={Huang, Feice and Han, Zuliang and Zhou, Xing and Chen, Yihuang and Zhu, Lifei and Wang, Haoqian},
  journal={arXiv preprint arXiv:2512.03673},
  year={2025}
}
```
