Fairy2i-W2

🔗 Links

Paper · GitHub · ModelScope

Abstract

Large language models (LLMs) have revolutionized artificial intelligence, yet their massive memory and computational demands necessitate aggressive quantization, increasingly pushing representations toward the theoretical limit of a single bit. While complex-valued LLMs such as iFairy are better suited to extremely low-bit representation than their real-valued counterparts, they must be trained from scratch, which forfeits the vast ecosystem of pre-trained real-valued foundation models.

Here we present Fairy2i, a universal framework that transforms pre-trained real-valued layers into an equivalent widely-linear complex form, enabling extremely low-bit quantization while reusing existing checkpoints. By proving a lossless mathematical equivalence between real and widely-linear maps, we convert standard Transformers into the complex domain and employ a phase-aware quantization scheme with a highly efficient codebook of fourth roots of unity ({±1, ±i}). Furthermore, we introduce a recursive residual quantization mechanism that iteratively minimizes quantization error, allowing inference to proceed via efficient multiplication-free accumulation.

We demonstrate that Fairy2i-W2 restores LLaMA-2 7B to near full-precision performance at an effective 2-bit width, significantly outperforming state-of-the-art real-valued binary and ternary quantization methods.

This work bridges the gap between the representational efficiency of complex-valued arithmetic and the practical utility of pre-trained models, paving the way for efficient inference on commodity hardware.

Method

Fairy2i-W2 consists of three key components:

Widely-Linear Transformation

We transform pre-trained real-valued linear layers into an equivalent widely-linear complex form without altering the model's behavior. Each real weight matrix R (of size 2n×2m) is reparameterized into two complex matrices U and W (each of size n×m) such that y = Ux + Wx̄, where the 2m real inputs are paired into a complex vector x of dimension m and x̄ denotes its complex conjugate. This transformation is lossless and unique, so the forward computation is preserved exactly before quantization.
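The recovery of U and W from R has a simple closed form. Below is a minimal NumPy sketch of the conversion, assuming the block convention R = [[A, B], [C, D]] with x_real = [a; b] paired as x = a + ib; the paper's exact convention may differ.

import numpy as np

n, m = 3, 4
rng = np.random.default_rng(0)

# Real layer: y_real = R @ x_real, with R of size 2n x 2m.
A, B, C, D = (rng.standard_normal((n, m)) for _ in range(4))
R = np.block([[A, B], [C, D]])

# Equivalent widely-linear form: y = U x + W conj(x).
U = (A + D) / 2 + 1j * (C - B) / 2
W = (A - D) / 2 + 1j * (C + B) / 2

a, b = rng.standard_normal(m), rng.standard_normal(m)
x = a + 1j * b

y_real = R @ np.concatenate([a, b])   # original real forward pass
y_complex = U @ x + W @ np.conj(x)    # widely-linear forward pass

# The two computations agree up to floating-point rounding.
assert np.allclose(y_complex, y_real[:n] + 1j * y_real[n:])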

Phase-Aware Complex Quantization

We quantize complex weights with a phase-based scheme using the codebook {±1, ±i} (the fourth roots of unity). Each complex weight is projected to the nearest codeword by phase angle, and axis-wise scaling factors are applied. During quantization-aware training (QAT), we maintain full-precision master weights and use their quantized copies in the forward pass, with gradients propagated through the straight-through estimator (STE).
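A minimal PyTorch sketch of this projection is shown below. Snapping to the nearest codeword by angle reduces to comparing |Re(w)| with |Im(w)|; the per-axis scale used here (the mean magnitude along the winning axis) is an illustrative assumption, not necessarily the paper's exact choice.

import torch

def phase_quant(w: torch.Tensor) -> torch.Tensor:
    """Quantize a complex tensor to scale * c with c in {+1, -1, +i, -i}."""
    re, im = w.real, w.imag
    # Nearest fourth root of unity by angle: the real axis wins when
    # |Re| >= |Im| (ties and exact zeros are ignored for brevity).
    real_axis = re.abs() >= im.abs()
    codes = torch.where(real_axis, torch.sign(re) + 0j, 1j * torch.sign(im))
    # Axis-wise scales: mean magnitude of the surviving coordinate.
    s_re = re.abs()[real_axis].mean()
    s_im = im.abs()[~real_axis].mean()
    return torch.where(real_axis, s_re, s_im) * codes

# Straight-through estimator: the forward pass sees the quantized copy,
# while gradients flow to the full-precision master weights.
w = torch.randn(8, 8, dtype=torch.cfloat, requires_grad=True)
w_q = w + (phase_quant(w) - w).detach()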

Recursive Residual Quantization

To further reduce quantization error, we recursively quantize the residual. Each complex weight matrix is represented as a sum of low-bit terms, W ≈ Σ_{t=0}^{T−1} W^(t), where each term quantizes the residual left by the preceding terms using the same phase-aware mechanism. Fairy2i-W2 uses T = 2 stages, which yields an effective 2 bits per real parameter; a sketch follows.
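The loop below is a minimal sketch of the recursion, reusing phase_quant (and the torch import) from the previous sketch; each stage quantizes whatever the earlier stages failed to capture, so the reconstruction error shrinks stage by stage.

def residual_quant(w: torch.Tensor, T: int = 2) -> list[torch.Tensor]:
    terms, residual = [], w
    for _ in range(T):
        q = phase_quant(residual)   # quantize the current residual
        terms.append(q)
        residual = residual - q     # leftover error for the next stage
    return terms                    # w is approximately sum(terms)

w = torch.randn(8, 8, dtype=torch.cfloat)
terms = residual_quant(w, T=2)
print((w - terms[0]).abs().mean())             # one-stage error
print((w - terms[0] - terms[1]).abs().mean())  # smaller two-stage error

Because every codeword is ±1 or ±i, multiplying an activation by a codeword is just a sign flip and/or a swap of its real and imaginary parts, so each term's matrix product reduces to additions; this is the multiplication-free accumulation mentioned in the abstract.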

Evaluation

Main Results on LLaMA-2 7B

| Method             | Bits | C4 PPL↓ | ARC-e | ARC-c | HellaSwag | PIQA  | Winogrande | Avg.  |
|--------------------|------|---------|-------|-------|-----------|-------|------------|-------|
| LLaMA-2 (FP16)     | 16   | 6.63    | 75.59 | 43.17 | 57.06     | 77.91 | 69.85      | 64.72 |
| GPTQ               | 3    | 10.61   | 58.46 | 31.06 | 45.21     | 71.49 | 59.19      | 53.08 |
| Fairy2i-W2         | 2    | 7.85    | 72.73 | 39.76 | 53.33     | 76.17 | 68.03      | 62.00 |
| AQLM               | 2    | 8.54    | 63.68 | 32.76 | 49.55     | 74.76 | 65.67      | 57.28 |
| QuIP#              | 2    | 11.01   | 55.56 | 28.84 | 42.94     | 71.38 | 62.43      | 52.23 |
| Real-Ternary (QAT) | 1.58 | 11.06   | 55.93 | 24.15 | 38.43     | 69.80 | 55.17      | 48.70 |
| Fairy2i-W1         | 1    | 11.03   | 56.56 | 24.82 | 38.19     | 70.08 | 53.67      | 48.66 |
| Real-Binary (QAT)  | 1    | 11.75   | 53.32 | 22.70 | 35.57     | 66.81 | 52.64      | 46.21 |

Task columns report zero-shot accuracy (%); rows are ordered by bit width.

Key Results:

  • Fairy2i-W2 (2-bit) reaches a C4 perplexity of 7.85, substantially narrowing the gap to FP16 (6.63) and outperforming all 2-bit PTQ methods
  • Fairy2i-W2 reaches 62.00% average accuracy on zero-shot tasks, close to FP16 (64.72%)
  • Fairy2i-W1 (1-bit) outperforms the real-valued binary baseline and matches the ternary baseline at a lower bit budget

Quick Start

Fairy2i-W2 is based on the LLaMA-2 7B architecture, with only the linear layers replaced by complex-valued QAT layers; the model structure is otherwise identical to LLaMA-2. A rough sketch of this replacement is shown below.
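As an illustration of the layer swap, the sketch below replaces every nn.Linear with a complex-valued QAT layer. ComplexQATLinear is a hypothetical name standing in for the actual classes defined in qat_modules.py, whose constructor signature may differ.

import torch.nn as nn

def replace_linears(module: nn.Module, qat_cls) -> None:
    # Recursively swap every nn.Linear for a complex-valued QAT layer
    # built from the pre-trained weights (class name is hypothetical).
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            setattr(module, name, qat_cls(child))
        else:
            replace_linears(child, qat_cls)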

Installation

pip install torch transformers safetensors huggingface_hub

Loading the Model

Please refer to load_model.py for detailed implementation. Basic usage:

import torch
from load_model import load_model

# Load Fairy2i-W2 model
model, tokenizer = load_model()

# The model is ready to use!
prompt = "Hello, how are you?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=50,
        do_sample=True,
        temperature=0.7
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Model Details

  • Base Model: LLaMA-2 7B
  • Quantization Method: Complex-Phase V2 (two-stage recursive residual quantization, T = 2)
  • Effective Bit Width: 2 bits per real parameter
  • Codebook: {±1, ±i} (fourth roots of unity)
  • Training: quantization-aware training (QAT) on 30B tokens from the RedPajama dataset

Files in Repository

  • load_model.py: Model loading script
  • qat_modules.py: QAT linear layer implementations
  • quantization.py: Quantization functions (PhaseQuant, BitNet, etc.)
  • config.json: Model configuration (identical to LLaMA-2 7B)
  • model.safetensors.index.json: Weight file index
  • model-0000X-of-00003.safetensors: Sharded model weights
  • Tokenizer files: tokenizer.json, tokenizer_config.json, etc.

Citation

If you use Fairy2i-W2 in your research, please cite:

@article{wang2025fairy2i,
  title={Fairy2i: Training Complex LLMs from Real LLMs with All Parameters in {±1, ±i}},
  author={Wang, Feiyu and Tan, Xinyu and Huang, Bokai and Zhang, Yihao and Wang, Guoan and Cong, Peizhuang and Yang, Tong},
  journal={arXiv preprint},
  year={2025}
}

License

This model follows the same license as LLaMA-2. Please refer to the original LLaMA-2 license for details.

Contact

For questions or issues, please contact: [email protected]
