--- license: other library_name: diffusers pipeline_tag: text-to-video tags: - wan - text-to-video - image-generation - video-generation - lora - adapter - cfg-distillation - step-distillation - lightx2v - diffusion - video-diffusion - bf16 - t2v --- # WAN LightX2V T2V LoRA Adapters (720p - All Ranks) Complete collection of LoRA (Low-Rank Adaptation) adapters for the LightX2V 14B text-to-video generation model at 720p resolution. This repository contains all 7 rank variants (4, 8, 16, 32, 64, 128, 256) enabling flexible quality/performance trade-offs through CFG (Classifier-Free Guidance) step distillation. ## 📋 Model Description These LoRA adapters enable efficient text-to-video generation at 720p resolution (1280x720) using the powerful LightX2V T2V 14B base model. Through CFG step distillation, these adapters achieve 2-3x faster generation while maintaining high quality output. The complete rank collection (4-256) provides flexibility to optimize for speed, quality, or VRAM constraints. **Key Features**: - 7 complete rank variants for flexible deployment - CFG step distillation v2 for faster inference (15-25 steps vs 50-100) - BF16 precision for stability and hardware optimization - 720p native resolution (1280x720) - Compatible with Diffusers and ComfyUI workflows ## 📁 Repository Contents This repository contains **7 LoRA adapter models** totaling **~4.7GB**: ``` wan21-lightx2v-t2v-14b-720p/ └── loras/ └── wan/ ├── wan21-lightx2v-t2v-14b-cfg-step-distill-v2-rank4-bf16.safetensors (45MB) ├── wan21-lightx2v-t2v-14b-cfg-step-distill-v2-rank8-bf16.safetensors (82MB) ├── wan21-lightx2v-t2v-14b-cfg-step-distill-v2-rank16-bf16.safetensors (156MB) ├── wan21-lightx2v-t2v-14b-cfg-step-distill-v2-rank32-bf16.safetensors (305MB) ├── wan21-lightx2v-t2v-14b-cfg-step-distill-v2-rank64-bf16.safetensors (602MB) ├── wan21-lightx2v-t2v-14b-cfg-step-distill-v2-rank128-bf16.safetensors (1.2GB) └── wan21-lightx2v-t2v-14b-cfg-step-distill-v2-rank256-bf16.safetensors (2.4GB) ``` **File Sizes**: - Total repository size: ~4.7GB - Individual adapters: 45MB to 2.4GB - Recommended adapter (rank-32): 305MB ## 💻 Hardware Requirements ### Minimum Requirements (Rank 4-16) - **GPU**: NVIDIA RTX 3060 (12GB VRAM) or equivalent AMD - **System RAM**: 16GB DDR4 - **Storage**: 500MB free space (individual adapter) + base model - **OS**: Windows 10/11, Linux (Ubuntu 20.04+), macOS 12+ - **Architecture**: NVIDIA Ampere or newer (BF16 support) ### Recommended (Rank 32-64) ⭐ - **GPU**: NVIDIA RTX 4070 Ti (16GB VRAM) or RTX 3090 (24GB VRAM) - **System RAM**: 32GB DDR4/DDR5 - **Storage**: 1GB free space + base model (~30GB) - **CUDA**: 11.8+ or 12.1+ - **OS**: Windows 11 or Linux (Ubuntu 22.04+) ### High-End (Rank 128-256) - **GPU**: NVIDIA RTX 4090 (24GB VRAM) or A100 (40GB VRAM) - **System RAM**: 64GB DDR5 - **Storage**: 5GB free space (all adapters) + base model - **Use Case**: Maximum quality research/production work ### VRAM Usage by Rank (720p, 24 frames) - **Rank 4-8**: ~14-15GB VRAM - **Rank 16-32**: ~15-16GB VRAM (recommended) - **Rank 64**: ~18GB VRAM - **Rank 128**: ~20GB VRAM - **Rank 256**: ~24GB VRAM (requires RTX 4090 or better) ## 🚀 Usage Examples ### Basic Text-to-Video Generation (Diffusers) ```python from diffusers import DiffusionPipeline import torch # Load base LightX2V T2V 14B model pipe = DiffusionPipeline.from_pretrained( "lightx2v/lightx2v-t2v-14b", torch_dtype=torch.bfloat16, device_map="auto" ) # Load LoRA adapter (rank-32 recommended for balanced quality/speed) pipe.load_lora_weights( "E:/huggingface/wan21-lightx2v-t2v-14b-720p/loras/wan/wan21-lightx2v-t2v-14b-cfg-step-distill-v2-rank32-bf16.safetensors" ) # Generate 720p video from text prompt prompt = "A serene mountain landscape at sunset with golden light, cinematic camera movement, 720p HD quality" video = pipe( prompt=prompt, num_inference_steps=20, # Reduced steps thanks to distillation guidance_scale=7.5, num_frames=24, # ~3 seconds at 8 fps height=720, width=1280 ).frames # Export video file from diffusers.utils import export_to_video export_to_video(video, "output_720p.mp4", fps=8) ``` ### Rank Selection and Comparison ```python import os # Base path to LoRA adapters LORA_PATH = "E:/huggingface/wan21-lightx2v-t2v-14b-720p/loras/wan" # Select rank based on your hardware and quality needs # Options: 4, 8, 16, 32, 64, 128, 256 rank = 32 # Recommended starting point lora_file = f"{LORA_PATH}/wan21-lightx2v-t2v-14b-cfg-step-distill-v2-rank{rank}-bf16.safetensors" pipe.load_lora_weights(lora_file) # Generate video video = pipe( prompt="Aerial drone shot rising above misty forest at sunrise, cinematic 720p quality", num_inference_steps=20, num_frames=24 ).frames export_to_video(video, f"output_rank{rank}.mp4", fps=8) ``` ### Testing Multiple Ranks ```python # Compare different ranks to find optimal balance for your use case ranks_to_test = [16, 32, 64, 128] for rank in ranks_to_test: print(f"Testing rank {rank}...") lora_file = f"{LORA_PATH}/wan21-lightx2v-t2v-14b-cfg-step-distill-v2-rank{rank}-bf16.safetensors" pipe.load_lora_weights(lora_file) video = pipe( prompt="Lightning storm over desert landscape, dramatic clouds, cinematic 720p", num_inference_steps=20, num_frames=24 ).frames export_to_video(video, f"comparison_rank{rank}.mp4", fps=8) ``` ### Memory-Efficient Loading ```python # For systems with limited VRAM pipe = DiffusionPipeline.from_pretrained( "lightx2v/lightx2v-t2v-14b", torch_dtype=torch.bfloat16, ) # Enable CPU offloading to reduce VRAM usage pipe.enable_model_cpu_offload() # Use lower rank for minimal VRAM pipe.load_lora_weights( "E:/huggingface/wan21-lightx2v-t2v-14b-720p/loras/wan/wan21-lightx2v-t2v-14b-cfg-step-distill-v2-rank16-bf16.safetensors" ) # Generate with reduced frames/resolution if needed video = pipe( prompt="City street at night with neon lights, 720p quality", num_frames=16, # Reduced from 24 height=720, width=1280 ).frames ``` ### ComfyUI Integration 1. **Copy LoRA to ComfyUI**: ``` ComfyUI/models/loras/wan/ └── wan21-lightx2v-t2v-rank32-bf16.safetensors ``` 2. **Workflow Setup**: - Add "Load LoRA" node - Select adapter: `wan21-lightx2v-t2v-rank32-bf16.safetensors` - Set LoRA strength: **0.8-1.0** - Connect to LightX2V T2V model nodes - Set resolution: **1280x720** (720p) 3. **Recommended Parameters**: - Steps: 15-25 (distilled model) - CFG Scale: 6.0-8.0 - LoRA Strength: 0.8-1.0 - Resolution: 1280x720 (native) ## 📊 Model Specifications | Specification | Details | |---------------|---------| | **Model Type** | LoRA Adapters for Video Diffusion | | **Architecture** | Low-Rank Adaptation (LoRA) | | **Base Model** | LightX2V T2V 14B | | **Training Method** | CFG Step Distillation v2 | | **Precision** | BF16 (Brain Floating Point 16) | | **Resolution** | 720p (1280x720) native | | **Rank Variants** | 4, 8, 16, 32, 64, 128, 256 (complete set) | | **Parameter Count** | 4M to 256M (varies by rank) | | **File Format** | .safetensors (secure tensor storage) | | **Total Size** | ~4.7GB (all 7 adapters) | | **Pipeline** | Text-to-Video (T2V) | | **Framework** | Diffusers, ComfyUI compatible | ### Rank Selection Guide | Rank | Size | Quality | Speed | VRAM | Best For | |------|------|---------|-------|------|----------| | **4** | 45MB | Basic | Fastest | 14GB | Prototyping, minimal hardware | | **8** | 82MB | Good | Very Fast | 14GB | Quick testing, low VRAM | | **16** | 156MB | Better | Fast | 15GB | Balanced efficiency | | **32** ⭐ | 305MB | High | Moderate | 16GB | **Production (recommended)** | | **64** | 602MB | Very High | Slower | 18GB | Quality-focused work | | **128** | 1.2GB | Excellent | Slow | 20GB | High-fidelity output | | **256** | 2.4GB | Maximum | Slowest | 24GB | Research, maximum quality | **Recommendation**: Start with **rank-32** for optimal quality/performance balance. Scale up (64/128/256) for maximum quality or down (16/8/4) for speed and resource constraints. ## ⚡ Performance Tips and Optimization ### Speed Optimization ```python # 1. Use lower ranks for faster generation pipe.load_lora_weights("...rank16-bf16.safetensors") # 2. Reduce inference steps (distilled model enables this) video = pipe(prompt, num_inference_steps=15) # Instead of 20-25 # 3. Enable torch.compile() for PyTorch 2.0+ pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True) # 4. Reduce frame count for faster iteration video = pipe(prompt, num_frames=16) # ~2 seconds instead of 3 # 5. Use mixed precision torch.set_float32_matmul_precision('high') ``` ### Quality Optimization ```python # 1. Use higher ranks for maximum quality pipe.load_lora_weights("...rank128-bf16.safetensors") # 2. Increase inference steps video = pipe(prompt, num_inference_steps=25) # 3. Tune CFG scale for your prompt video = pipe(prompt, guidance_scale=7.5) # 6.5-8.0 range # 4. Add quality keywords to prompt prompt = "A majestic eagle soaring, cinematic camera movement, 720p HD quality, professional cinematography" # 5. Generate multiple candidates and select best for i in range(3): video = pipe(prompt).frames export_to_video(video, f"candidate_{i}.mp4", fps=8) ``` ### Memory Optimization ```python # 1. Enable CPU offloading pipe.enable_model_cpu_offload() # 2. Use sequential CPU offload for extreme constraints pipe.enable_sequential_cpu_offload() # 3. Lower rank selection pipe.load_lora_weights("...rank8-bf16.safetensors") # 4. Clear cache between generations torch.cuda.empty_cache() # 5. Use attention slicing pipe.enable_attention_slicing() ``` ### CFG Step Distillation Benefits - **Faster inference**: 15-25 steps vs 50-100 (2-3x speedup) - **Maintained quality**: Distillation preserves output fidelity - **Better guidance**: Optimized CFG behavior for prompt adherence - **Consistency**: More stable across different CFG scale values - **Lower cost**: Reduced compute requirements per generation ## 🎨 Prompting Best Practices ### Text-to-Video (T2V) Prompting **Essential Elements**: 1. **Subject**: Clear description of main content 2. **Camera movement**: Specify motion style and direction 3. **Lighting/atmosphere**: Time of day, mood, lighting quality 4. **Quality modifiers**: Include "720p", "HD", "cinematic" 5. **Temporal dynamics**: Motion speed, transitions ### Example Prompts for 720p ``` "A majestic eagle soaring through mountain valleys at golden hour, cinematic camera movement following the bird, 720p HD quality, professional wildlife cinematography" "City street time-lapse with traffic flowing, neon lights reflecting on wet pavement, camera slowly panning right, high detail, 720p resolution, urban cinematography" "Underwater coral reef with tropical fish swimming, gentle camera movement, clear blue water, sunlight filtering from above, smooth motion, cinematic 720p quality" "Drone shot rising above a misty forest at sunrise, rays of light breaking through trees, smooth camera ascent, aerial cinematography, HD quality 720p" "Lightning storm over desert landscape, dramatic clouds, time-lapse motion, cinematic wide shot, 720p quality, epic natural phenomenon" "Cherry blossom petals falling in slow motion, gentle breeze, soft pink lighting, camera tracking downward, beautiful spring scene, 720p HD quality" ``` ### Camera Movement Keywords - **Basic**: "camera pans left/right", "camera tilts up/down" - **Dynamic**: "dolly zoom", "tracking shot", "crane shot", "steadicam" - **Aerial**: "drone shot", "aerial view", "bird's eye view", "flyover" - **Complex**: "orbit around subject", "slow push-in", "reveal shot" ### Temporal Keywords - **Speed**: "slow motion", "time-lapse", "real-time", "gradual" - **Transitions**: "smooth transition", "gradual change", "progressive" - **Motion**: "gentle movement", "dynamic action", "flowing motion" ### Quality Modifiers - "720p HD quality", "high detail", "cinematic", "professional" - "crisp", "clear", "sharp focus", "high fidelity" - "broadcast quality", "production grade" ## 🔧 Troubleshooting ### Out of Memory (OOM) Errors **Solutions**: ```python # 1. Use lower rank adapter pipe.load_lora_weights("...rank16-bf16.safetensors") # or rank8, rank4 # 2. Enable CPU offloading pipe.enable_model_cpu_offload() # 3. Reduce frame count video = pipe(prompt, num_frames=16) # Instead of 24 # 4. Enable attention slicing pipe.enable_attention_slicing() # 5. Use sequential CPU offload (extreme cases) pipe.enable_sequential_cpu_offload() # 6. Clear CUDA cache between generations import torch torch.cuda.empty_cache() ``` ### Poor Quality Results **Diagnose and Fix**: - **Issue**: Blurry or low-detail output - **Solution**: Increase rank (try 64, 128, or 256) - **Solution**: Add "720p HD quality, high detail" to prompt - **Issue**: Inconsistent motion or artifacts - **Solution**: Adjust CFG scale (try 6.5-8.0 range) - **Solution**: Increase inference steps to 25 - **Issue**: Poor prompt adherence - **Solution**: Increase guidance_scale to 8.0 - **Solution**: Make prompt more specific and descriptive - **Issue**: Wrong resolution output - **Solution**: Explicitly set height=720, width=1280 ### Slow Generation Speed **Optimize Performance**: ```python # Use lower ranks pipe.load_lora_weights("...rank4-bf16.safetensors") # Fastest # Reduce steps (distillation enables this) video = pipe(prompt, num_inference_steps=15) # Fewer frames video = pipe(prompt, num_frames=16) # Enable torch.compile (PyTorch 2.0+) pipe.unet = torch.compile(pipe.unet) # Use xformers memory efficient attention pipe.enable_xformers_memory_efficient_attention() ``` ### Model Loading Errors **Common Issues**: ```python # Issue: "File not found" # Solution: Use absolute paths with forward slashes or raw strings lora_path = r"E:\huggingface\wan21-lightx2v-t2v-14b-720p\loras\wan\..." # or lora_path = "E:/huggingface/wan21-lightx2v-t2v-14b-720p/loras/wan/..." # Issue: "BF16 not supported" # Solution: Check GPU architecture (requires Ampere or newer) # Fallback to FP16 if needed: pipe = DiffusionPipeline.from_pretrained( "lightx2v/lightx2v-t2v-14b", torch_dtype=torch.float16 # Instead of bfloat16 ) # Issue: "CUDA out of memory on load" # Solution: Use CPU offloading before loading pipe.enable_model_cpu_offload() ``` ## 📄 License These LoRA adapters follow the license terms of the LightX2V base model. Please review the base model license for usage restrictions: - **Base Model**: LightX2V T2V 14B - **License**: See https://huggingface.co/lightx2v for complete terms **Important**: Verify license compliance for your intended use case (commercial, research, etc.) with the base model license. ## 📖 Citation If you use these LoRA adapters in your research or projects, please cite: ```bibtex @software{wan21_lightx2v_t2v_lora_720p, title={WAN LightX2V T2V LoRA Adapters for 720p Video Generation}, author={WAN Team}, year={2024}, publisher={Hugging Face}, howpublished={\url{https://huggingface.co/wan21-lightx2v-t2v-14b-720p}}, note={CFG Step Distillation LoRA adapters (ranks 4-256) for LightX2V T2V 14B} } @software{lightx2v_base_model, title={LightX2V: Text-to-Video Generation Model}, author={LightX2V Team}, year={2024}, publisher={Hugging Face}, howpublished={\url{https://huggingface.co/lightx2v}} } ``` ## 🔗 Related Resources - **Base Model**: [LightX2V T2V 14B](https://huggingface.co/lightx2v/lightx2v-t2v-14b) - **480p I2V LoRAs**: wan21-lightx2v-i2v-14b-480p (image-to-video) - **WAN Models**: WAN 2.1 and WAN 2.2 video generation models - **Diffusers Documentation**: https://huggingface.co/docs/diffusers - **Model Cards Guide**: https://huggingface.co/docs/hub/model-cards ## 🙏 Acknowledgments - **LightX2V Team** for the exceptional T2V 14B base model - **WAN Team** for LoRA adapter development and CFG distillation - **Hugging Face** for hosting infrastructure and diffusers library - **Community contributors** for testing, feedback, and improvements ## 📧 Support and Contact For issues or questions: - **Model-specific issues**: Open an issue in this repository - **Base model questions**: See LightX2V documentation - **Technical support**: Diffusers GitHub issues --- ## 📋 Summary **Complete 720p T2V LoRA Collection**: - ✅ **7 rank variants**: 4, 8, 16, 32, 64, 128, 256 (complete set) - ✅ **Total size**: ~4.7GB (all adapters included) - ✅ **Resolution**: 720p (1280x720) native - ✅ **Precision**: BF16 for stability and performance - ✅ **Speed**: 2-3x faster than non-distilled (15-25 steps) - ✅ **Flexibility**: Choose rank for quality/speed/VRAM optimization - ✅ **Recommended**: Rank-32 (305MB) for balanced production use - ✅ **Framework**: Compatible with Diffusers and ComfyUI **Key Advantages**: - Complete rank collection from minimal (45MB) to maximum (2.4GB) - CFG step distillation for efficient generation - Native 720p resolution for HD video output - Flexible deployment across different hardware configurations - Production-ready with comprehensive documentation --- **Last Updated**: October 2024 **Repository Version**: v1.1 **Model Version**: CFG Step Distillation v2 **Total Repository Size**: ~4.7GB (7 adapters) **Recommended Rank**: 32 (305MB, 16GB VRAM) **Primary Use Case**: Text-to-video generation at 720p with flexible quality/performance trade-offs