2000-Sample dataset needed for NVFP4 quantization
Request for 2000 Samples from JANUSCODE-800K Visual-Programmatic Tasks
Hi @InternLM team,
I'm working on an open-source guide for deploying quantized versions of JanusCoderV-8B on Blackwell GPUs (e.g., RTX 5060 Ti via WSL2). The guide focuses on NVFP4 post-training quantization (PTQ), targeting a 2-3x inference speedup with <0.5% accuracy drift on visual-programmatic tasks such as generating charts or UIs from screenshots and instructions.
To calibrate the quantization optimally using TensorRT-LLM, I need ~2000 high-quality samples from the visual-programmatic subset of JANUSCODE-800K (prioritizing ~50% viz-algo, 30% UI-artifact, 20% scientific code). The current Google Drive samples are a great start but too limited for robust PTQ.
Could you share these via a gated HF dataset or a Drive link?
Here is a link to a Google Doc containing the Jupyter Notebook JSON for a notebook that extracts the ideal 2000-sample dataset for NVFP4 quantization:
https://docs.google.com/document/d/1odl5ek3EbFrta71dgMlyvXEwAdK6pbarVMi8rJttaH8/edit?usp=drivesdk
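For reference, the 50/30/20 mix requested above could be drawn with a simple stratified sampler. This is only a rough sketch, not the code in the linked notebook: it assumes each record is a dict with a `category` field, and the labels `viz-algo`, `ui-artifact`, and `scientific-code` are hypothetical names I made up for illustration, since I don't know how JANUSCODE-800K actually tags its subsets.

```python
import random
from collections import defaultdict

# Target mix from the request; the category labels are hypothetical.
TARGET_MIX = {"viz-algo": 0.5, "ui-artifact": 0.3, "scientific-code": 0.2}


def stratified_sample(records, total=2000, mix=TARGET_MIX, key="category", seed=0):
    """Draw `total` records so that each category gets its share of `mix`."""
    rng = random.Random(seed)  # fixed seed so the calibration set is reproducible

    # Group records by their (assumed) category field.
    by_category = defaultdict(list)
    for record in records:
        by_category[record.get(key)].append(record)

    picked = []
    for category, fraction in mix.items():
        quota = round(total * fraction)
        pool = by_category.get(category, [])
        if len(pool) < quota:
            raise ValueError(
                f"need {quota} records for {category!r}, found {len(pool)}"
            )
        # Sample without replacement within the category.
        picked.extend(rng.sample(pool, quota))

    rng.shuffle(picked)  # avoid category-ordered calibration batches
    return picked
```

With `total=2000` this yields 1000, 600, and 400 records for the three categories, and it raises an error rather than silently under-sampling if a category is too small.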