
πŸ“ Overview

Tensordyne builds advanced AI-inference systems, enabling faster, more affordable, and sustainable generative AI.

This repository provides resources to quickly get started with Mixtral-8x22B on the Tensordyne Inference System and its SDK.

🧩 Model Details

  • Quantization: post-training quantization of the base model; no fine-tuning or additional training was performed
  • Supported data types: Tensordyne FP16 (tFP16), Tensordyne FP8 (tFP8), and mixed precision

βš™οΈ Quantization

The Tensordyne SDK offers multiple post-training quantization strategies for converting AI models to run efficiently on the Tensordyne Inference System, fully customizable for your optimization targets.
Here we showcase several preselected quantization variants that can be applied on the fly to convert the model to Tensordyne data types. The calibration-based strategies are defined by quantization configurations provided as .json files.
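As an illustration only (the actual schema is defined by the Tensordyne SDK and may differ; every key below is a hypothetical placeholder), a calibration-based mixed-precision configuration could take a shape like this:

```json
{
  "strategy": "calibration_based",
  "default_dtype": "tFP8",
  "outlier_dtype": "tFP16",
  "calibration": {
    "num_samples": 128
  }
}
```

Refer to the hosted SDK documentation for the actual configuration format.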

The quantized models are evaluated on 10% of the WikiText-2 raw v1 test set. A negative relative perplexity drop indicates that the quantized model performs better than the float base model.

| Model Configuration | Absolute Perplexity | Relative Perplexity Drop vs. BF16 | Details |
|---|---|---|---|
| BF16 | 2.923 | – | The baseline model trained in BF16 |
| calibration_free_tFP16 | 2.921 | -0.05 % | calibration-free tFP16 quantization |
| calibration_based_tFP16 | 2.923 | 0.00 % | calibration-based tFP16 quantization |
| layerwise_mixed_precision | 2.932 | 0.30 % | calibration-based mixed precision: tFP8, outliers in tFP16 |
| calibration_free_dynamic_tFP8 | 2.926 | 0.13 % | calibration-free tFP8 dynamic quantization |
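The relative drop in the table is the standard percentage change of quantized perplexity against the BF16 baseline; a minimal sketch (note that recomputing from the rounded perplexities above can differ in the last digit from the card's percentages, which were presumably computed from unrounded values):

```python
def relative_ppl_drop(ppl_quant: float, ppl_base: float) -> float:
    """Relative perplexity drop in percent vs. the float baseline.

    Negative values mean the quantized model scores better
    (lower perplexity) than the baseline.
    """
    return (ppl_quant - ppl_base) / ppl_base * 100.0

# Example with the rounded table values for layerwise_mixed_precision:
print(f"{relative_ppl_drop(2.932, 2.923):+.2f} %")
```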

🚀 Getting Started

Refer to the Tensordyne Hugging Face Hub tutorial for instructions on using the artifacts provided in this repository.
Our hosted documentation provides more information on Tensordyne's quantization strategies and introduces you to our SDK.
