Active filters: gptq
rinna/llama-3-youko-8b-gptq • Text Generation • 2B • Updated • 37
rinna/llama-3-youko-8b-instruct-gptq • Text Generation • 2B • Updated • 7 • 1
rinna/llama-3-youko-70b-gptq • Text Generation • 11B • Updated • 4
rinna/llama-3-youko-70b-instruct-gptq • Text Generation • 11B • Updated • 1
Xu-Ouyang/pythia-1.4b-deduped-int3-step14000-GPTQ-wikitext2 • Text Generation • 0.3B • Updated
Xu-Ouyang/pythia-1.4b-deduped-int3-step29000-GPTQ-wikitext2 • Text Generation • 0.3B • Updated • 1
Xu-Ouyang/pythia-1.4b-deduped-int3-step43000-GPTQ-wikitext2 • Text Generation • 0.3B • Updated
Xu-Ouyang/pythia-1.4b-deduped-int3-step57000-GPTQ-wikitext2 • Text Generation • 0.3B • Updated
ChenMnZ/Llama-2-13b-EfficientQAT-w2g128-GPTQ • Text Generation • 1B • Updated
ChenMnZ/Llama-2-13b-EfficientQAT-w2g128-BitBLAS • Text Generation • 4B • Updated
ChenMnZ/Llama-2-13b-EfficientQAT-w2g64-BitBLAS • Text Generation • 4B • Updated
ChenMnZ/Llama-2-13b-EfficientQAT-w2g64-GPTQ • Text Generation • 1B • Updated
ChenMnZ/Llama-2-13b-EfficientQAT-w4g128-BitBLAS • Text Generation • 7B • Updated
Xu-Ouyang/pythia-2.8b-deduped-int4-step129000-GPTQ-wikitext2 • Text Generation • 0.6B • Updated
ChenMnZ/Llama-2-13b-EfficientQAT-w4g128-GPTQ • Text Generation • 2B • Updated
ChenMnZ/Llama-2-70b-EfficientQAT-w2g128-BitBLAS • Text Generation • 18B • Updated • 10
ChenMnZ/Llama-2-70b-EfficientQAT-w2g128-GPTQ • Text Generation • 5B • Updated
ChenMnZ/Llama-2-70b-EfficientQAT-w2g64-GPTQ • Text Generation • 6B • Updated
ChenMnZ/Llama-2-70b-EfficientQAT-w4g128-BitBLAS • Text Generation • 36B • Updated
ChenMnZ/Llama-2-70b-EfficientQAT-w4g128-GPTQ • Text Generation • 10B • Updated
Xu-Ouyang/pythia-2.8b-deduped-int3-step14000-GPTQ-wikitext2 • Text Generation • 0.5B • Updated
Xu-Ouyang/pythia-12b-deduped-int3-step14000-GPTQ-wikitext2 • Text Generation • 2B • Updated
ChenMnZ/Llama-2-7b-EfficientQAT-w2g128-GPTQ • Text Generation • 0.7B • Updated • 1
ChenMnZ/Llama-2-7b-EfficientQAT-w2g64-GPTQ • Text Generation • 0.8B • Updated • 1 • 1
Xu-Ouyang/pythia-2.8b-deduped-int3-step29000-GPTQ-wikitext2 • Text Generation • 0.5B • Updated
ModelCloud/gemma-2-27b-it-gptq-4bit • Text Generation • 6B • Updated • 10 • 12
ChenMnZ/Llama-2-7b-EfficientQAT-w4g128-GPTQ • Text Generation • 1B • Updated
ChenMnZ/Llama-3-70b-EfficientQAT-w2g128-GPTQ • Text Generation • 7B • Updated • 2
ChenMnZ/Llama-3-70b-EfficientQAT-w2g64-GPTQ • Text Generation • 8B • Updated • 1
ChenMnZ/Llama-3-70b-EfficientQAT-w4g128-GPTQ • Text Generation • 11B • Updated
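The repositories above are GPTQ-quantized text-generation checkpoints, so they can typically be loaded through the standard transformers API once a GPTQ backend is available. The snippet below is a minimal sketch, assuming transformers, optimum, and a GPTQ kernel package (e.g. auto-gptq or gptqmodel) are installed, that the chosen repo's quantization config is supported on the local GPU, and that the model ID shown is simply one example picked from the list.

```python
# Minimal sketch: load one of the GPTQ repos listed above with transformers.
# Assumes `transformers`, `optimum`, and a GPTQ backend (auto-gptq / gptqmodel)
# are installed and a CUDA GPU is available; the model ID is just an example.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ModelCloud/gemma-2-27b-it-gptq-4bit"  # any GPTQ repo from the list

tokenizer = AutoTokenizer.from_pretrained(model_id)
# The quantization config stored in the repo is picked up automatically,
# so the checkpoint is loaded with its GPTQ kernels rather than full precision.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("GPTQ quantization works by", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Repos with a BitBLAS suffix (e.g. the ChenMnZ/Llama-2-*-BitBLAS entries) target the BitBLAS kernel format rather than the standard GPTQ layout, so they may require that backend instead of the generic GPTQ path shown here.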