Llama-3.1-8B-DeepSeek-Distilled

Model Overview

This model distills knowledge from deepseek-ai/deepseek-llm-67b-base, a 67B-parameter teacher model available on Hugging Face, into the smaller meta-llama/Llama-3.1-8B architecture, which serves as the student.
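The exact training recipe is not documented here, but the standard soft-target distillation objective gives the idea: the student is trained to match the teacher's temperature-softened output distribution via a KL-divergence loss. The sketch below is a hypothetical illustration of that loss, not the confirmed recipe for this model.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-target KD loss: KL(teacher || student) over softened distributions.

    Hypothetical sketch -- temperature and loss weighting for this model
    are assumptions, not documented values.
    """
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # Scale by t^2 to keep gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * t * t

# Toy example: a batch of 2 positions over a 5-token vocabulary.
student = torch.randn(2, 5)
teacher = torch.randn(2, 5)
loss = distillation_loss(student, teacher)
print(loss.item())  # non-negative scalar; 0 when the distributions match
```

In practice this term is usually combined with the ordinary next-token cross-entropy on the ground-truth data.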

Evaluation Scores

  • XTREME - XNLI - en: Accuracy: 0.620
  • SuperGLUE - BoolQ: Accuracy: 0.848
  • GLUE - SST-2: Accuracy: 0.938
  • SQuAD:
    • Exact Match: 71.8
    • F1 Score: 84.7
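For the SQuAD numbers above, Exact Match checks whether the normalized prediction string equals a normalized reference answer, while F1 measures token overlap between the two. A minimal sketch following the official SQuAD normalization (lowercasing, stripping punctuation and articles):

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, reference):
    # 1.0 if the normalized strings are identical, else 0.0.
    return float(normalize(prediction) == normalize(reference))

def f1_score(prediction, reference):
    # Harmonic mean of token-level precision and recall.
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Eiffel Tower", "eiffel tower"))  # 1.0
print(f1_score("the tower in Paris", "Eiffel Tower"))   # 0.4
```

The reported scores are corpus-level averages of these per-example metrics (taking the max over the reference answers for each question).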

Usage

Load the model and tokenizer from the Hugging Face Hub:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Download the distilled model and its tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("enesarda22/Llama-3.1-8B-DeepSeek67B-Distilled")
tokenizer = AutoTokenizer.from_pretrained("enesarda22/Llama-3.1-8B-DeepSeek67B-Distilled")

Model Details

  • Format: Safetensors
  • Model size: 8B parameters
  • Tensor type: BF16