# ATLAS-8B-Thinking
ATLAS-8B-Thinking is a specialized teacher model developed by Arc Intelligence, designed to solve the core reliability problem in reinforcement learning for LLMs. Standard RL fine-tuning is often brittle, leading to performance degradation where new skills are learned at the expense of old ones.
This model reframes the training process as one of effective pedagogy. Instead of just optimizing a student model, ATLAS-8B-Thinking first uses a lightweight diagnostic probe to assess the student's reasoning. Based on this diagnosis, it provides adaptive guidance—comprehensive help for struggling models and minimal intervention for capable ones. This "do no harm" approach ensures consistent capability improvement without the usual side effects of RL.
This model is a core component of the open-source ATLAS Framework and is designed to train and improve other language models.
## Model Performance
The ATLAS framework, using this teacher model, produces the following improvements in a student model (Qwen3-4B) compared to the student baseline. The results highlight a rare combination of increased performance, higher efficiency, and fundamental reliability.
| Metric | Result | Notes |
|---|---|---|
| Non-Degradation Rate | 97% | Core metric showing reliability and avoidance of skill loss. |
| Average Accuracy | +15.7% | Across the Arc-ATLAS-Teach-v0 evaluation set. |
| Task Completion Rate | +31.2% | Student model completes tasks it previously failed. |
| Response Tokens | -37.2% | More efficient and concise reasoning. |
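To make the headline reliability metric concrete: non-degradation counts the fraction of evaluation tasks on which the teacher-guided student scores at least as well as its unguided baseline. The helper below is a hypothetical illustration of that definition, not the repository's evaluation code; `baseline` and `with_teacher` are assumed per-task scores.

```python
# Hypothetical illustration of the non-degradation metric (not the repo's eval code).

def non_degradation_rate(baseline: list[float], with_teacher: list[float]) -> float:
    """Fraction of tasks where teacher guidance did not hurt the student."""
    assert len(baseline) == len(with_teacher), "scores must be paired per task"
    kept = sum(1 for b, t in zip(baseline, with_teacher) if t >= b)
    return kept / len(baseline)

# A 97% rate means the guided student matched or beat its baseline on 97% of tasks.
```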
## How to Use
ATLAS-8B-Thinking is not a standard instruction-tuned model for direct chat. It is a core component of the ATLAS training framework, designed to interact with a "student" model in a two-pass process: a diagnostic probe followed by adaptive guidance.
### Loading the Model
**Important:** This model requires `trust_remote_code=True` due to custom Qwen3 architecture components.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the teacher model
teacher_model = AutoModelForCausalLM.from_pretrained(
    "Arc-Intelligence/ATLAS-8B-Thinking",
    trust_remote_code=True,       # Required for custom architecture
    torch_dtype=torch.bfloat16,   # Recommended for efficiency
)
teacher_tokenizer = AutoTokenizer.from_pretrained(
    "Arc-Intelligence/ATLAS-8B-Thinking",
    trust_remote_code=True,
)
```
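Once loaded, the teacher behaves like any causal LM at inference time. The snippet below is only a generic generation call with an illustrative prompt; the actual probe and guidance templates are defined in the ATLAS repository.

```python
import torch

# Illustrative prompt only; the real templates live in the ATLAS repo.
prompt = (
    "Problem: A farmer has 52 trees planted in a row over 1850 meters.\n"
    "Student's reasoning: I would divide 1850 by 52.\n"
    "Provide brief guidance for this student."
)
inputs = teacher_tokenizer(prompt, return_tensors="pt").to(teacher_model.device)
with torch.no_grad():
    output_ids = teacher_model.generate(**inputs, max_new_tokens=128)
guidance = teacher_tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(guidance)
```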
### Conceptual Usage
The following is a simplified, conceptual example of the ATLAS interaction loop. The full implementation is available in the official repository.
```python
# A conceptual example of the ATLAS interaction loop
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the teacher and a student model
teacher_model = AutoModelForCausalLM.from_pretrained(
    "Arc-Intelligence/ATLAS-8B-Thinking", trust_remote_code=True, torch_dtype=torch.bfloat16
)
student_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B")  # The model to be improved
student_tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B")

problem = "A farmer has 52 trees planted in a row over a length of 1850 meters. What is the distance between each tree?"

# 1. Teacher creates a diagnostic probe to assess the student's initial approach
#    (this step is abstracted in the actual framework)
diagnostic_probe = "To find the distance between the trees, what is the first critical calculation you would make?"

# 2. Student responds to the probe
#    (implementation detail: in the framework, the student model generates this response)
student_reasoning_trace = "I would divide the total length (1850m) by the number of trees (52)."

# 3. Teacher assesses the trace and provides adaptive guidance;
#    here it recognizes a common off-by-one error
#    (implementation detail: the teacher model generates this guidance)
adaptive_guidance = "Your approach is close. Remember that 52 trees create 51 intervals between them. The distance is uniform across these intervals."

# 4. The student uses the guidance to solve the problem
final_prompt = problem + "\n" + adaptive_guidance
inputs = student_tokenizer(final_prompt, return_tensors="pt")
output_ids = student_model.generate(**inputs, max_new_tokens=256)
final_answer = student_tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
# Expected reasoning: 1850 meters / 51 intervals ≈ 36.27 meters per interval
```
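In this example the teacher supplies a targeted hint that fixes the off-by-one error without handing over the final answer, illustrating the adaptive, minimal-intervention guidance described above.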
### Running the Full Training Pipeline
To replicate our results or train your own models using the ATLAS framework, clone the official repository and follow the setup instructions.
```bash
# 1. Clone the repository
git clone https://github.com/Arc-Computer/ATLAS
cd ATLAS

# 2. Install dependencies
bash scripts/install_py312.sh

# 3. Run training
# Phase 1: Supervised Fine-Tuning (SFT)
scripts/launch.sh 4 configs/run/teacher_sft.yaml
# Phase 2: Reinforcement Learning (RL)
scripts/launch_with_server.sh 1 3 configs/run/teacher_rcl.yaml
```
## Training Details
- Base Model: Qwen/Qwen3-8B
- Training Framework: ATLAS (SFT → RL with GRPO)
- Key Feature: The RL phase uses an asymmetric reward function that heavily penalizes any instance of student performance degradation; this asymmetry underpins the framework's reliability (see the sketch after this list).
- Dataset: Arc-Intelligence/Arc-ATLAS-Teach-v0
- Context Length: 8192 tokens
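As a rough illustration of the asymmetric reward shape described above, consider the sketch below. The penalty multiplier is a hypothetical value chosen for illustration; the actual reward function is defined in the repository's training configs.

```python
def asymmetric_reward(baseline_score: float, guided_score: float,
                      degradation_penalty: float = 4.0) -> float:
    """Reward improvement linearly, but weight any degradation much more heavily.

    `degradation_penalty` is a hypothetical multiplier for illustration; the
    real reward function lives in the ATLAS training configs.
    """
    delta = guided_score - baseline_score
    return delta if delta >= 0 else degradation_penalty * delta
```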
## Citation
If you use the ATLAS framework or our models in your research, please cite our work:
```bibtex
@misc{barnes2025atlas,
  title={{ATLAS: Adaptive Teaching and Learning Alignment System for Reinforcement Learning}},
  author={Jarrod Barnes and Aman Jaglan},
  year={2025},
  publisher={Arc Intelligence},
  note={Technical Report},
  url={https://github.com/Arc-Computer/ATLAS}
}
```
## Project Resources
- GitHub Repository: https://github.com/Arc-Computer/ATLAS
- Companion Model: ATLAS-8B-Instruct
- Training Dataset: Arc-ATLAS-Teach-v0