# ATLAS-8B-Thinking
ATLAS-8B-Thinking is a specialized teacher model developed by Arc Intelligence, designed to solve the core reliability problem in reinforcement learning for LLMs. Standard RL fine-tuning is often brittle, leading to performance degradation where new skills are learned at the expense of old ones.
This model reframes the training process as one of effective pedagogy. Instead of just optimizing a student model, ATLAS-8B-Thinking first uses a lightweight diagnostic probe to assess the student's reasoning. Based on this diagnosis, it provides adaptive guidance—comprehensive help for struggling models and minimal intervention for capable ones. This "do no harm" approach ensures consistent capability improvement without the usual side effects of RL.
This model is a core component of the open-source ATLAS Framework and is designed to train and improve other language models.
## Model Performance
The ATLAS framework, using this teacher model, produces the following improvements in a student model (Qwen3-4B) compared to the student baseline. The results highlight a rare combination of increased performance, higher efficiency, and fundamental reliability.
| Metric | Result | Notes |
|---|---|---|
| Non-Degradation Rate | 97% | Core metric showing reliability and avoidance of skill loss. |
| Average Accuracy | +15.7% | Across the Arc-ATLAS-Teach-v0 evaluation set. |
| Task Completion Rate | +31.2% | Student model completes tasks it previously failed. |
| Response Tokens | -37.2% | More efficient and concise reasoning. |
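To make the headline reliability metric concrete: non-degradation counts the fraction of evaluation tasks on which the teacher-guided student scores at least as well as its unguided baseline. The helper below is a hypothetical illustration of that definition, not the repository's evaluation code; `baseline` and `with_teacher` are assumed per-task scores.

```python
# Hypothetical illustration of the non-degradation metric (not the repo's eval code).

def non_degradation_rate(baseline: list[float], with_teacher: list[float]) -> float:
    """Fraction of tasks where teacher guidance did not hurt the student."""
    assert len(baseline) == len(with_teacher), "scores must be paired per task"
    kept = sum(1 for b, t in zip(baseline, with_teacher) if t >= b)
    return kept / len(baseline)

# A 97% rate means the guided student matched or beat its baseline on 97% of tasks.
```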
## How to Use
ATLAS-8B-Thinking is not a standard instruction-tuned model for direct chat. It is a core component of the ATLAS training framework, designed to interact with a "student" model in a two-pass process: a diagnostic probe followed by adaptive guidance.
### Loading the Model
**Important:** This model requires `trust_remote_code=True` due to custom Qwen3 architecture components.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the teacher model
teacher_model = AutoModelForCausalLM.from_pretrained(
    "Arc-Intelligence/ATLAS-8B-Thinking",
    trust_remote_code=True,       # Required for custom architecture
    torch_dtype=torch.bfloat16,   # Recommended for efficiency
)
teacher_tokenizer = AutoTokenizer.from_pretrained(
    "Arc-Intelligence/ATLAS-8B-Thinking",
    trust_remote_code=True,
)
```
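Once loaded, the teacher behaves like any causal LM at inference time. The snippet below is only a generic generation call with an illustrative prompt; the actual probe and guidance templates are defined in the ATLAS repository.

```python
import torch

# Illustrative prompt only; the real templates live in the ATLAS repo.
prompt = (
    "Problem: A farmer has 52 trees planted in a row over 1850 meters.\n"
    "Student's reasoning: I would divide 1850 by 52.\n"
    "Provide brief guidance for this student."
)
inputs = teacher_tokenizer(prompt, return_tensors="pt").to(teacher_model.device)
with torch.no_grad():
    output_ids = teacher_model.generate(**inputs, max_new_tokens=128)
guidance = teacher_tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(guidance)
```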
### Conceptual Usage
The following is a simplified, conceptual example of the ATLAS interaction loop. The full implementation is available in the official repository.
```python
# A conceptual example of the ATLAS interaction loop
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the teacher and a student model
teacher_model = AutoModelForCausalLM.from_pretrained(
    "Arc-Intelligence/ATLAS-8B-Thinking", trust_remote_code=True, torch_dtype=torch.bfloat16
)
student_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B")  # The model to be improved
student_tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B")

problem = "A farmer has 52 trees planted in a row over a length of 1850 meters. What is the distance between each tree?"

# 1. Teacher creates a diagnostic probe to assess the student's initial approach
#    (this step is abstracted in the actual framework)
diagnostic_probe = "To find the distance between the trees, what is the first critical calculation you would make?"

# 2. Student responds to the probe
#    (implementation detail: in the framework, the student model generates this response)
student_reasoning_trace = "I would divide the total length (1850m) by the number of trees (52)."

# 3. Teacher assesses the trace and provides adaptive guidance;
#    here it recognizes a common off-by-one error
#    (implementation detail: the teacher model generates this guidance)
adaptive_guidance = "Your approach is close. Remember that 52 trees create 51 intervals between them. The distance is uniform across these intervals."

# 4. The student uses the guidance to solve the problem
final_prompt = problem + "\n" + adaptive_guidance
inputs = student_tokenizer(final_prompt, return_tensors="pt")
output_ids = student_model.generate(**inputs, max_new_tokens=256)
final_answer = student_tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
# Expected reasoning: 1850 meters / 51 intervals ≈ 36.27 meters per interval
```
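In this example the teacher supplies a targeted hint that fixes the off-by-one error without handing over the final answer, illustrating the adaptive, minimal-intervention guidance described above.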
### Running the Full Training Pipeline
To replicate our results or train your own models using the ATLAS framework, clone the official repository and follow the setup instructions.
```bash
# 1. Clone the repository
git clone https://github.com/Arc-Computer/ATLAS
cd ATLAS

# 2. Install dependencies
bash scripts/install_py312.sh

# 3. Run training
# Phase 1: Supervised Fine-Tuning (SFT)
scripts/launch.sh 4 configs/run/teacher_sft.yaml
# Phase 2: Reinforcement Learning (RL)
scripts/launch_with_server.sh 1 3 configs/run/teacher_rcl.yaml
```
## Training Details
- Base Model: Qwen/Qwen3-8B
- Training Framework: ATLAS (SFT → RL with GRPO)
- Key Feature: The RL phase uses an asymmetric reward function that heavily penalizes any instance of student performance degradation; this asymmetry underpins the framework's reliability (see the sketch after this list).
- Dataset: Arc-Intelligence/Arc-ATLAS-Teach-v0
- Context Length: 8192 tokens
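As a rough illustration of the asymmetric reward shape described above, consider the sketch below. The penalty multiplier is a hypothetical value chosen for illustration; the actual reward function is defined in the repository's training configs.

```python
def asymmetric_reward(baseline_score: float, guided_score: float,
                      degradation_penalty: float = 4.0) -> float:
    """Reward improvement linearly, but weight any degradation much more heavily.

    `degradation_penalty` is a hypothetical multiplier for illustration; the
    real reward function lives in the ATLAS training configs.
    """
    delta = guided_score - baseline_score
    return delta if delta >= 0 else degradation_penalty * delta
```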
## Citation
If you use the ATLAS framework or our models in your research, please cite our work:
```bibtex
@misc{barnes2025atlas,
  title={{ATLAS: Adaptive Teaching and Learning Alignment System for Reinforcement Learning}},
  author={Jarrod Barnes and Aman Jaglan},
  year={2025},
  publisher={Arc Intelligence},
  note={Technical Report},
  url={https://github.com/Arc-Computer/ATLAS}
}
```
## Project Resources
- GitHub Repository: https://github.com/Arc-Computer/ATLAS
- Companion Model: ATLAS-8B-Instruct
- Training Dataset: Arc-ATLAS-Teach-v0