Helion-V1.5
Helion-V1.5 is a 7B parameter conversational AI model fine-tuned from Llama-2 using QLoRA. It delivers improved performance over Helion-V1 with enhanced instruction following, code generation, and multi-turn dialogue capabilities.
Architecture: Llama-2-7B with LoRA adapters
Parameters: 7 billion (base) + 67M (LoRA)
Context Length: 4096 tokens
Training: QLoRA (4-bit) fine-tuning on high-quality instruction data
License: Apache 2.0
| Feature | Helion-V1 | Helion-V1.5 | Improvement |
|---|---|---|---|
| MT-Bench Score | 6.8 | 7.2 | +5.9% |
| AlpacaEval Win Rate | 72.3% | 78.5% | +8.6% |
| HumanEval Pass@1 | 38.1% | 42.3% | +11.0% |
| Avg Response Time | 2.3s | 1.8s | -21.7% |
| Function Calling | ❌ | ✅ | New |
| Streaming Support | Basic | Full | Enhanced |
| Component | Value |
|---|---|
| Hidden Size | 4096 |
| Layers | 32 |
| Attention Heads | 32 |
| Intermediate Size | 11008 |
| Vocabulary | 32000 tokens |
| Position Encoding | RoPE |
| Precision | bfloat16 |
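These values match the standard Llama-2-7B configuration and can be double-checked by loading the config directly (a quick sketch using the repository id from the examples below):

```python
from transformers import AutoConfig

# Print the architecture values listed in the table above
config = AutoConfig.from_pretrained("DeepXR/Helion-V1.5")
print(config.hidden_size)          # 4096
print(config.num_hidden_layers)    # 32
print(config.num_attention_heads)  # 32
print(config.intermediate_size)    # 11008
print(config.vocab_size)           # 32000
```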
LoRA Configuration:
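The adapter hyperparameters themselves are not listed in this card. For reference, a representative QLoRA adapter setup with peft looks like the sketch below; the rank, alpha, dropout, and target modules shown are illustrative assumptions, not the released Helion-V1.5 values.

```python
from peft import LoraConfig

# Illustrative QLoRA adapter settings -- NOT the released Helion-V1.5 values
lora_config = LoraConfig(
    r=64,                # adapter rank (assumption)
    lora_alpha=16,       # scaling factor (assumption)
    lora_dropout=0.05,   # dropout on adapter layers (assumption)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    bias="none",
    task_type="CAUSAL_LM",
)
```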
Quick start with the transformers library:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "DeepXR/Helion-V1.5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Prepare messages
messages = [
    {"role": "user", "content": "Explain machine learning in simple terms"}
]

# Apply chat template
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Generate response
output = model.generate(
    input_ids,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

# Decode only the newly generated tokens
response = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```
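The comparison table above lists full streaming support; with transformers this is typically done via TextStreamer. A minimal sketch reusing the model, tokenizer, and input_ids from the quick-start example:

```python
from transformers import TextStreamer

# Stream generated tokens to stdout as they are produced
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(
    input_ids,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    streamer=streamer,
)
```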
Serving with Text Generation Inference (TGI):

```bash
docker run --gpus all --shm-size 1g -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id DeepXR/Helion-V1.5 \
  --max-input-length 3584 \
  --max-total-tokens 4096
```
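Once the container is running, the server can be queried over HTTP. A minimal client sketch, assuming the port mapping from the command above and TGI's standard /generate endpoint:

```python
import requests

# Query the TGI /generate endpoint started by the docker command above
resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "Explain machine learning in simple terms",
        "parameters": {"max_new_tokens": 256, "temperature": 0.7, "top_p": 0.9},
    },
    timeout=120,
)
print(resp.json()["generated_text"])
```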
Offline batch inference with vLLM:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="DeepXR/Helion-V1.5")
sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=512)

prompts = ["Explain quantum computing"]
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.outputs[0].text)
```
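Note that the prompts above are raw strings; to reproduce the chat formatting used in the transformers example, the tokenizer's chat template can be applied before handing prompts to vLLM (a sketch assuming the same repository id):

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

tokenizer = AutoTokenizer.from_pretrained("DeepXR/Helion-V1.5")
llm = LLM(model="DeepXR/Helion-V1.5")

# Format the conversation with the model's chat template before generating
messages = [{"role": "user", "content": "Explain quantum computing"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = llm.generate([prompt], SamplingParams(temperature=0.7, top_p=0.9, max_tokens=512))
print(outputs[0].outputs[0].text)
```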
Using the model through LangChain:

```python
from langchain.llms import HuggingFacePipeline  # in newer releases: langchain_community.llms
from transformers import pipeline

# Wrap a transformers text-generation pipeline as a LangChain LLM
pipe = pipeline(
    "text-generation",
    model="DeepXR/Helion-V1.5",
    max_new_tokens=512
)
llm = HuggingFacePipeline(pipeline=pipe)

response = llm.invoke("What is artificial intelligence?")
print(response)
```
The model was fine-tuned on a curated instruction dataset:
Total Training Examples: ~50,000
Data Quality: High-quality, manually filtered and safety-checked
| Benchmark | Score | Description |
|---|---|---|
| MT-Bench | 7.2/10 | Multi-turn conversation quality |
| AlpacaEval | 78.5% | Win rate vs. text-davinci-003 |
| HumanEval | 42.3% | Python code generation (pass@1) |
| GSM8K | 35.7% | Math word problems |
| TruthfulQA | 51.2% | Truthfulness in answers |
| MMLU | 48.9% | Multi-task language understanding |
The model may exhibit biases present in the training data. Although the training set was manually filtered and safety-checked, this does not eliminate all risk, so users should review and validate outputs before deploying the model in sensitive or high-stakes applications.
```bibtex
@misc{helion-v1.5-2025,
  author = {DeepXR},
  title = {Helion-V1.5: Enhanced Conversational AI},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/DeepXR/Helion-V1.5}
}
```
Model Version: 1.5.0 | Release: December 2025