Helion-V1 Logo

Helion-V1.5

Helion-V1.5 is a 7B parameter conversational AI model fine-tuned from Llama-2 using QLoRA. It delivers improved performance over Helion-V1 with enhanced instruction following, code generation, and multi-turn dialogue capabilities.

Model Details

Architecture: Llama-2-7B with LoRA adapters
Parameters: 7 billion (base) + 67M (LoRA)
Context Length: 4096 tokens
Training: QLoRA (4-bit) fine-tuning on high-quality instruction data
License: Apache 2.0

Key Improvements over Helion-V1

| Feature | Helion-V1 | Helion-V1.5 | Improvement |
|---------|-----------|-------------|-------------|
| MT-Bench Score | 6.8 | 7.2 | +5.9% |
| AlpacaEval Win Rate | 72.3% | 78.5% | +8.6% |
| HumanEval Pass@1 | 38.1% | 42.3% | +11.0% |
| Avg Response Time | 2.3s | 1.8s | -21.7% |
| Function Calling | Not supported | Supported | New |
| Streaming Support | Basic | Full | Enhanced |

Technical Specifications

| Component | Value |
|-----------|-------|
| Hidden Size | 4096 |
| Layers | 32 |
| Attention Heads | 32 |
| Intermediate Size | 11008 |
| Vocabulary | 32,000 tokens |
| Position Encoding | RoPE |
| Precision | bfloat16 |
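These values match the standard Llama-2-7B architecture and can be checked directly from the published config; a minimal sketch (field names follow Hugging Face's LlamaConfig):

from transformers import AutoConfig

# Load the model config and print the hyperparameters listed in the table above
config = AutoConfig.from_pretrained("DeepXR/Helion-V1.5")
print(config.hidden_size)              # 4096
print(config.num_hidden_layers)        # 32
print(config.num_attention_heads)      # 32
print(config.intermediate_size)        # 11008
print(config.vocab_size)               # 32000
print(config.max_position_embeddings)  # 4096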

LoRA Configuration:

  • Rank: 64
  • Alpha: 128
  • Target Modules: All linear layers (q,k,v,o,gate,up,down)
  • Dropout: 0.05
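A minimal sketch of how this adapter setup maps to PEFT's LoraConfig, together with the 4-bit quantization used for QLoRA training; the NF4 settings and task type are assumptions, only the values listed above come from the model card:

import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit quantization for QLoRA training (NF4 / compute-dtype choices are assumptions)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapter settings as listed above, targeting all linear projection layers
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)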

Performance Benchmarks

| Benchmark | Score | Category |
|-----------|-------|----------|
| MT-Bench | 7.2/10 | Multi-turn conversation |
| AlpacaEval | 78.5% | Instruction following |
| HumanEval | 42.3% | Code generation |
| GSM8K | 35.7% | Mathematical reasoning |
| TruthfulQA | 51.2% | Factual accuracy |
| MMLU | 48.9% | Knowledge |

How to Use

Quick Start

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "DeepXR/Helion-V1.5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Prepare messages
messages = [
    {"role": "user", "content": "Explain machine learning in simple terms"}
]

# Apply chat template
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Generate response
output = model.generate(
    input_ids,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

response = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
print(response)
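Streaming output (see Capabilities below) works with the same setup via transformers' TextStreamer; a minimal sketch reusing the model, tokenizer, and input_ids from above:

from transformers import TextStreamer

# Print tokens to stdout as they are generated instead of waiting for the full response
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

model.generate(
    input_ids,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    streamer=streamer
)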

Using with Text Generation Inference (TGI)

docker run --gpus all --shm-size 1g -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id DeepXR/Helion-V1.5 \
  --max-input-length 3584 \
  --max-total-tokens 4096
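Once the container is up, it exposes TGI's standard /generate endpoint on the mapped port; a minimal sketch of querying it from Python (host and port follow the docker command above):

import requests

# Query the TGI /generate endpoint; for best results, format the prompt with the chat template first
response = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "Explain machine learning in simple terms",
        "parameters": {"max_new_tokens": 256, "temperature": 0.7, "top_p": 0.9}
    }
)
print(response.json()["generated_text"])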

Using with vLLM

from vllm import LLM, SamplingParams

llm = LLM(model="DeepXR/Helion-V1.5")
sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=512)

prompts = ["Explain quantum computing"]
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.outputs[0].text)
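Because Helion-V1.5 is a chat model, raw prompts passed to vLLM are best wrapped in the chat template first; a minimal sketch using the model's own tokenizer to format the conversation:

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

tokenizer = AutoTokenizer.from_pretrained("DeepXR/Helion-V1.5")
llm = LLM(model="DeepXR/Helion-V1.5")
sampling_params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=512)

# Render the conversation through the chat template before generation
messages = [{"role": "user", "content": "Explain quantum computing"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)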

Using with LangChain

from langchain.llms import HuggingFacePipeline
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="DeepXR/Helion-V1.5",
    max_new_tokens=512
)

llm = HuggingFacePipeline(pipeline=pipe)
response = llm("What is artificial intelligence?")
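For templated prompts, the same pipeline wrapper can be combined with a PromptTemplate; a minimal sketch using the legacy LLMChain interface that matches the langchain.llms import above:

from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# Reuse the llm wrapper created above with a simple prompt template
prompt = PromptTemplate(
    input_variables=["topic"],
    template="Explain {topic} in two short paragraphs."
)
chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run(topic="artificial intelligence"))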

Training Data

Dataset Composition

The model was trained on a curated dataset including:

  • Conversational Data (40%): Multi-turn dialogues focusing on helpfulness
  • Instruction Following (30%): Task completion and instruction adherence
  • Safety Examples (15%): Refusal training for harmful requests
  • Domain-Specific (15%): Programming, writing, analysis tasks

Total Training Examples: ~50,000
Data Quality: High-quality, manually filtered and safety-checked

Data Processing

  • Deduplication using MinHash
  • Safety filtering for harmful content
  • Quality scoring and filtering (score > 0.7)
  • Format standardization to chat template
  • Context length trimming (max 4096 tokens)
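A minimal sketch of the MinHash deduplication and quality-threshold steps described above, using the datasketch library; the shingling scheme, similarity threshold, and example records are assumptions, not the exact pipeline:

from datasketch import MinHash, MinHashLSH

def signature(text, num_perm=128):
    # Hash word-level tokens of an example into a MinHash signature
    m = MinHash(num_perm=num_perm)
    for token in set(text.lower().split()):
        m.update(token.encode("utf-8"))
    return m

examples = [  # hypothetical records with a precomputed quality score
    {"text": "What is machine learning?", "quality": 0.9},
    {"text": "what is machine learning ?", "quality": 0.8},
]

lsh = MinHashLSH(threshold=0.8, num_perm=128)  # near-duplicate threshold (assumption)
kept = []
for i, example in enumerate(examples):
    if example["quality"] <= 0.7:   # quality filter from the list above
        continue
    sig = signature(example["text"])
    if lsh.query(sig):              # skip if a near-duplicate is already kept
        continue
    lsh.insert(str(i), sig)
    kept.append(example)
print(len(kept), "examples kept")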

Evaluation

Benchmark Results

| Benchmark | Score | Description |
|-----------|-------|-------------|
| MT-Bench | 7.2/10 | Multi-turn conversation quality |
| AlpacaEval | 78.5% | Win rate vs. text-davinci-003 |
| HumanEval | 42.3% | Python code generation (pass@1) |
| GSM8K | 35.7% | Math word problems |
| TruthfulQA | 51.2% | Truthfulness in answers |
| MMLU | 48.9% | Multi-task language understanding |

Capabilities

Advanced Features

  • Function Calling: Supports structured function/tool calling
  • Code Execution: Can generate and explain code across multiple languages
  • Multi-turn Context: Maintains conversation context up to 4096 tokens
  • Streaming Support: Compatible with streaming inference
  • Batch Processing: Efficient batch generation support
  • Custom System Prompts: Flexible system message configuration
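For example, a custom system prompt is supplied through the chat template; a minimal sketch reusing the model and tokenizer from the Quick Start:

# Prepend a system message before applying the chat template
messages = [
    {"role": "system", "content": "You are a concise assistant that answers in bullet points."},
    {"role": "user", "content": "Summarize the benefits of unit testing."}
]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, temperature=0.7, do_sample=True)
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))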

Limitations

Known Limitations

  1. Knowledge Cutoff: Training data up to April 2023
  2. Hallucinations: May generate plausible but incorrect information
  3. Context Limitations: 4096 token context window
  4. Math Reasoning: Struggles with complex multi-step calculations
  5. Multilingual: Primarily English, limited other languages
  6. Temporal Reasoning: May not accurately understand time-sensitive queries
  7. Factual Accuracy: Not suitable as sole source of truth

Bias and Fairness

The model may exhibit biases present in the training data. We've implemented:

  • Bias evaluation across demographic groups
  • Regular fairness audits
  • User feedback integration
  • Ongoing bias mitigation efforts

Responsible Use

Users should:

  • Verify critical information from authoritative sources
  • Implement appropriate safeguards for production use
  • Monitor outputs for accuracy and appropriateness
  • Comply with applicable laws and regulations
  • Provide proper attribution for AI-generated content

Citation

@misc{helion-v1.5-2025,
  author = {DeepXR},
  title = {Helion-V1.5: Enhanced Conversational AI},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/DeepXR/Helion-V1.5}
}

Model Version: 1.5.0 | Release: December 2025
