Apertus-8B-2509-Encoder
Model Overview
Apertus-8B-2509-Encoder is an experimental bidirectional encoder model derived from the swiss-ai/Apertus-8B-2509 decoder-only model. This model represents the first attempt to create a native Apertus-based encoder for text embedding generation and semantic similarity tasks.
⚠️ Experimental Notice: This model is in an experimental stage and may not perform optimally for production embedding tasks. See the limitations section for details.
Model Details
- Model Type: Bidirectional Transformer Encoder
- Base Model: swiss-ai/Apertus-8B-2509
- Parameters: 8.053 billion
- Architecture: 32-layer transformer with XIELUActivation
- Embedding Dimension: 4096
- Supported Languages: 1811 (inherited from base model)
- License: Apache 2.0
Intended Use
Primary Use Cases
- Text embedding generation for research purposes
- Cross-lingual semantic analysis experiments
- Proof-of-concept for decoder-to-encoder conversion
- Base model for further fine-tuning on embedding tasks
Downstream Tasks
- Semantic similarity analysis
- Information retrieval systems
- Cross-lingual text comparison
- Vector database integration
How to Use
from transformers import AutoModel, AutoTokenizer
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "speakdatawith/Apertus-8B-2509-Encoder",
    trust_remote_code=True
)
model = AutoModel.from_pretrained(
    "speakdatawith/Apertus-8B-2509-Encoder",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16
)
model.eval()

# Some decoder-derived tokenizers have no pad token; fall back to EOS if needed
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Generate embeddings via simple mean pooling over the last hidden states
# (padding positions are included in the average; see Limitations)
def get_embeddings(texts):
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    embeddings = outputs.last_hidden_state.mean(dim=1)
    return embeddings

# Example usage
texts = ["Hello world", "Hallo Welt", "Bonjour monde"]
embeddings = get_embeddings(texts)
print(f"Embeddings shape: {embeddings.shape}")  # torch.Size([3, 4096])
Model Architecture
The model maintains the original Apertus-8B-2509 architecture with key modifications:
- Attention Mechanism: Converted from causal (decoder) to bidirectional (encoder)
- Configuration Changes (see the inspection sketch below):
  - is_decoder = False
  - is_causal = False
  - architectures = ['ApertusModel']
- Pooling Strategy: Mean pooling over last hidden states
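A minimal sketch for inspecting these flags from the published configuration. The expected values below mirror the list above; is_causal is a conversion-specific field and may not be present on every config class, so it is read defensively:

from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "speakdatawith/Apertus-8B-2509-Encoder",
    trust_remote_code=True
)
print(config.is_decoder)                    # expected: False
print(getattr(config, "is_causal", None))   # expected: False
print(config.architectures)                 # expected: ['ApertusModel']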
Training Details
Conversion Process
- Loaded pre-trained swiss-ai/Apertus-8B-2509 model
- Disabled causal masking in all attention layers
- Updated model configuration for encoder usage (see the code sketch after this list)
- No additional training performed
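A rough sketch of what these steps could look like in code. This is illustrative only: in this sketch, disabling causal masking is assumed to be driven by the configuration flags, while the actual conversion may patch the attention layers directly. No weights are modified in either case.

from transformers import AutoConfig, AutoModel
import torch

# Load the configuration of the pre-trained decoder-only base model
config = AutoConfig.from_pretrained("swiss-ai/Apertus-8B-2509", trust_remote_code=True)

# Switch the configuration to encoder-style, bidirectional usage
config.is_decoder = False
config.is_causal = False
config.architectures = ["ApertusModel"]

# Load the unchanged pre-trained weights under the new configuration
# (no additional training is performed)
model = AutoModel.from_pretrained(
    "swiss-ai/Apertus-8B-2509",
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)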
Training Data
Inherits training data from the base model swiss-ai/Apertus-8B-2509. Refer to the base model documentation for detailed data information.
Performance & Limitations
Known Limitations
⚠️ Important Performance Notice:
- Initial testing revealed suboptimal embedding quality
- Semantic similarity scores appear inconsistent with expected behavior
- Model may produce embeddings that do not accurately reflect semantic relationships
- Performance significantly below specialized embedding models
Technical Limitations
- Resource Requirements: 16GB+ GPU memory for inference
- Speed: Significantly slower than specialized embedding models
- Optimization: Not fine-tuned for embedding tasks
- Pooling: Uses simple mean pooling strategy
Benchmark Results
Preliminary testing on basic similarity tasks showed:
- Cross-lingual similarity detection: Inconsistent
- Direct translation pairs: Below expected performance
- Semantic relationship recognition: Requires improvement
System Requirements
Hardware
- GPU: 16GB+ VRAM recommended (A100, H100, or equivalent)
- CPU: High-memory alternative possible but significantly slower
- RAM: 32GB+ system RAM recommended
Software
- Python 3.12+
- PyTorch 2.8.0+cu126
- Transformers >= 4.56.1
- trust_remote_code=True required
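For setups near the lower end of these requirements, the model can be loaded with automatic device placement. A minimal sketch, assuming the accelerate package is installed; layers that do not fit in VRAM are offloaded to CPU at a significant speed cost:

from transformers import AutoModel
import torch

model = AutoModel.from_pretrained(
    "speakdatawith/Apertus-8B-2509-Encoder",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,   # roughly 16 GB of weights in bfloat16
    device_map="auto",            # requires accelerate; places/offloads layers automatically
)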
Ethical Considerations & Biases
Inherited Considerations
This model inherits all ethical considerations and potential biases from the base swiss-ai/Apertus-8B-2509 model. Users should:
- Review base model documentation for bias analysis
- Conduct appropriate bias testing for their specific use cases
- Consider potential cultural and linguistic biases across 1811 supported languages
EU AI Act Compliance
This model is developed in compliance with EU AI Act requirements:
- Comprehensive documentation provided
- Risk assessment conducted
- Transparency obligations fulfilled
- Technical documentation available
Environmental Impact
- Energy Consumption: High due to 8B parameter size
- Carbon Footprint: Significant computational requirements
- Efficiency: Substantially less efficient than specialized embedding models
Future Development
Potential improvements for future versions:
- Fine-tuning on embedding-specific datasets
- Implementation of advanced pooling strategies (e.g. the mask-aware mean pooling sketched below)
- Model distillation for efficiency improvements
- Comprehensive evaluation on standard embedding benchmarks
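As an example of the pooling improvements mentioned above, here is a sketch of mask-aware mean pooling, which excludes padding tokens from the average. It is a drop-in replacement for the pooling in the usage example, but has not been validated on this model:

import torch

def masked_mean_pool(last_hidden_state, attention_mask):
    # Zero out padding positions, then average over real tokens only
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1)
    return summed / counts

# Inside get_embeddings, replace the simple mean with:
#   embeddings = masked_mean_pool(outputs.last_hidden_state, inputs["attention_mask"])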
Citation
@misc{apertus8b2509encoder,
title={Apertus-8B-2509-Encoder: Experimental Bidirectional Encoder},
author={speakdatawith},
year={2025},
howpublished={Hugging Face Model Hub},
url={https://huggingface.co/speakdatawith/Apertus-8B-2509-Encoder}
}
Acknowledgments
- Base model: swiss-ai/Apertus-8B-2509
- Architecture: Transformer-based encoder conversion
- Framework: Hugging Face Transformers
Contact
For questions regarding this model or its implementation, please open an issue in the model repository.
Disclaimer: This is an experimental model. Production use is not recommended without thorough evaluation and potential fine-tuning for specific embedding tasks.