Apertus-8B-2509-Encoder
Model Overview
Apertus-8B-2509-Encoder is an experimental bidirectional encoder model derived from the swiss-ai/Apertus-8B-2509 decoder-only model. This model represents the first attempt to create a native Apertus-based encoder for text embedding generation and semantic similarity tasks.
⚠️ Experimental Notice: This model is in an experimental stage and may not perform optimally for production embedding tasks. See the limitations section for details.
Model Details
- Model Type: Bidirectional Transformer Encoder
- Base Model: swiss-ai/Apertus-8B-2509
- Parameters: 8.053 billion
- Architecture: 32-layer transformer with XIELUActivation
- Embedding Dimension: 4096
- Supported Languages: 1811 (inherited from base model)
- License: Apache 2.0
Intended Use
Primary Use Cases
- Text embedding generation for research purposes
- Cross-lingual semantic analysis experiments
- Proof-of-concept for decoder-to-encoder conversion
- Base model for further fine-tuning on embedding tasks
Downstream Tasks
- Semantic similarity analysis
- Information retrieval systems
- Cross-lingual text comparison
- Vector database integration
How to Use
from transformers import AutoModel, AutoTokenizer
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "speakdatawith/Apertus-8B-2509-Encoder",
    trust_remote_code=True
)
model = AutoModel.from_pretrained(
    "speakdatawith/Apertus-8B-2509-Encoder",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16
)
model.eval()

# Some decoder-derived tokenizers have no pad token; fall back to EOS if needed
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Generate embeddings via simple mean pooling over the last hidden states
# (padding positions are included in the average; see Limitations)
def get_embeddings(texts):
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    embeddings = outputs.last_hidden_state.mean(dim=1)
    return embeddings

# Example usage
texts = ["Hello world", "Hallo Welt", "Bonjour monde"]
embeddings = get_embeddings(texts)
print(f"Embeddings shape: {embeddings.shape}")  # torch.Size([3, 4096])
Model Architecture
The model maintains the original Apertus-8B-2509 architecture with key modifications:
- Attention Mechanism: Converted from causal (decoder) to bidirectional (encoder)
- Configuration Changes (see the inspection sketch below):
  - is_decoder = False
  - is_causal = False
  - architectures = ['ApertusModel']
- Pooling Strategy: Mean pooling over last hidden states
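A minimal sketch for inspecting these flags from the published configuration. The expected values below mirror the list above; is_causal is a conversion-specific field and may not be present on every config class, so it is read defensively:

from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "speakdatawith/Apertus-8B-2509-Encoder",
    trust_remote_code=True
)
print(config.is_decoder)                    # expected: False
print(getattr(config, "is_causal", None))   # expected: False
print(config.architectures)                 # expected: ['ApertusModel']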
Training Details
Conversion Process
- Loaded pre-trained swiss-ai/Apertus-8B-2509 model
- Disabled causal masking in all attention layers
- Updated model configuration for encoder usage (see the code sketch after this list)
- No additional training performed
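A rough sketch of what these steps could look like in code. This is illustrative only: in this sketch, disabling causal masking is assumed to be driven by the configuration flags, while the actual conversion may patch the attention layers directly. No weights are modified in either case.

from transformers import AutoConfig, AutoModel
import torch

# Load the configuration of the pre-trained decoder-only base model
config = AutoConfig.from_pretrained("swiss-ai/Apertus-8B-2509", trust_remote_code=True)

# Switch the configuration to encoder-style, bidirectional usage
config.is_decoder = False
config.is_causal = False
config.architectures = ["ApertusModel"]

# Load the unchanged pre-trained weights under the new configuration
# (no additional training is performed)
model = AutoModel.from_pretrained(
    "swiss-ai/Apertus-8B-2509",
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)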
Training Data
Inherits training data from the base model swiss-ai/Apertus-8B-2509. Refer to the base model documentation for detailed data information.
Performance & Limitations
Known Limitations
⚠️ Important Performance Notice:
- Initial testing revealed suboptimal embedding quality
- Semantic similarity scores appear inconsistent with expected behavior
- Model may produce embeddings that do not accurately reflect semantic relationships
- Performance significantly below specialized embedding models
Technical Limitations
- Resource Requirements: 16GB+ GPU memory for inference
- Speed: Significantly slower than specialized embedding models
- Optimization: Not fine-tuned for embedding tasks
- Pooling: Uses simple mean pooling strategy
Benchmark Results
Preliminary testing on basic similarity tasks showed:
- Cross-lingual similarity detection: Inconsistent
- Direct translation pairs: Below expected performance
- Semantic relationship recognition: Requires improvement
System Requirements
Hardware
- GPU: 16GB+ VRAM recommended (A100, H100, or equivalent)
- CPU: High-memory alternative possible but significantly slower
- RAM: 32GB+ system RAM recommended
Software
- Python 3.12+
- PyTorch 2.8.0+cu126
- Transformers >= 4.56.1
- trust_remote_code=True required
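For setups near the lower end of these requirements, the model can be loaded with automatic device placement. A minimal sketch, assuming the accelerate package is installed; layers that do not fit in VRAM are offloaded to CPU at a significant speed cost:

from transformers import AutoModel
import torch

model = AutoModel.from_pretrained(
    "speakdatawith/Apertus-8B-2509-Encoder",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,   # roughly 16 GB of weights in bfloat16
    device_map="auto",            # requires accelerate; places/offloads layers automatically
)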
Ethical Considerations & Biases
Inherited Considerations
This model inherits all ethical considerations and potential biases from the base swiss-ai/Apertus-8B-2509 model. Users should:
- Review base model documentation for bias analysis
- Conduct appropriate bias testing for their specific use cases
- Consider potential cultural and linguistic biases across 1811 supported languages
EU AI Act Compliance
This model is developed in compliance with EU AI Act requirements:
- Comprehensive documentation provided
- Risk assessment conducted
- Transparency obligations fulfilled
- Technical documentation available
Environmental Impact
- Energy Consumption: High due to 8B parameter size
- Carbon Footprint: Significant computational requirements
- Efficiency: Substantially less efficient than specialized embedding models
Future Development
Potential improvements for future versions:
- Fine-tuning on embedding-specific datasets
- Implementation of advanced pooling strategies (e.g. the mask-aware mean pooling sketched below)
- Model distillation for efficiency improvements
- Comprehensive evaluation on standard embedding benchmarks
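As an example of the pooling improvements mentioned above, here is a sketch of mask-aware mean pooling, which excludes padding tokens from the average. It is a drop-in replacement for the pooling in the usage example, but has not been validated on this model:

import torch

def masked_mean_pool(last_hidden_state, attention_mask):
    # Zero out padding positions, then average over real tokens only
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1)
    return summed / counts

# Inside get_embeddings, replace the simple mean with:
#   embeddings = masked_mean_pool(outputs.last_hidden_state, inputs["attention_mask"])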
Citation
@misc{apertus8b2509encoder,
title={Apertus-8B-2509-Encoder: Experimental Bidirectional Encoder},
author={speakdatawith},
year={2025},
howpublished={Hugging Face Model Hub},
url={https://huggingface.co/speakdatawith/Apertus-8B-2509-Encoder}
}
Acknowledgments
- Base model: swiss-ai/Apertus-8B-2509
- Architecture: Transformer-based encoder conversion
- Framework: Hugging Face Transformers
Contact
For questions regarding this model or its implementation, please open an issue in the model repository.
Disclaimer: This is an experimental model. Production use is not recommended without thorough evaluation and potential fine-tuning for specific embedding tasks.