
Apertus-8B-2509-Encoder

Model Overview

Apertus-8B-2509-Encoder is an experimental bidirectional encoder model derived from the swiss-ai/Apertus-8B-2509 decoder-only model. This model represents the first attempt to create a native Apertus-based encoder for text embedding generation and semantic similarity tasks.

โš ๏ธ Experimental Notice: This model is in experimental stage and may not perform optimally for production embedding tasks. See limitations section for details.

Model Details

  • Model Type: Bidirectional Transformer Encoder
  • Base Model: swiss-ai/Apertus-8B-2509
  • Parameters: 8.053 billion
  • Architecture: 32-layer transformer with XIELUActivation
  • Embedding Dimension: 4096
  • Supported Languages: 1811 (inherited from base model)
  • License: Apache 2.0

Intended Use

Primary Use Cases

  • Text embedding generation for research purposes
  • Cross-lingual semantic analysis experiments
  • Proof-of-concept for decoder-to-encoder conversion
  • Base model for further fine-tuning on embedding tasks

Downstream Tasks

  • Semantic similarity analysis
  • Information retrieval systems
  • Cross-lingual text comparison
  • Vector database integration
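
As an illustration of the last item, here is a minimal sketch of pushing embeddings into a FAISS index for nearest-neighbor search. FAISS is not mentioned in this card and is purely an assumption; the random vectors below stand in for real embeddings produced by the get_embeddings helper shown under How to Use.

import faiss  # assumed dependency: pip install faiss-cpu
import numpy as np

dim = 4096  # embedding dimension of this model
index = faiss.IndexFlatIP(dim)  # inner-product index; with L2-normalized vectors this is cosine similarity

# Placeholder corpus embeddings; in practice use get_embeddings(...) from the How to Use section
corpus = np.random.rand(100, dim).astype("float32")
faiss.normalize_L2(corpus)
index.add(corpus)

query = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 5)  # top-5 nearest neighbors
print(ids[0], scores[0])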

How to Use

from transformers import AutoModel, AutoTokenizer
import torch

# Load model and tokenizer (trust_remote_code is required for the custom Apertus code)
tokenizer = AutoTokenizer.from_pretrained(
    "speakdatawith/Apertus-8B-2509-Encoder",
    trust_remote_code=True
)
model = AutoModel.from_pretrained(
    "speakdatawith/Apertus-8B-2509-Encoder",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16
)
model.eval()

# Generate embeddings with attention-mask-aware mean pooling so that
# padding tokens do not dilute the average
def get_embeddings(texts):
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
        hidden = outputs.last_hidden_state          # (batch, seq_len, 4096)
        mask = inputs["attention_mask"].unsqueeze(-1)
        embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
    return embeddings

# Example usage
texts = ["Hello world", "Hallo Welt", "Bonjour monde"]
embeddings = get_embeddings(texts)
print(f"Embeddings shape: {embeddings.shape}")  # torch.Size([3, 4096])

Model Architecture

The model maintains the original Apertus-8B-2509 architecture with key modifications:

  • Attention Mechanism: Converted from causal (decoder) to bidirectional (encoder)
  • Configuration Changes:
    • is_decoder = False
    • is_causal = False
    • architectures = ['ApertusModel']
  • Pooling Strategy: Mean pooling over last hidden states
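
The configuration changes above can be verified straight from the published config. A minimal sketch (getattr is used defensively, since whether these flags are exposed as top-level attributes depends on the remote Apertus code):

from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "speakdatawith/Apertus-8B-2509-Encoder",
    trust_remote_code=True
)
print(config.architectures)                  # expected: ['ApertusModel']
print(getattr(config, "is_decoder", None))   # expected: False
print(getattr(config, "is_causal", None))    # expected: False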

Training Details

Conversion Process

  1. Loaded pre-trained swiss-ai/Apertus-8B-2509 model
  2. Disabled causal masking in all attention layers
  3. Updated model configuration for encoder usage
  4. No additional training performed
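
The conversion script itself is not published here, so the following is only a hedged sketch of what steps 1-3 could look like with Transformers; whether the attention layers actually honor the flipped flags at runtime depends on the model's remote code:

from transformers import AutoModel

# Step 1: load the pre-trained decoder-only base model
model = AutoModel.from_pretrained("swiss-ai/Apertus-8B-2509", trust_remote_code=True)

# Steps 2-3: flip the decoder/causal flags so attention becomes bidirectional
model.config.is_decoder = False
model.config.is_causal = False
model.config.architectures = ["ApertusModel"]

# Step 4 is a no-op: no additional training is performed
model.save_pretrained("Apertus-8B-2509-Encoder")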

Training Data

Inherits training data from the base model swiss-ai/Apertus-8B-2509. Refer to the base model documentation for detailed data information.

Performance & Limitations

Known Limitations

โš ๏ธ Important Performance Notice:

  • Initial testing revealed suboptimal embedding quality
  • Semantic similarity scores appear inconsistent with expected behavior
  • Model may produce embeddings that do not accurately reflect semantic relationships
  • Performance significantly below that of specialized embedding models

Technical Limitations

  • Resource Requirements: 16GB+ GPU memory for inference
  • Speed: Significantly slower than specialized embedding models
  • Optimization: Not fine-tuned for embedding tasks
  • Pooling: Uses simple mean pooling strategy

Benchmark Results

Preliminary testing on basic similarity tasks showed:

  • Cross-lingual similarity detection: Inconsistent
  • Direct translation pairs: Below expected performance
  • Semantic relationship recognition: Requires improvement
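
These results are qualitative; no standard benchmark scores are available yet. For reference, a hedged sketch of how the translation-pair behavior could be probed, reusing get_embeddings from the How to Use section (the sentence pairs here are illustrative, not the actual test set):

import torch

# Assumes model, tokenizer, and get_embeddings() from "How to Use" are in scope
pairs = [("The cat sleeps.", "Die Katze schläft."),
         ("I like coffee.", "J'aime le café.")]
for en, other in pairs:
    emb = get_embeddings([en, other]).float()
    sim = torch.nn.functional.cosine_similarity(emb[0], emb[1], dim=0)
    print(f"{en!r} vs {other!r}: cosine similarity = {sim.item():.3f}")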

System Requirements

Hardware

  • GPU: 16GB+ VRAM recommended (A100, H100, or equivalent)
  • CPU: High-memory alternative possible but significantly slower
  • RAM: 32GB+ system RAM recommended
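
The 16GB figure follows from the parameter count: bfloat16 weights take 2 bytes per parameter, before activations and framework overhead. A quick back-of-the-envelope check:

# Approximate weight memory for 8.053B parameters in bfloat16
params = 8.053e9
bytes_per_param = 2  # bfloat16
print(f"~{params * bytes_per_param / 1e9:.1f} GB of weights")  # ~16.1 GB, excluding activations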

Software

  • Python 3.12+
  • PyTorch 2.8.0+cu126
  • Transformers >= 4.56.1
  • trust_remote_code=True required

Ethical Considerations & Biases

Inherited Considerations

This model inherits all ethical considerations and potential biases from the base swiss-ai/Apertus-8B-2509 model. Users should:

  • Review base model documentation for bias analysis
  • Conduct appropriate bias testing for their specific use cases
  • Consider potential cultural and linguistic biases across 1811 supported languages

EU AI Act Compliance

This model is developed in compliance with EU AI Act requirements:

  • Comprehensive documentation provided
  • Risk assessment conducted
  • Transparency obligations fulfilled
  • Technical documentation available

Environmental Impact

  • Energy Consumption: High due to 8B parameter size
  • Carbon Footprint: Significant computational requirements
  • Efficiency: Substantially less efficient than specialized embedding models

Future Development

Potential improvements for future versions:

  • Fine-tuning on embedding-specific datasets
  • Implementation of advanced pooling strategies (one candidate is sketched after this list)
  • Model distillation for efficiency improvements
  • Comprehensive evaluation on standard embedding benchmarks
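
As one candidate for the pooling bullet above, here is a hedged sketch of position-weighted mean pooling, a strategy used by some embedding models; this is an illustration only, not something the current model implements:

import torch

def weighted_mean_pooling(last_hidden_state, attention_mask):
    # Token i gets weight (i + 1), so later tokens contribute more;
    # padding positions are zeroed out via the attention mask.
    weights = torch.arange(
        1, last_hidden_state.size(1) + 1, device=last_hidden_state.device
    ).view(1, -1, 1)
    mask = attention_mask.unsqueeze(-1)
    weighted = last_hidden_state * weights * mask
    return weighted.sum(dim=1) / (weights * mask).sum(dim=1)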

Citation

@misc{apertus8b2509encoder,
  title={Apertus-8B-2509-Encoder: Experimental Bidirectional Encoder},
  author={speakdatawith},
  year={2025},
  howpublished={Hugging Face Model Hub},
  url={https://huggingface.co/speakdatawith/Apertus-8B-2509-Encoder}
}

Acknowledgments

  • Base model: swiss-ai/Apertus-8B-2509
  • Architecture: Transformer-based encoder conversion
  • Framework: Hugging Face Transformers

Contact

For questions regarding this model or its implementation, please open an issue in the model repository.


Disclaimer: This is an experimental model. Production use is not recommended without thorough evaluation and potential fine-tuning for specific embedding tasks.
