Access to this model is gated: you must agree to use it only for research or education purposes under the Reactive AI Model & Architecture License (RAML) v1.0. The repository becomes available immediately after accepting the license terms.

Reactive Transformer (patent pending, #P.453260) is free for non-commercial use. For commercial usage, please contact Reactive AI at [email protected]

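After accepting the terms, the gated repository can be pulled with standard Hugging Face tooling. A minimal sketch, assuming the usual gated-repo flow (the token value is a placeholder, not part of this model card):

```python
# Minimal sketch: fetch the gated repository after accepting the RAML v1.0 terms.
# Assumes the standard Hugging Face gated-repo flow; the token below is a placeholder.
from huggingface_hub import login, snapshot_download

login(token="hf_...")  # alternatively, set the HF_TOKEN environment variable

local_dir = snapshot_download(repo_id="ReactiveAI/RxT-Beta-Decoder-Base")
print(f"Weights downloaded to: {local_dir}")
```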

RxT-Beta Decoder Base (2.85B A190M)

Training & docs in progress

Training progress: ~35B of 250B tokens

Decoder architecture (summarized in the configuration sketch after this list)

  • layers: 25 (21 stateful MoE + 3 stateless MoE + 1 stateless dense)
  • dim: 512
  • self-attention: Gated Sparse Query Attention (SQA) with 8/16 query heads and 4/16 key/value heads
  • memory cross-attention: Sparse Query Attention (SQA) with 8/16 query heads and 4/16 key/value heads
  • feed forward: Sparse Mixture-of-Experts (MoE) with gated shared experts
    • routed experts: 384
    • active experts: 10
    • routed expert dim: 192
    • shared experts: 2 with softmax gating
    • shared expert dim: 384
    • activation: SwiGLU
  • dense layer: 1536 dim with SwiGLU activation
  • vocab: 65k (English + Polish)
  • params: 2.85B total, 190M activated per token
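
For orientation, the hyperparameters listed above can be gathered into a single configuration object. This is only an illustrative sketch: the class and field names are hypothetical and are not the actual RxT configuration API.

```python
# Illustrative sketch only: the hyperparameters from the list above collected into a
# plain dataclass. Field names are hypothetical, not the actual RxT configuration API.
from dataclasses import dataclass


@dataclass
class RxTBetaDecoderConfig:
    # Layer stack: 21 stateful MoE + 3 stateless MoE + 1 stateless dense = 25 layers
    num_layers: int = 25
    num_stateful_moe_layers: int = 21
    num_stateless_moe_layers: int = 3
    num_stateless_dense_layers: int = 1
    hidden_dim: int = 512
    # Gated Sparse Query Attention (self-attention and memory cross-attention)
    num_query_heads: int = 8      # 8 of 16 head slots
    num_kv_heads: int = 4         # 4 of 16 head slots
    # Sparse Mixture-of-Experts feed-forward
    num_routed_experts: int = 384
    num_active_experts: int = 10
    routed_expert_dim: int = 192
    num_shared_experts: int = 2   # softmax-gated shared experts
    shared_expert_dim: int = 384
    dense_ffn_dim: int = 1536     # single stateless dense layer
    activation: str = "swiglu"
    vocab_size: int = 65_000      # English + Polish
```

With only 10 of the 384 routed experts active in each MoE layer, roughly 190M of the 2.85B total parameters participate in the forward pass for any given token.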
Weights: Safetensors, BF16 (~3B params)
