Access to this model is gated: you must agree to use it only for research or education purposes under the Reactive AI Model & Architecture License (RAML) v1.0. The repository becomes available immediately after accepting the license terms.

Reactive Transformer (patent pending, #P.453260) is free for non-commercial use. For commercial usage, please contact Reactive AI at [email protected]

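After accepting the terms, the gated repository can be pulled with standard Hugging Face tooling. A minimal sketch, assuming the usual gated-repo flow (the token value is a placeholder, not part of this model card):

```python
# Minimal sketch: fetch the gated repository after accepting the RAML v1.0 terms.
# Assumes the standard Hugging Face gated-repo flow; the token below is a placeholder.
from huggingface_hub import login, snapshot_download

login(token="hf_...")  # alternatively, set the HF_TOKEN environment variable

local_dir = snapshot_download(repo_id="ReactiveAI/RxT-Beta-Decoder-Base")
print(f"Weights downloaded to: {local_dir}")
```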

RxT-Beta Decoder Base (2.85B A190M)

Training & docs in progress

Training progress: ~35B of 250B tokens

Decoder architecture (summarized in the configuration sketch after this list)

  • layers: 25 (21 stateful MoE + 3 stateless MoE + 1 stateless dense)
  • dim: 512
  • self-attention: Gated Sparse Query Attention (SQA) with 8/16 query heads and 4/16 key/value heads
  • memory cross-attention: Sparse Query Attention (SQA) with 8/16 query heads and 4/16 key/value heads
  • feed forward: Sparse Mixture-of-Experts (MoE) with gated shared experts
    • routed experts: 384
    • active experts: 10
    • routed expert dim: 192
    • shared experts: 2 with softmax gating
    • shared expert dim: 384
    • activation: SwiGLU
  • dense layer: 1536 dim with SwiGLU activation
  • vocab: 65k (English + Polish)
  • params: 2.85B total, 190M activated per token
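
For orientation, the hyperparameters listed above can be gathered into a single configuration object. This is only an illustrative sketch: the class and field names are hypothetical and are not the actual RxT configuration API.

```python
# Illustrative sketch only: the hyperparameters from the list above collected into a
# plain dataclass. Field names are hypothetical, not the actual RxT configuration API.
from dataclasses import dataclass


@dataclass
class RxTBetaDecoderConfig:
    # Layer stack: 21 stateful MoE + 3 stateless MoE + 1 stateless dense = 25 layers
    num_layers: int = 25
    num_stateful_moe_layers: int = 21
    num_stateless_moe_layers: int = 3
    num_stateless_dense_layers: int = 1
    hidden_dim: int = 512
    # Gated Sparse Query Attention (self-attention and memory cross-attention)
    num_query_heads: int = 8      # 8 of 16 head slots
    num_kv_heads: int = 4         # 4 of 16 head slots
    # Sparse Mixture-of-Experts feed-forward
    num_routed_experts: int = 384
    num_active_experts: int = 10
    routed_expert_dim: int = 192
    num_shared_experts: int = 2   # softmax-gated shared experts
    shared_expert_dim: int = 384
    dense_ffn_dim: int = 1536     # single stateless dense layer
    activation: str = "swiglu"
    vocab_size: int = 65_000      # English + Polish
```

With only 10 of the 384 routed experts active in each MoE layer, roughly 190M of the 2.85B total parameters participate in the forward pass for any given token.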
Weights: Safetensors, BF16 (~3B params)
