artpods56 (Artur Podsiadły)

updated a dataset 16 days ago

artpods56/KUL_IDUB_EcclessiaSchematisms

Updated 16 days ago • 16.3k • 295

reacted to danielhanchen's post with 🚀🔥 18 days ago

Post

Mistral's new SOTA coding models, Devstral 2, can now be run locally! (25GB RAM) 🐱
We fixed the chat template, so performance should be much better now!
24B: unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF
123B: unsloth/Devstral-2-123B-Instruct-2512-GGUF

🧡Step-by-step Guide: https://docs.unsloth.ai/models/devstral-2

upvoted an article about 2 months ago

Article

Reactive Transformer (RxT): Fixing the Memory Problem in Conversational AI

Oct 8

reacted to m-ric's post with 🚀 3 months ago

Post

STOP EVERYTHING NOW - we might finally have a radical architecture improvement over Transformers!!! 🚨

A lone scientist just proposed Tiny Recursive Model (TRM), and it is literally the most impressive model that I've seen this year.

➡️ Tiny Recursive Model is 7M parameters
➡️ On ARC-AGI, it beats flagship models like Gemini-2.5-pro

Consider how wild this is: Gemini-2.5-pro must be over 10,000x bigger and had 1,000x as many authors 😂 (Alexia is alone on the paper)

What's this sorcery?
In short: it's a very tiny Transformer, but it loops over itself at two different frequencies, updating two latent variables: one for the proposed answer and one for the reasoning.
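A minimal sketch of the two-frequency recursion as described in the post: one weight-shared network refines a reasoning latent z several times (high frequency), then updates the proposed answer y once (low frequency), and the whole cycle repeats. This is a toy under stated assumptions: the "network" is a single random tanh layer rather than a Transformer, the width D and the loop counts are arbitrary, and feeding the question x only into the z update is an assumption, not a detail taken from the post.

```python
import math
import random

D = 4  # hypothetical latent width, chosen for illustration

random.seed(0)
# One weight matrix shared by every step (3*D inputs -> D outputs).
# Stand-in for TRM's tiny Transformer (assumption: a single tanh layer).
W = [[random.gauss(0.0, 0.3) for _ in range(D)] for _ in range(3 * D)]


def tiny_net(a, b, c):
    """Apply the shared layer to the concatenation of three length-D vectors."""
    inp = a + b + c  # list concatenation, length 3*D
    return [math.tanh(sum(inp[i] * W[i][j] for i in range(3 * D))) for j in range(D)]


def recursive_reason(x, n_inner=6, n_outer=3):
    """Loop the same network at two frequencies over two latents."""
    y = [0.0] * D  # proposed answer: updated at low frequency
    z = [0.0] * D  # reasoning latent: updated at high frequency
    z_updates = y_updates = 0
    for _ in range(n_outer):
        for _ in range(n_inner):
            z = tiny_net(x, y, z)  # refine reasoning from question, answer, reasoning
            z_updates += 1
        y = tiny_net([0.0] * D, y, z)  # revise answer from reasoning (x masked out: assumption)
        y_updates += 1
    return y, z, z_updates, y_updates


y, z, z_steps, y_steps = recursive_reason([1.0, -1.0, 0.5, 0.0])
print(z_steps, y_steps)  # 18 3
```

The point of the design is that depth comes from reusing the same small set of weights many times, not from stacking more parameters.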

@AlexiaJM started from the paper Hierarchical Reasoning Model, published a few months ago, which had already shown a breakthrough improvement on ARC-AGI for its small size (27M parameters).

Hierarchical Reasoning Model had introduced one main feature:
🔎 Deep supervision
In their model, one part (here one layer) would run at high frequency, and another at lower frequency, running only every n steps.
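The high/low-frequency schedule described for Hierarchical Reasoning Model can be sketched with two separate toy modules. This is an illustration only: the states are scalars, and the update rules and the names `fast`/`slow` are invented here, while the real model uses neural network layers. The fast state updates at every step; the slow state updates only once every n steps.

```python
import math


def two_timescale_rollout(x, n=4, cycles=3):
    """Toy HRM-style recurrence with two separate modules.

    The fast (low-level) state updates every step; the slow (high-level)
    state updates only once every n steps. Update rules are hypothetical.
    """
    fast, slow = 0.0, 0.0
    fast_updates = slow_updates = 0
    for step in range(1, n * cycles + 1):
        fast = math.tanh(0.5 * fast + 0.5 * slow + x)  # fast module: every step
        fast_updates += 1
        if step % n == 0:
            slow = math.tanh(0.5 * slow + fast)  # slow module: every n-th step
            slow_updates += 1
    return fast, slow, fast_updates, slow_updates


_, _, f, s = two_timescale_rollout(1.0, n=4, cycles=3)
print(f, s)  # 12 3
```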

They had used a recurrent architecture, where these layers would repeat many times; but to make it work, they had to make many approximations, including not fully backpropagating the loss through all layers.

Alexia studied what was useful and what wasn't, and cleaned up the architecture as follows:
Why use a recurrent architecture, when you can just make it a loop?
➡️ She made the network recursive, looping over itself

Why use 2 latent variables?
➡️ She provides a crystal-clear explanation: the one that changes frequently is the reasoning, and the one that changes at low frequency is the proposed answer.
➡️ She runs ablation studies to validate that 2 is indeed optimal.

This new setup is a much more elegant way to process reasoning than generating huge chains of tokens as all flagship models currently do.

This might be the breakthrough we've been waiting for so long!

reacted to AdamF92's post with 🔥 3 months ago

Post

Hi, I just published a research paper introducing my Reactive Transformer (RxT) architecture. I would be grateful if you could check it out and upvote it on HuggingFace Daily Papers: Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models (2510.03561)

The architecture is based on stateful real-time processing with an innovative asynchronous memory update. Instead of reprocessing the entire conversation history for each message, it processes only the single current query, with all the context moved to dedicated memory layers. Memory is updated after the answer is generated, so it doesn't affect latency: in tests, the time to first token was almost the same as for generating a single token. It also achieves better quality/accuracy in multi-turn dialogue than a stateless decoder-only model of the same size.
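A toy sketch of the event-driven control flow described above: each reply is produced from the current query plus a fixed-size memory, and the memory update is scheduled after the reply exists, off the response path. Everything here is hypothetical and for illustration only: `ToyReactiveChat`, `slots`, and `tokens_processed` are invented names, and the memory is a FIFO of text pairs rather than RxT's learned memory layers. The token count is a simplification that assumes the stateless model re-reads the full history every turn, as the post describes, and ignores answer tokens and any caching.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyReactiveChat:
    """Event-driven toy: answer from query + fixed-size memory, update memory afterwards."""

    def __init__(self, slots=4):
        self.slots = slots  # hypothetical fixed memory size
        self.memory = []    # placeholder for RxT's dedicated memory layers
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = None

    def respond(self, query):
        # Make sure the previous turn's memory update has landed before reading memory.
        if self._pending is not None:
            self._pending.result()
        # Only the current query and the fixed-size memory are used: no history replay.
        answer = f"reply to {query!r} (memory size {len(self.memory)})"
        # Schedule the memory update after the answer exists, off the response path.
        self._pending = self._pool.submit(self._update_memory, query, answer)
        return answer

    def _update_memory(self, query, answer):
        # Keep only the most recent `slots` interactions (FIFO placeholder).
        self.memory = (self.memory + [(query, answer)])[-self.slots:]


def tokens_processed(turn_lengths, stateful):
    """Total tokens fed through the model over a conversation (toy accounting)."""
    total = history = 0
    for n in turn_lengths:
        history += n
        # Stateless: re-read the whole conversation so far; stateful: only this turn.
        total += n if stateful else history
    return total


chat = ToyReactiveChat(slots=2)
for q in ["hi", "how are you?", "bye"]:
    print(chat.respond(q))
print(tokens_processed([10] * 5, stateful=True), tokens_processed([10] * 5, stateful=False))  # 50 150
```

Under these assumptions, the per-turn cost stays constant for the stateful loop while the stateless total grows quadratically with the number of turns.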

Initial experiments were small-scale (12M to 160M parameter models trained on simple synthetic datasets), but I'm now starting to train a bigger 270M-parameter model on real data.

Collection: ReactiveAI/reactive-transformer-poc-rxt-alpha-supervised-models-68e4004a4a59366e01a7b86f
Profile: ReactiveAI

upvoted a paper 3 months ago

Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models

Paper • 2510.03561 • Published Oct 3 • 24
