---
language:
- en
license: mit
tags:
- parallel-decoding
- speculative-decoding
- transformers
- research
- arxiv
base_model: openai/gpt-oss-20b
library_name: transformers
pipeline_tag: text-generation
paper:
  title: "Parallel Decoder Transformer: Model-Internal Parallel Decoding with Speculative Invariance via Note Conditioning"
  url: https://arxiv.org/abs/2512.10054
---
# Parallel Decoder Transformer (PDT) adapters for GPT-OSS-20B
This repository contains **PDT adapter/head weights** trained against the GPT-OSS-20B trunk, plus minimal training artifacts.
**Paper:** [Parallel Decoder Transformer: Model-Internal Parallel Decoding with Speculative Invariance via Note Conditioning](https://arxiv.org/abs/2512.10054)
## Abstract (arXiv)
Autoregressive decoding in Large Language Models (LLMs) is inherently sequential, creating a latency bottleneck that scales linearly with output length. While "Decomposition-and-Fill" methods like Skeleton-of-Thought attempt to parallelize generation via external orchestration, they suffer from coherence drift due to the lack of cross-stream communication. In this work, we introduce the Parallel Decoder Transformer (PDT), a parameter-efficient architecture that embeds coordination primitives directly into the inference process of a frozen pre-trained model. Instead of retraining the base model, PDT injects lightweight Speculative Note Conditioning (SNC) adapters that allow parallel decoding streams to synchronize via a shared, dynamic latent space. We formulate coordination as a speculative consensus problem, where sibling streams broadcast semantic "notes" to a global bus, gated by a learned verification head. We validate our approach on a 50,000-step curriculum using a frozen 20B-parameter backbone. Our results demonstrate that PDT achieves effective self-correction, reaching 77.8% precision in coverage prediction and recovering approximate serial semantics without modifying the trunk weights. This establishes PDT as a scalable, efficient alternative to full model fine-tuning for structured parallel generation.
## Example: PDT notes artifact (truncated)
This is a real sample from the dataset pipeline (`survey_200141_ff0a0b4f.json`), shown with list/string truncation to keep the model card readable.
```json
{
"sample_id": "survey_200141_ff0a0b4f",
"domain": "survey",
"plan_path": "outputs/structured_plans/pdt_10k/survey/survey_200141_ff0a0b4f.json",
"sectional_independence": true,
"lag_delta": 1,
"note_cadence_M": 6,
"true_notes_example": {
"stream_id": "stream_1",
"ENT": [
{
"id": "E1",
"name": "Croatan",
"aliases": [
"Croatoan"
],
"type": "Ethnic Group",
"canonical": true
},
{
"id": "E2",
"name": "Dare County",
"aliases": [
"Alligator River",
"Croatan Sound",
"Roanoke Island",
"... <2 more items>"
],
"type": "Location",
"canonical": true
},
{
"id": "E3",
"name": "werowances",
"aliases": [
"chiefs"
],
"type": "Leadership Title",
"canonical": true
},
"... <4 more items>"
],
"FACT": [
{
"subj_id": "E1",
"predicate": "lived in",
"object": "coastal areas of what is now North Carolina",
"evidence_span": {
"start": 45,
"end": 87,
"text": "coastal areas of what is now North Carolina"
},
"certainty": 1.0
},
{
"subj_id": "E1",
"predicate": "might have been",
"object": "a branch of the larger Roanoke people or allied with them",
"evidence_span": {
"start": 92,
"end": 141,
"text": "a branch of the larger Roanoke people or allied with them"
},
"certainty": 0.8
},
{
"subj_id": "E2",
"predicate": "encompasses",
"object": "the Alligator River, Croatan Sound, Roanoke Island, Ocracoke Island, and parts of the Outer Banks",
"evidence_span": {
"start": 177,
"end": 265,
"text": "the Alligator River, Croatan Sound, Roanoke Island, Ocracoke Island, and parts of the Outer Banks"
},
"certainty": 1.0
},
"... <5 more items>"
],
"COVERAGE": [
{
"plan_item_id": "Define who the Croatan were, where they lived historically, and where related people live today.",
"status": "missing"
},
{
"plan_item_id": "Describe political leadership (werowances) and their responsibilities regarding wealth and decision-making.",
"status": "missing"
},
{
"plan_item_id": "Summarize core religious beliefs about a chief god, petty gods, immortality of the soul, heaven/Popogusso, and roles of priests and conjurors.",
"status": "missing"
},
"... <6 more items>"
]
},
"speculative_variant_example": {
"variant_id": "survey_200141_ff0a0b4f_variant_0",
"noise_config": {
"paraphrase_ratio": 0.15,
"drop_ratio": 0.05,
"hallucination_ratio": 0.05,
"shuffle_notes": true
},
"lag_delta": 1,
"notes_example": {
"stream_id": "stream_1",
"ENT": [
{
"id": "E1",
"name": "Croatan",
"aliases": [
"Croatoan",
"Croatian"
],
"type": "Ethnic Group",
"canonical": true
},
{
"id": "E2",
"name": "Dare County",
"aliases": [
"Alligator River",
"Croatan Sound",
"Roanoke Island",
"... <1 more items>"
],
"type": "Location",
"canonical": true
},
{
"id": "E3",
"name": "werowances",
"aliases": [
"chiefs",
"leaders"
],
"type": "Leadership Title",
"canonical": true
},
"... <1 more items>"
],
"FACT": [
{
"subj_id": "E1",
"predicate": "lived in",
"object": "coastal areas of what is now North Carolina",
"evidence_span": {
"start": 45,
"end": 87,
"text": "coastal areas of what is now North Carolina"
},
"certainty": 1.0
},
{
"subj_id": "E1",
"predicate": "might have been",
"object": "a branch of the larger Roanoke people or allied with them",
"evidence_span": {
"start": 92,
"end": 141,
"text": "a branch of the larger Roanoke people or allied with them"
},
"certainty": 0.8
},
{
"subj_id": "E2",
"predicate": "encompasses",
"object": "the Alligator River, Croatan Sound, Roanoke Island, Ocracoke Island, and parts of the Outer Banks",
"evidence_span": {
"start": 177,
"end": 265,
"text": "the Alligator River, Croatan Sound, Roanoke Island, Ocracoke Island, and parts of the Outer Banks"
},
"certainty": 1.0
},
"... <1 more items>"
],
"COVERAGE": [
{
"plan_item_id": "Define who the Croatan were, where they lived historically, and where related people live today.",
"status": "missing"
},
{
"plan_item_id": "Describe political leadership (werowances) and their responsibilities regarding wealth and decision-making.",
"status": "missing"
},
{
"plan_item_id": "Summarize core religious beliefs about a chief god, petty gods, immortality of the soul, heaven/Popogusso, and roles of priests and conjurors.",
"status": "missing"
},
"... <1 more items>"
]
}
},
"versioned_notes_snapshot_0": {
"snapshot_id": 0,
"source": "procedural_bus",
"lag_delta": 1,
"note_cadence_M": 6,
"ent_count": 9,
"fact_count": 10,
"notes_example": {
"stream_id": "stream_1",
"ENT": [
{
"id": "E1",
"name": "Croatan",
"aliases": [
"Croatoan"
],
"type": "Ethnic Group",
"canonical": true
},
{
"id": "E2",
"name": "Dare County",
"aliases": [
"Alligator River",
"Croatan Sound",
"Roanoke Island",
"... <1 more items>"
],
"type": "Location",
"canonical": true
},
{
"id": "E3",
"name": "werowances",
"aliases": [
"chiefs"
],
"type": "Leadership Title",
"canonical": true
},
"... <1 more items>"
],
"FACT": [
{
"subj_id": "E1",
"predicate": "lived in",
"object": "coastal areas of what is now North Carolina",
"evidence_span": {
"start": 45,
"end": 87,
"text": "coastal areas of what is now North Carolina"
},
"certainty": 1.0
},
{
"subj_id": "E1",
"predicate": "might have been",
"object": "a branch of the larger Roanoke people or allied with them",
"evidence_span": {
"start": 92,
"end": 141,
"text": "a branch of the larger Roanoke people or allied with them"
},
"certainty": 0.8
},
{
"subj_id": "E2",
"predicate": "encompasses",
"object": "the Alligator River, Croatan Sound, Roanoke Island, Ocracoke Island, and parts of the Outer Banks",
"evidence_span": {
"start": 177,
"end": 265,
"text": "the Alligator River, Croatan Sound, Roanoke Island, Ocracoke Island, and parts of the Outer Banks"
},
"certainty": 1.0
},
"... <1 more items>"
],
"COVERAGE": [
{
"plan_item_id": "Define who the Croatan were, where they lived historically, and where related people live today.",
"status": "missing"
},
{
"plan_item_id": "Describe political leadership (werowances) and their responsibilities regarding wealth and decision-making.",
"status": "missing"
},
{
"plan_item_id": "Summarize core religious beliefs about a chief god, petty gods, immortality of the soul, heaven/Popogusso, and roles of priests and conjurors.",
"status": "missing"
},
"... <1 more items>"
]
}
},
"rollback": {
"triggered": false,
"l_tokens": 0,
"events": []
}
}
```
To reproduce this view locally:
```bash
uv run python scripts/pretty_notes_artifact.py survey_200141_ff0a0b4f.json
```
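For readers who consume these artifacts programmatically, the sketch below checks two invariants that the sample suggests: every `FACT.subj_id` should refer to an `ENT.id`, and every `COVERAGE.status` should be one of `covered`, `partial`, or `missing`. The function name `check_notes` and the abridged inline sample are illustrative assumptions, not part of the reference implementation. Truncation placeholders such as `"... <4 more items>"` are skipped, so on a truncated view a reported dangling reference may only mean the entity was elided.

```python
import json

ALLOWED_STATUS = {"covered", "partial", "missing"}


def check_notes(notes: dict) -> list[str]:
    """Return a list of human-readable problems found in one notes block."""
    # Truncated views contain placeholder strings such as "... <4 more items>";
    # keep only the dict entries.
    ents = [e for e in notes.get("ENT", []) if isinstance(e, dict)]
    facts = [f for f in notes.get("FACT", []) if isinstance(f, dict)]
    coverage = [c for c in notes.get("COVERAGE", []) if isinstance(c, dict)]

    problems = []
    known_ids = {e["id"] for e in ents}
    for fact in facts:
        if fact["subj_id"] not in known_ids:
            problems.append(f"FACT references unknown entity {fact['subj_id']}")
        if not 0.0 <= fact.get("certainty", 1.0) <= 1.0:
            problems.append(f"certainty out of range for {fact['subj_id']}")
    for item in coverage:
        if item["status"] not in ALLOWED_STATUS:
            problems.append(f"unexpected coverage status {item['status']!r}")
    return problems


# Tiny inline sample shaped like `true_notes_example` (values abridged).
sample = json.loads("""
{
  "stream_id": "stream_1",
  "ENT": [{"id": "E1", "name": "Croatan", "aliases": ["Croatoan"],
           "type": "Ethnic Group", "canonical": true},
          "... <4 more items>"],
  "FACT": [{"subj_id": "E1", "predicate": "lived in",
            "object": "coastal areas of what is now North Carolina",
            "certainty": 1.0}],
  "COVERAGE": [{"plan_item_id": "Define who the Croatan were.",
                "status": "missing"}]
}
""")

print(check_notes(sample))
```

The same function can be applied to `true_notes_example`, to `notes_example` inside a speculative variant, or to a versioned snapshot, since all three share the schema.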
## How to use
1. Install the reference implementation (runtime + scripts):
   - `https://github.com/logan-robbins/parallel-decoder-transformer`
2. Download the base trunk model (`openai/gpt-oss-20b`) via Hugging Face (or provide a local path).
3. Download the adapter checkpoint from this repo and point `configs/gpt_oss_transfer_production.yaml` (or CLI flags) at it.
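Steps 2 and 3 can be sketched with `huggingface_hub.snapshot_download`. This is a minimal sketch under stated assumptions: the helper names `plan_downloads` and `fetch`, the `weights/` directory layout, and the `"<this-repo-id>"` placeholder are hypothetical, and the `pdt_adapters.*` pattern is taken from this card's file listing. The keys expected by `configs/gpt_oss_transfer_production.yaml` are not shown here; consult the reference implementation for them.

```python
def plan_downloads(adapter_repo_id: str, local_root: str = "weights") -> list[dict]:
    """Build keyword arguments for huggingface_hub.snapshot_download."""
    return [
        # Frozen trunk, named in this card's metadata.
        {"repo_id": "openai/gpt-oss-20b", "local_dir": f"{local_root}/trunk"},
        # Adapter/head weights plus the small JSON training artifacts.
        {
            "repo_id": adapter_repo_id,
            "local_dir": f"{local_root}/pdt",
            "allow_patterns": ["pdt_adapters.*", "*.json"],
        },
    ]


def fetch(plans: list[dict]) -> list[str]:
    """Download each planned snapshot and return the local paths."""
    # Imported lazily so the planning step works without the package installed.
    from huggingface_hub import snapshot_download

    return [snapshot_download(**kwargs) for kwargs in plans]


if __name__ == "__main__":
    # "<this-repo-id>" is a placeholder; substitute this model repo's id.
    for plan in plan_downloads("<this-repo-id>"):
        print(plan)
    # fetch(plan_downloads("<this-repo-id>"))  # uncomment to download (large)
```

Separating planning from fetching makes it easy to inspect what will be downloaded before pulling the 20B trunk.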
## Artifacts (public GCS)
The complete training artifacts and dataset archives are mirrored publicly in GCS:
- **Bucket root:** `https://storage.googleapis.com/parallel-decoder-transformer/`
- **Upload manifest (full listing):** `https://storage.googleapis.com/parallel-decoder-transformer/UPLOAD_MANIFEST.md`
- **Training checkpoints:** `https://storage.googleapis.com/parallel-decoder-transformer/checkpoints/gpt-oss-8xH100-50000steps/`
- **Dataset archives:** `https://storage.googleapis.com/parallel-decoder-transformer/data/archives/`
## Training logs (Weights & Biases)
- **WandB run:** `https://wandb.ai/ljrweb-self/parallel-decoder-transformer/runs/fmuea63a`
## Why the dataset is structured this way
PDT is trained on **streamed, structured supervision** produced by a 5-stage pipeline. The stages relevant to the data layout are:
- **Stage 2 (Plans):** a 3-stream decomposition plan is generated for each document.
- **Stage 3 (Notes):** we generate **true notes (teacher)** and **speculative notes (student input)** in a consistent schema:
  - `ENT`: entity table (stable ids)
  - `FACT`: grounded tuples with `evidence_span`
  - `COVERAGE`: plan-item status targets (`covered|partial|missing`)
  - `versioned_notes`: lagged, versioned snapshots mirroring the Dynamic Notes Bus semantics
- **Stage 5 (KD Export):** these artifacts are converted into `kd_*.jsonl` where each line is a **stream-level** training example.
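To make the relationship between true notes and speculative notes concrete, here is a minimal sketch of how a noisy variant could be derived from teacher notes using two of the `noise_config` fields from the sample artifact (`drop_ratio` and `shuffle_notes`). How the pipeline actually applies noise is not documented in this card, so the function `perturb_notes` and its per-fact independent dropping are assumptions; paraphrase and hallucination noise are omitted because they would need a generator model.

```python
import copy
import random


def perturb_notes(true_notes: dict, drop_ratio: float, shuffle_notes: bool,
                  seed: int = 0) -> dict:
    """Return a noisy copy of `true_notes`; the input is left untouched."""
    rng = random.Random(seed)
    noisy = copy.deepcopy(true_notes)
    # Drop each fact independently with probability `drop_ratio`.
    noisy["FACT"] = [f for f in noisy["FACT"] if rng.random() >= drop_ratio]
    if shuffle_notes:
        # Reorder the surviving facts so the student cannot rely on position.
        rng.shuffle(noisy["FACT"])
    return noisy


# Synthetic teacher notes with 20 placeholder facts (not real dataset content).
true_notes = {
    "stream_id": "stream_1",
    "FACT": [{"subj_id": "E1", "predicate": f"p{i}"} for i in range(20)],
}
spec = perturb_notes(true_notes, drop_ratio=0.05, shuffle_notes=True)
print(len(true_notes["FACT"]), len(spec["FACT"]))
```

Seeding the generator keeps each variant reproducible, which matters when the same variant id must map to the same noisy notes across runs.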
This layout is required to support the **teacher→student curriculum** described in the training guide. The curriculum stages below are numbered independently of the pipeline stages above:
- **Stage 0:** planner/notes-head bootstrap (trunk frozen)
- **Stage 1:** stream adapters + SNC cross-attention bootstrap (speculation frozen; teacher notes forced)
- **Stage 2:** enable speculation + notes-bus usage (teacher-heavy mixing)
- **Stage 3:** train agreement + coverage heads for self-correction/rollback behavior (still trunk frozen)
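The curriculum above can be read as a small table plus a rule for which notes a stream sees. The sketch below is illustrative only: the component names paraphrase the list, `teacher_prob=0.8` is an assumed stand-in for "teacher-heavy mixing", and treating Stage 3 as also mixing speculative notes is an assumption. The actual schedule lives in the training guide and `train_run_stages.json`.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class CurriculumStage:
    index: int
    focus: tuple[str, ...]       # what the stage introduces or trains
    speculation_enabled: bool    # whether speculative notes may reach streams


# The trunk stays frozen in every stage.
STAGES = (
    CurriculumStage(0, ("planner", "notes_head"), False),
    CurriculumStage(1, ("stream_adapters", "snc_cross_attention"), False),
    CurriculumStage(2, ("speculation", "notes_bus"), True),
    CurriculumStage(3, ("agreement_head", "coverage_head"), True),
)


def pick_notes(stage, true_notes, spec_notes, rng, teacher_prob=0.8):
    """Teacher notes are forced until speculation is enabled, then mixed."""
    if not stage.speculation_enabled or rng.random() < teacher_prob:
        return true_notes
    return spec_notes


rng = random.Random(0)
picks = [pick_notes(STAGES[2], "teacher", "speculative", rng) for _ in range(1000)]
# Fraction of teacher notes seen in Stage 2 under the assumed mixing rate.
print(picks.count("teacher") / len(picks))
```

In Stages 0 and 1 the function always returns the teacher notes, matching "teacher notes forced"; from Stage 2 onward the student is exposed to speculative notes a minority of the time.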
## Citation
```bibtex
@misc{robbins2025pdt,
title={Parallel Decoder Transformer: Model-Internal Parallel Decoding with Speculative Invariance via Note Conditioning},
author={Robbins, Logan},
year={2025},
eprint={2512.10054},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2512.10054}
}
```
## What’s included
- `pdt_adapters.*`: trainable adapter/head weights (trunk weights are not included unless deliberately uploaded alongside them)
- Training artifacts: `training_report.json`, `train_run_stages.json`, `train_manifest.json`, `agreement_thresholds.json`
## License
- **This repo (adapters + artifacts)**: MIT.
- **Base model**: `openai/gpt-oss-20b` is licensed under Apache-2.0 on Hugging Face (also see its `USAGE_POLICY` there).
- **Reference implementation**: MIT at `https://github.com/logan-robbins/parallel-decoder-transformer`.