---
language:
- en
license: mit
tags:
- parallel-decoding
- speculative-decoding
- transformers
- research
- arxiv
base_model: openai/gpt-oss-20b
library_name: transformers
pipeline_tag: text-generation
paper:
  title: "Parallel Decoder Transformer: Model-Internal Parallel Decoding with Speculative Invariance via Note Conditioning"
  url: https://arxiv.org/abs/2512.10054
---

# Parallel Decoder Transformer (PDT) adapters for GPT-OSS-20B

This repository contains **PDT adapter/head weights** trained against the GPT-OSS-20B trunk, plus minimal training artifacts.

**Paper:** [Parallel Decoder Transformer: Model-Internal Parallel Decoding with Speculative Invariance via Note Conditioning](https://arxiv.org/abs/2512.10054)

## Abstract (arXiv)

Autoregressive decoding in Large Language Models (LLMs) is inherently sequential, creating a latency bottleneck that scales linearly with output length. While "Decomposition-and-Fill" methods like Skeleton-of-Thought attempt to parallelize generation via external orchestration, they suffer from coherence drift due to the lack of cross-stream communication. In this work, we introduce the Parallel Decoder Transformer (PDT), a parameter-efficient architecture that embeds coordination primitives directly into the inference process of a frozen pre-trained model. Instead of retraining the base model, PDT injects lightweight Speculative Note Conditioning (SNC) adapters that allow parallel decoding streams to synchronize via a shared, dynamic latent space. We formulate coordination as a speculative consensus problem, where sibling streams broadcast semantic "notes" to a global bus, gated by a learned verification head. We validate our approach on a 50,000-step curriculum using a frozen 20B-parameter backbone. Our results demonstrate that PDT achieves effective self-correction, reaching 77.8% precision in coverage prediction and recovering approximate serial semantics without modifying the trunk weights. This establishes PDT as a scalable, efficient alternative to full model fine-tuning for structured parallel generation.

## Example: PDT notes artifact (truncated)

This is a real sample from the dataset pipeline (`survey_200141_ff0a0b4f.json`), shown with list/string truncation to keep the model card readable.

```json
{
  "sample_id": "survey_200141_ff0a0b4f",
  "domain": "survey",
  "plan_path": "outputs/structured_plans/pdt_10k/survey/survey_200141_ff0a0b4f.json",
  "sectional_independence": true,
  "lag_delta": 1,
  "note_cadence_M": 6,
  "true_notes_example": {
    "stream_id": "stream_1",
    "ENT": [
      {
        "id": "E1",
        "name": "Croatan",
        "aliases": [
          "Croatoan"
        ],
        "type": "Ethnic Group",
        "canonical": true
      },
      {
        "id": "E2",
        "name": "Dare County",
        "aliases": [
          "Alligator River",
          "Croatan Sound",
          "Roanoke Island",
          "... <2 more items>"
        ],
        "type": "Location",
        "canonical": true
      },
      {
        "id": "E3",
        "name": "werowances",
        "aliases": [
          "chiefs"
        ],
        "type": "Leadership Title",
        "canonical": true
      },
      "... <4 more items>"
    ],
    "FACT": [
      {
        "subj_id": "E1",
        "predicate": "lived in",
        "object": "coastal areas of what is now North Carolina",
        "evidence_span": {
          "start": 45,
          "end": 87,
          "text": "coastal areas of what is now North Carolina"
        },
        "certainty": 1.0
      },
      {
        "subj_id": "E1",
        "predicate": "might have been",
        "object": "a branch of the larger Roanoke people or allied with them",
        "evidence_span": {
          "start": 92,
          "end": 141,
          "text": "a branch of the larger Roanoke people or allied with them"
        },
        "certainty": 0.8
      },
      {
        "subj_id": "E2",
        "predicate": "encompasses",
        "object": "the Alligator River, Croatan Sound, Roanoke Island, Ocracoke Island, and parts of the Outer Banks",
        "evidence_span": {
          "start": 177,
          "end": 265,
          "text": "the Alligator River, Croatan Sound, Roanoke Island, Ocracoke Island, and parts of the Outer Banks"
        },
        "certainty": 1.0
      },
      "... <5 more items>"
    ],
    "COVERAGE": [
      {
        "plan_item_id": "Define who the Croatan were, where they lived historically, and where related people live today.",
        "status": "missing"
      },
      {
        "plan_item_id": "Describe political leadership (werowances) and their responsibilities regarding wealth and decision-making.",
        "status": "missing"
      },
      {
        "plan_item_id": "Summarize core religious beliefs about a chief god, petty gods, immortality of the soul, heaven/Popogusso, and roles of priests and conjurors.",
        "status": "missing"
      },
      "... <6 more items>"
    ]
  },
  "speculative_variant_example": {
    "variant_id": "survey_200141_ff0a0b4f_variant_0",
    "noise_config": {
      "paraphrase_ratio": 0.15,
      "drop_ratio": 0.05,
      "hallucination_ratio": 0.05,
      "shuffle_notes": true
    },
    "lag_delta": 1,
    "notes_example": {
      "stream_id": "stream_1",
      "ENT": [
        {
          "id": "E1",
          "name": "Croatan",
          "aliases": [
            "Croatoan",
            "Croatian"
          ],
          "type": "Ethnic Group",
          "canonical": true
        },
        {
          "id": "E2",
          "name": "Dare County",
          "aliases": [
            "Alligator River",
            "Croatan Sound",
            "Roanoke Island",
            "... <1 more items>"
          ],
          "type": "Location",
          "canonical": true
        },
        {
          "id": "E3",
          "name": "werowances",
          "aliases": [
            "chiefs",
            "leaders"
          ],
          "type": "Leadership Title",
          "canonical": true
        },
        "... <1 more items>"
      ],
      "FACT": [
        {
          "subj_id": "E1",
          "predicate": "lived in",
          "object": "coastal areas of what is now North Carolina",
          "evidence_span": {
            "start": 45,
            "end": 87,
            "text": "coastal areas of what is now North Carolina"
          },
          "certainty": 1.0
        },
        {
          "subj_id": "E1",
          "predicate": "might have been",
          "object": "a branch of the larger Roanoke people or allied with them",
          "evidence_span": {
            "start": 92,
            "end": 141,
            "text": "a branch of the larger Roanoke people or allied with them"
          },
          "certainty": 0.8
        },
        {
          "subj_id": "E2",
          "predicate": "encompasses",
          "object": "the Alligator River, Croatan Sound, Roanoke Island, Ocracoke Island, and parts of the Outer Banks",
          "evidence_span": {
            "start": 177,
            "end": 265,
            "text": "the Alligator River, Croatan Sound, Roanoke Island, Ocracoke Island, and parts of the Outer Banks"
          },
          "certainty": 1.0
        },
        "... <1 more items>"
      ],
      "COVERAGE": [
        {
          "plan_item_id": "Define who the Croatan were, where they lived historically, and where related people live today.",
          "status": "missing"
        },
        {
          "plan_item_id": "Describe political leadership (werowances) and their responsibilities regarding wealth and decision-making.",
          "status": "missing"
        },
        {
          "plan_item_id": "Summarize core religious beliefs about a chief god, petty gods, immortality of the soul, heaven/Popogusso, and roles of priests and conjurors.",
          "status": "missing"
        },
        "... <1 more items>"
      ]
    }
  },
  "versioned_notes_snapshot_0": {
    "snapshot_id": 0,
    "source": "procedural_bus",
    "lag_delta": 1,
    "note_cadence_M": 6,
    "ent_count": 9,
    "fact_count": 10,
    "notes_example": {
      "stream_id": "stream_1",
      "ENT": [
        {
          "id": "E1",
          "name": "Croatan",
          "aliases": [
            "Croatoan"
          ],
          "type": "Ethnic Group",
          "canonical": true
        },
        {
          "id": "E2",
          "name": "Dare County",
          "aliases": [
            "Alligator River",
            "Croatan Sound",
            "Roanoke Island",
            "... <1 more items>"
          ],
          "type": "Location",
          "canonical": true
        },
        {
          "id": "E3",
          "name": "werowances",
          "aliases": [
            "chiefs"
          ],
          "type": "Leadership Title",
          "canonical": true
        },
        "... <1 more items>"
      ],
      "FACT": [
        {
          "subj_id": "E1",
          "predicate": "lived in",
          "object": "coastal areas of what is now North Carolina",
          "evidence_span": {
            "start": 45,
            "end": 87,
            "text": "coastal areas of what is now North Carolina"
          },
          "certainty": 1.0
        },
        {
          "subj_id": "E1",
          "predicate": "might have been",
          "object": "a branch of the larger Roanoke people or allied with them",
          "evidence_span": {
            "start": 92,
            "end": 141,
            "text": "a branch of the larger Roanoke people or allied with them"
          },
          "certainty": 0.8
        },
        {
          "subj_id": "E2",
          "predicate": "encompasses",
          "object": "the Alligator River, Croatan Sound, Roanoke Island, Ocracoke Island, and parts of the Outer Banks",
          "evidence_span": {
            "start": 177,
            "end": 265,
            "text": "the Alligator River, Croatan Sound, Roanoke Island, Ocracoke Island, and parts of the Outer Banks"
          },
          "certainty": 1.0
        },
        "... <1 more items>"
      ],
      "COVERAGE": [
        {
          "plan_item_id": "Define who the Croatan were, where they lived historically, and where related people live today.",
          "status": "missing"
        },
        {
          "plan_item_id": "Describe political leadership (werowances) and their responsibilities regarding wealth and decision-making.",
          "status": "missing"
        },
        {
          "plan_item_id": "Summarize core religious beliefs about a chief god, petty gods, immortality of the soul, heaven/Popogusso, and roles of priests and conjurors.",
          "status": "missing"
        },
        "... <1 more items>"
      ]
    }
  },
  "rollback": {
    "triggered": false,
    "l_tokens": 0,
    "events": []
  }
}
```
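
The `noise_config` in the sample above names four knobs for deriving a speculative variant from the true notes. The sketch below illustrates only two of them, `drop_ratio` and `shuffle_notes`, and applies them to the `FACT` list alone. It is not the pipeline's actual implementation: the function name `make_speculative_variant`, the per-item independent drop, and the restriction to `FACT` are all assumptions made for illustration. Paraphrasing and hallucination are left out because they need a text generator.

```python
import copy
import random


def make_speculative_variant(notes, drop_ratio=0.05, shuffle_notes=True, seed=0):
    """Return a noisy copy of `notes`; the input dict is left untouched.

    Hypothetical helper: the real pipeline may apply noise differently.
    """
    rng = random.Random(seed)
    variant = copy.deepcopy(notes)
    # Assumption: each FACT is dropped independently with probability drop_ratio.
    variant["FACT"] = [f for f in variant["FACT"] if rng.random() >= drop_ratio]
    if shuffle_notes:
        # Assumption: shuffling reorders facts but never edits their content.
        rng.shuffle(variant["FACT"])
    return variant


true_notes = {
    "stream_id": "stream_1",
    "FACT": [
        {"subj_id": "E1", "predicate": "lived in"},
        {"subj_id": "E1", "predicate": "might have been"},
        {"subj_id": "E2", "predicate": "encompasses"},
    ],
}
variant = make_speculative_variant(true_notes, drop_ratio=0.05, seed=0)
print(len(variant["FACT"]), "of", len(true_notes["FACT"]), "facts kept")
```

Seeding the generator keeps a given variant reproducible, which matters if variants are regenerated rather than stored.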

To reproduce the truncated JSON view locally:

```bash
uv run python scripts/pretty_notes_artifact.py survey_200141_ff0a0b4f.json
```
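
For readers without the repository checked out, the list truncation seen in the sample (three items followed by a `"... <N more items>"` marker) can be approximated as below. This is a minimal sketch and not the contents of `scripts/pretty_notes_artifact.py`; the function name `truncate`, the `max_items` and `max_chars` defaults, and the `...` suffix for long strings are assumptions.

```python
import json


def truncate(value, max_items=3, max_chars=200):
    """Recursively shorten lists and long strings for display."""
    if isinstance(value, dict):
        return {key: truncate(item, max_items, max_chars) for key, item in value.items()}
    if isinstance(value, list):
        shown = [truncate(item, max_items, max_chars) for item in value[:max_items]]
        hidden = len(value) - len(shown)
        if hidden > 0:
            # Same marker format as in the sample artifact above.
            shown.append(f"... <{hidden} more items>")
        return shown
    if isinstance(value, str) and len(value) > max_chars:
        # Assumed string marker; the real script may use a different one.
        return value[:max_chars] + "..."
    return value


# Illustrative data only, not taken from the dataset.
demo = {"aliases": ["a", "b", "c", "d", "e"], "canonical": True}
print(json.dumps(truncate(demo), indent=2))
```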

## How to use

1. Install the reference implementation (runtime + scripts):
   - `https://github.com/logan-robbins/parallel-decoder-transformer`
2. Download the base trunk model (`openai/gpt-oss-20b`) via Hugging Face (or provide a local path).
3. Download the adapter checkpoint from this repo and point `configs/gpt_oss_transfer_production.yaml` (or CLI flags) at it.

## Artifacts (public GCS)

The complete training artifacts and dataset archives are mirrored publicly in GCS:

- **Bucket root:** `https://storage.googleapis.com/parallel-decoder-transformer/`
- **Upload manifest (full listing):** `https://storage.googleapis.com/parallel-decoder-transformer/UPLOAD_MANIFEST.md`
- **Training checkpoints:** `https://storage.googleapis.com/parallel-decoder-transformer/checkpoints/gpt-oss-8xH100-50000steps/`
- **Dataset archives:** `https://storage.googleapis.com/parallel-decoder-transformer/data/archives/`

## Training logs (Weights & Biases)

- **WandB run:** `https://wandb.ai/ljrweb-self/parallel-decoder-transformer/runs/fmuea63a`

## Why the dataset is structured this way

PDT is trained on **streamed, structured supervision** produced by a 5-stage pipeline. The stages that shape the dataset layout are:

- **Stage 2 (Plans):** a 3-stream decomposition plan is generated for each document.
- **Stage 3 (Notes):** we generate **true notes (teacher)** and **speculative notes (student input)** in a consistent schema:
  - `ENT`: entity table (stable ids)
  - `FACT`: grounded tuples with `evidence_span`
  - `COVERAGE`: plan-item status targets (`covered|partial|missing`)
  - `versioned_notes`: lagged, versioned snapshots mirroring the Dynamic Notes Bus semantics
- **Stage 5 (KD Export):** these artifacts are converted into `kd_*.jsonl` where each line is a **stream-level** training example.
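
The schema above implies a few consistency rules: every `FACT` should point at an entity id that exists in `ENT`, and every `COVERAGE` status should be one of the three listed values. The sketch below checks those rules on an untruncated notes dict. It is an illustration under assumptions, not part of the reference implementation: the function name `validate_notes` and the specific checks (including the `certainty` range and span ordering) are inferred from the sample artifact and are not a documented contract.

```python
ALLOWED_STATUS = {"covered", "partial", "missing"}


def validate_notes(notes):
    """Return a list of human-readable problems found in one notes dict."""
    problems = []
    entity_ids = {entity["id"] for entity in notes.get("ENT", [])}
    for i, fact in enumerate(notes.get("FACT", [])):
        # Every fact should reference a stable id from the entity table.
        if fact["subj_id"] not in entity_ids:
            problems.append(f"FACT[{i}]: unknown subj_id {fact['subj_id']!r}")
        span = fact.get("evidence_span")
        if not span or span["start"] > span["end"]:
            problems.append(f"FACT[{i}]: missing or inverted evidence_span")
        # Assumption: certainty is a probability-like score in [0, 1].
        if not 0.0 <= fact.get("certainty", 0.0) <= 1.0:
            problems.append(f"FACT[{i}]: certainty outside [0, 1]")
    for i, item in enumerate(notes.get("COVERAGE", [])):
        if item["status"] not in ALLOWED_STATUS:
            problems.append(f"COVERAGE[{i}]: unexpected status {item['status']!r}")
    return problems


# Values copied from the sample artifact shown earlier in this card.
notes = {
    "ENT": [{"id": "E1", "name": "Croatan"}],
    "FACT": [{
        "subj_id": "E1",
        "predicate": "lived in",
        "evidence_span": {"start": 45, "end": 87,
                          "text": "coastal areas of what is now North Carolina"},
        "certainty": 1.0,
    }],
    "COVERAGE": [{"plan_item_id": "Define who the Croatan were, where they lived "
                                  "historically, and where related people live today.",
                  "status": "missing"}],
}
print(validate_notes(notes))
```

Run this on full artifacts only; the truncated view in this card contains `"... <N more items>"` marker strings that are not valid note entries.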

This layout is required to support the **teacher→student curriculum** described in the training guide:

- **Stage 0:** planner/notes-head bootstrap (trunk frozen)
- **Stage 1:** stream adapters + SNC cross-attention bootstrap (speculation frozen; teacher notes forced)
- **Stage 2:** enable speculation + notes-bus usage (teacher-heavy mixing)
- **Stage 3:** train agreement + coverage heads for self-correction/rollback behavior (trunk still frozen)
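
One way to picture the shift from forced teacher notes (Stage 1) to teacher-heavy mixing (Stage 2) is a per-example choice between the true notes and a speculative variant. The sketch below is hypothetical: the function name `pick_notes`, the `teacher_prob` value of 0.8, and the reuse of the same mixing rule for Stage 3 are assumptions, not values taken from the training guide.

```python
import random


def pick_notes(stage, true_notes, speculative_notes, rng, teacher_prob=0.8):
    """Choose which notes condition a stream for one training example."""
    if stage <= 1:
        # Stages 0 and 1: speculation is off, so teacher notes are always used.
        return true_notes
    # Stage 2 onward (assumed): mostly teacher notes, sometimes speculative ones.
    return true_notes if rng.random() < teacher_prob else speculative_notes


rng = random.Random(0)
picks = [pick_notes(2, "teacher", "speculative", rng) for _ in range(1000)]
print("teacher share at stage 2:", picks.count("teacher") / len(picks))
```

Keeping the teacher probability as a single argument makes it easy to schedule it downward over training, if a curriculum calls for that.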

## Citation

```bibtex
@misc{robbins2025pdt,
  title={Parallel Decoder Transformer: Model-Internal Parallel Decoding with Speculative Invariance via Note Conditioning},
  author={Robbins, Logan},
  year={2025},
  eprint={2512.10054},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2512.10054}
}
```

## What’s included

- `pdt_adapters.*`: trainable adapter/head weights (the GPT-OSS-20B trunk weights are not included; download them separately as described under "How to use")
- `training_report.json`, `train_run_stages.json`, `train_manifest.json`, `agreement_thresholds.json`

## License

- **This repo (adapters + artifacts)**: MIT.
- **Base model**: `openai/gpt-oss-20b` is licensed under Apache-2.0 on Hugging Face (also see its `USAGE_POLICY` there).
- **Reference implementation**: MIT at `https://github.com/logan-robbins/parallel-decoder-transformer`.