PLDR-LLM-v52-81M-FT-TC-1

Model Description

PLDR-LLM-v52-81M-FT-TC-1 is a PLDR-LLM (Large Language Model from Power Law Decoder Representations) with KV-cache and G-cache support, finetuned for token classification. The model has 81M parameters and was finetuned on the CoNLL 2003 dataset from the PLDR-LLM base model PLDR-LLM-v52-110M-1.

More details about the PLDR-LLM architecture can be found in the research paper titled PLDR-LLMs Learn A Generalizable Tensor Operator That Can Replace Its Own Deep Neural Net At Inference.

Training data

PLDR-LLM-v52-81M-FT-TC-1 was finetuned on the CoNLL 2003 dataset, a dataset intended for language-independent named entity recognition. It consists of 14k samples for training, 3.3k samples for validation and 3.5k samples for testing. The base model was pretrained on ~8B tokens from RefinedWeb, a publicly available English web dataset with extensive filtering and deduplication.
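For reference, the splits can be inspected with the Hugging Face datasets library. The snippet below is a minimal sketch and is not part of the original training code; depending on the datasets version, loading conll2003 may additionally require trust_remote_code=True.

from datasets import load_dataset

# CoNLL 2003 as distributed on the Hugging Face Hub; ner_tags holds the per-word NER label ids.
ds = load_dataset("conll2003")
print({split: len(ds[split]) for split in ds})            # roughly 14k / 3.3k / 3.5k samples
print(ds["train"].features["ner_tags"].feature.names)     # O, B-PER, I-PER, B-ORG, ...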

Training procedure

This model was trained with the custom PLDR-LLM model implementation for the Hugging Face Transformers library on a combination of the train and validation splits, and evaluated on the test split for validation. Samples were preprocessed so that only one label is attributed per word; a sketch of this label alignment is shown after the parameter table below. The following parameters were used for finetuning; all other parameters were kept the same as in the research paper detailing the PLDR-LLM architecture.

Parameter Value
Learning rate 7x10^-5
Warm-up steps 20
Grad clip by norm 1.0
Epochs 2
Padding side "right"
Add EOS token True
min_lr_rate 0.01
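
The one-label-per-word preprocessing mentioned above is commonly implemented by keeping a word's label only on its first subword token and masking the remaining subwords with the ignore index. The helper below is an illustrative sketch under that assumption, not the author's preprocessing code.

def align_labels_with_tokens(word_labels, word_ids, ignore_index=-100):
    # word_ids maps each subword token back to its word (None for special tokens),
    # as returned by a fast tokenizer's word_ids() method.
    aligned = []
    previous_word = None
    for word_id in word_ids:
        if word_id is None:                  # special or padding token
            aligned.append(ignore_index)
        elif word_id != previous_word:       # first subword of a new word keeps the label
            aligned.append(word_labels[word_id])
        else:                                # continuation subwords are ignored in the loss
            aligned.append(ignore_index)
        previous_word = word_id
    return aligned

# align_labels_with_tokens([1, 2], [None, 0, 0, 1, None]) -> [-100, 1, -100, 2, -100]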

Intended Use and Limitations

This model is intended to be used for research purposes. Given a text prompt as input, it returns the predicted NER tags for person (B-PER, I-PER), location (B-LOC, I-LOC), organization (B-ORG, I-ORG) and miscellaneous (B-MISC, I-MISC) entities, where miscellaneous covers entities that do not belong to one of the other groups. B-TYPE marks the beginning of a new phrase of type TYPE, and I-TYPE marks a word inside such a phrase. The context length for this model is 1024 tokens.
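
To illustrate the tagging scheme, the hypothetical helper below groups word-level B-/I- tags into entity spans; it is only a sketch of the convention and is not part of the model code.

def group_bio_tags(words, tags):
    # Collect (entity_type, words) spans from a BIO-tagged sequence.
    entities, current = [], None
    for word, tag in zip(words, tags):
        if tag.startswith("B-"):             # a new entity starts
            if current:
                entities.append(current)
            current = (tag[2:], [word])
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[1].append(word)          # continue the current entity
        else:                                # "O" tag or inconsistent I- tag
            if current:
                entities.append(current)
            current = None
    if current:
        entities.append(current)
    return [(etype, " ".join(ws)) for etype, ws in entities]

# group_bio_tags(["Neil", "Armstrong", "visited", "NASA"], ["B-PER", "I-PER", "O", "B-ORG"])
# -> [("PER", "Neil Armstrong"), ("ORG", "NASA")]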

How to Use

Via Huggingface Transformers Library

PLDR-LLM has custom model support for the Hugging Face Transformers library. The custom model support was evaluated on the Transformers 4.56.1 release available at the time.

from transformers import pipeline

# Load the finetuned model as a token classification pipeline.
# trust_remote_code=True is required to run the custom PLDR-LLM model code.
token_classifier = pipeline(
    "token-classification",
    model="fromthesky/PLDR-LLM-v52-81M-FT-TC-1",
    aggregation_strategy="none",   # one prediction per token, no grouping
    device="cuda",                 # or "cpu"
    trust_remote_code=True
)

text="Neil A. Armstrong was a NASA research pilot, astronaut, and first man to set foot on the Moon during the Apollo 11 mission."

output=token_classifier(text)

print("PREDICTION:")
for p in output:
    print(p)
PREDICTION:
{'entity': 'B-PER', 'score': np.float32(0.9817903), 'index': 0, 'word': '▁Neil', 'start': 0, 'end': 4}
{'entity': 'I-PER', 'score': np.float32(0.99994135), 'index': 1, 'word': '▁A', 'start': 4, 'end': 6}
{'entity': 'I-PER', 'score': np.float32(0.9999002), 'index': 3, 'word': '▁Armstrong', 'start': 7, 'end': 17}
{'entity': 'B-ORG', 'score': np.float32(0.965291), 'index': 6, 'word': '▁NASA', 'start': 23, 'end': 28}
{'entity': 'B-LOC', 'score': np.float32(0.9539427), 'index': 20, 'word': '▁Moon', 'start': 88, 'end': 93}
{'entity': 'B-MISC', 'score': np.float32(0.87971383), 'index': 23, 'word': '▁Apollo', 'start': 104, 'end': 111}
{'entity': 'I-MISC', 'score': np.float32(0.64193934), 'index': 24, 'word': '▁', 'start': 111, 'end': 112}
{'entity': 'I-MISC', 'score': np.float32(0.97851723), 'index': 25, 'word': '1', 'start': 112, 'end': 113}
{'entity': 'I-MISC', 'score': np.float32(0.9432906), 'index': 26, 'word': '1', 'start': 113, 'end': 114}
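
The per-token predictions above can also be merged into entity groups by the pipeline itself. The variant below is a minimal sketch using the standard aggregation_strategy="simple" option of the Transformers token classification pipeline; it is not shown in the original card.

# Reuse the same model, but let the pipeline merge subword tokens into entity spans.
grouped_classifier = pipeline(
    "token-classification",
    model="fromthesky/PLDR-LLM-v52-81M-FT-TC-1",
    aggregation_strategy="simple",   # merges B-/I- pieces into one entry per entity
    device="cuda",                   # or "cpu"
    trust_remote_code=True
)

for entity in grouped_classifier(text):
    # each entry carries 'entity_group', an aggregated 'score', 'word', 'start' and 'end'
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 3))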

Notes:

  • This implementation of the PLDR-LLM custom code was evaluated on Transformers 4.56.1 and PyTorch 2.6.0.
  • The text string in the example above is from this source.

Limitations and Biases

This model was finetuned from a pretrained large language model. Large language models may generate text that is profane, lewd, socially unacceptable or offensive, depending on the contents of the dataset they were pretrained on. RefinedWeb is a dataset that is as toxic and biased as the Pile; please see the papers for RefinedWeb and the Pile for more information. Moreover, large language models are susceptible to hallucinations and may generate text that contains incorrect, irrelevant or misleading information. Since the contents of generated text are very hard to anticipate, the output of large language models needs to be heavily moderated and curated to prevent undesired content from appearing without warning.

Eval results

  • Evaluation was done on the test split, which was used for validation during finetuning.
Metric Value
Accuracy 0.9582
Precision 0.7211
Recall 0.7564
F1 0.7383
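
The card does not state which tool computed these metrics; entity-level precision, recall and F1 for CoNLL-style NER are commonly computed with the seqeval library, as in the sketch below (an assumption for illustration, not the author's evaluation script).

from seqeval.metrics import accuracy_score, precision_score, recall_score, f1_score

# y_true / y_pred are lists of per-sentence tag sequences.
y_true = [["B-PER", "I-PER", "O", "B-ORG"]]
y_pred = [["B-PER", "I-PER", "O", "O"]]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred))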

BibTeX entry and citation info

@misc{gokden2025pldrllmkvgcache,
      title={PLDR-LLMs Learn A Generalizable Tensor Operator That Can Replace Its Own Deep Neural Net At Inference}, 
      author={Burc Gokden},
      year={2025},
      eprint={2502.13502},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.13502}, 
}

@misc{gokden2024pldrllm,
      title={PLDR-LLM: Large Language Model from Power Law Decoder Representations}, 
      author={Burc Gokden},
      year={2024},
      eprint={2410.16703},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.16703}, 
}