FineCat-NLI Large


Overview

This model is a fine-tune of the excellent tasksource/ModernBERT-large-nli, trained on the dleemiller/FineCat-NLI dataset, a compilation of several high-quality NLI data sources with quality screening and a reduced share of easy samples in the training split. Training also incorporates logit distillation from MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli, a top-performing NLI model that is particularly strong on the ANLI benchmarks.

The distillation loss is

$$
\mathcal{L} = \alpha \cdot \mathcal{L}_{\text{CE}}(z^{(s)}, y) + \beta \cdot \mathcal{L}_{\text{MSE}}(z^{(s)}, z^{(t)})
$$

where $z^{(s)}$ and $z^{(t)}$ are the student and teacher logits, $y$ are the ground-truth labels, and $\alpha$ and $\beta$ are equally weighted at 0.5.
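For concreteness, here is a minimal PyTorch sketch of this objective. The 0.5/0.5 weighting mirrors the equation above; the function name, toy tensors, and batch shape are illustrative assumptions, not the actual training code.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5, beta=0.5):
    # L_CE(z_s, y): cross-entropy against the gold labels
    ce = F.cross_entropy(student_logits, labels)
    # L_MSE(z_s, z_t): mean squared error against the teacher's logits
    mse = F.mse_loss(student_logits, teacher_logits)
    return alpha * ce + beta * mse

# toy batch: 4 premise/hypothesis pairs, 3 NLI classes
student_logits = torch.randn(4, 3, requires_grad=True)
teacher_logits = torch.randn(4, 3)
labels = torch.tensor([0, 1, 2, 0])  # entailment, neutral, contradiction, entailment

loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()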

By combining the broad NLI training of the tasksource model (which excels on traditional MNLI/SNLI benchmarks) with the ANLI strengths of the MoritzLaurer model and the high-quality FineCat-NLI dataset, this model achieves strong performance across major NLI benchmarks while retaining the efficiency advantages of the ModernBERT architecture.

This model and dataset specifically target improving NLI through high-quality sources. The tasksource models are the best checkpoints to start from, although training from ModernBERT is also competitive.


NLI Evaluation Results

F1-micro scores (equivalent to accuracy) for each dataset. Throughput and peak memory were measured at batch size 32 on an NVIDIA Blackwell PRO 6000 Max-Q.

| Model | finecat | mnli | mnli_mismatched | snli | anli_r1 | anli_r2 | anli_r3 | wanli | lingnli | Throughput (samples/s) | Peak GPU Mem (MB) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli | 0.8233 | 0.9121 | 0.9079 | 0.8898 | 0.7960 | 0.6830 | 0.6400 | 0.7700 | 0.8821 | 454.96 | 3250.44 |
| dleemiller/finecat-nli-l | 0.8227 | 0.9152 | 0.9265 | 0.9162 | 0.7480 | 0.5700 | 0.5433 | 0.7706 | 0.8742 | 539.04 | 1838.06 |
| tasksource/ModernBERT-large-nli | 0.7959 | 0.8983 | 0.9229 | 0.9188 | 0.7260 | 0.5110 | 0.4925 | 0.6978 | 0.8504 | 543.44 | 1838.06 |
| dleemiller/ModernCE-large-nli | 0.7811 | 0.9088 | 0.9205 | 0.9273 | 0.6630 | 0.4860 | 0.4408 | 0.6576 | 0.8566 | 540.74 | 1838.06 |
| cross-encoder/nli-deberta-v3-large | 0.7618 | 0.9019 | 0.9049 | 0.9220 | 0.5300 | 0.4170 | 0.3758 | 0.6548 | 0.8466 | 448.35 | 3250.44 |

Usage

Label Map:

  • entailment: 0
  • neutral: 1
  • contradiction: 2
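The same label map can also be exercised with plain transformers. Below is a minimal sketch, assuming the checkpoint loads through AutoModelForSequenceClassification (standard for sequence-classification exports); the example sentences are illustrative.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "dleemiller/finecat-nli-l"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."

inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, 3)

# map the argmax back through the label map above
print(model.config.id2label[int(logits.argmax(dim=-1))])  # expected: entailment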

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load the model and run inference:

from sentence_transformers import CrossEncoder
import numpy as np

model = CrossEncoder("dleemiller/finecat-nli-l")
id2label = model.model.config.id2label  # {0:'entailment', 1:'neutral', 2:'contradiction'}

pairs = [
    ("The glass fell off the counter and shattered on the tile.",
     "The glass broke when it hit the floor."),          # E
    ("The store opens at 9 a.m. every day.",
     "The store opens at 7 a.m. on weekdays."),          # C
    ("A researcher presented results at the conference.",
     "The presentation won the best paper award."),       # N
    ("It started raining heavily, so the match was postponed.",
     "The game was delayed due to weather."),             # E
    ("Every seat on the flight was taken.",
     "There were several empty seats on the plane."),     # C
]

logits = model.predict(pairs)  # shape: (5, 3)

for (prem, hyp), row in zip(pairs, logits):
    pred_idx = int(np.argmax(row))
    pred = id2label[pred_idx]
    print(f"[{pred}]  Premise: {prem}  |  Hypothesis: {hyp}")

Acknowledgments

We thank the creators and contributors of tasksource and MoritzLaurer for making their work available. This model would not be possible without their efforts and open-source contributions.

Citation

@misc{nli-compiled-2025,
  title = {FineCat NLI Dataset},
  author = {Lee Miller},
  year = {2025},
  howpublished = {Refined compilation of 6 major NLI datasets}
}