---
language:
- en
tags:
- sentence-transformers
- cross-encoder
- reranker
- generated_from_trainer
- dataset_size:1024986
- loss:CrossEntropyLoss
- modernbert
- mnli
- snli
- anli
base_model: jhu-clsp/ettin-encoder-68m
datasets:
- dleemiller/FineCat-NLI
pipeline_tag: text-classification
library_name: sentence-transformers
metrics:
- f1_macro
- f1_micro
- f1_weighted
model-index:
- name: CrossEncoder based on jhu-clsp/ettin-encoder-68m
  results:
  - task:
      type: cross-encoder-classification
      name: Cross Encoder Classification
    dataset:
      name: FineCat dev
      type: FineCat-dev
    metrics:
    - type: f1_macro
      value: 0.8213
      name: F1 Macro
    - type: f1_micro
      value: 0.8229
      name: F1 Micro
    - type: f1_weighted
      value: 0.8226
      name: F1 Weighted
---
# FineCat-NLI Small

## Overview
This model is a fine-tune of jhu-clsp/ettin-encoder-68m,
trained on the dleemiller/FineCat-NLI dataset—a compilation of several high-quality
NLI data sources with quality screening and reduction of easy samples in the training split.
The training also incorporates logit distillation from dleemiller/finecat-nli-l.
The distillation loss has the form:

$$\mathcal{L} = \alpha \, \mathcal{L}_{\mathrm{CE}}(z_s, y) + \beta \, \mathcal{L}_{\mathrm{distill}}(z_s, z_t)$$

where $z_s$ and $z_t$ are the student and teacher logits, $y$ are the ground truth labels, and $\alpha$ and $\beta$ are equally weighted at 0.5.
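A minimal PyTorch-style sketch of this objective is shown below. The KL-divergence form of the distillation term is an assumption (logit distillation is often done this way); only the equal 0.5/0.5 weighting is stated above.

```python
# Hedged sketch of the combined objective described above.
# The KL-divergence distillation term is an assumption; the equal 0.5
# weighting of the two terms is taken from the description.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5, beta=0.5):
    ce = F.cross_entropy(student_logits, labels)        # supervised term on gold labels
    kl = F.kl_div(                                       # distillation term from teacher logits
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    return alpha * ce + beta * kl
```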
This model and dataset specifically target improved NLI performance through high-quality sources. The tasksource models are the best checkpoints to start from, although training from ModernBERT is also competitive.
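For reference, a minimal sketch of inspecting the training data with the Hugging Face `datasets` library; the `"train"` split name and the record fields are assumptions, check the dataset card for the actual schema.

```python
# Hedged sketch: load and inspect the FineCat-NLI training data.
# The "train" split name and record fields are assumptions.
from datasets import load_dataset

finecat = load_dataset("dleemiller/FineCat-NLI", split="train")
print(finecat)      # dataset size and features
print(finecat[0])   # one example
```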
## NLI Evaluation Results
F1-micro scores (equivalent to accuracy) for each dataset. Performance was measured at batch size 32 (bs=32) on an NVIDIA Blackwell PRO 6000 Max-Q.
| Model | finecat | mnli | mnli_mismatched | snli | anli_r1 | anli_r2 | anli_r3 | wanli | lingnli | Throughput (samples/s) | Peak GPU Mem (MB) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| dleemiller/finecat-nli-s | 0.7834 | 0.8725 | 0.8725 | 0.8973 | 0.6400 | 0.4660 | 0.4617 | 0.7284 | 0.8072 | 2291.87 | 415.65 |
| tasksource/deberta-small-long-nli | 0.7492 | 0.8194 | 0.8206 | 0.8613 | 0.5670 | 0.4220 | 0.4475 | 0.7034 | 0.7605 | 2250.66 | 1351.08 |
| cross-encoder/nli-deberta-v3-xsmall | 0.7269 | 0.8781 | 0.8777 | 0.9164 | 0.3620 | 0.3030 | 0.3183 | 0.6096 | 0.8122 | 2510.05 | 753.91 |
| dleemiller/EttinX-nli-s | 0.7251 | 0.8765 | 0.8798 | 0.9128 | 0.3360 | 0.2790 | 0.3083 | 0.6234 | 0.8012 | 2348.21 | 415.65 |
| cross-encoder/nli-MiniLM2-L6-H768 | 0.7119 | 0.8660 | 0.8683 | 0.9137 | 0.3090 | 0.2850 | 0.2867 | 0.5830 | 0.7905 | 2885.72 | 566.64 |
| cross-encoder/nli-distilroberta-base | 0.6936 | 0.8365 | 0.8398 | 0.8996 | 0.2660 | 0.2810 | 0.2975 | 0.5516 | 0.7516 | 2838.17 | 566.64 |
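As a reference point for reproducing an F1-micro number, here is a hedged scoring sketch. The split and column names (`"validation"`, `premise`, `hypothesis`, `label`) are assumptions, and the actual benchmark harness may differ.

```python
# Hedged sketch of scoring one NLI split with this cross-encoder.
# Split and column names are assumptions; adjust to the actual dataset schema.
import numpy as np
from datasets import load_dataset
from sentence_transformers import CrossEncoder
from sklearn.metrics import f1_score

model = CrossEncoder("dleemiller/finecat-nli-s")
data = load_dataset("dleemiller/FineCat-NLI", split="validation")

pairs = list(zip(data["premise"], data["hypothesis"]))
logits = model.predict(pairs, batch_size=32)
preds = np.argmax(logits, axis=1)

print("F1-micro:", f1_score(data["label"], preds, average="micro"))
```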
## Usage
Label map:
- `entailment`: 0
- `neutral`: 1
- `contradiction`: 2
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```
Then you can load the model and run inference.
```python
from sentence_transformers import CrossEncoder
import numpy as np

model = CrossEncoder("dleemiller/finecat-nli-s")
id2label = model.model.config.id2label  # {0: 'entailment', 1: 'neutral', 2: 'contradiction'}

pairs = [
    ("The glass fell off the counter and shattered on the tile.",
     "The glass broke when it hit the floor."),  # E
    ("The store opens at 9 a.m. every day.",
     "The store opens at 7 a.m. on weekdays."),  # C
    ("A researcher presented results at the conference.",
     "The presentation won the best paper award."),  # N
    ("It started raining heavily, so the match was postponed.",
     "The game was delayed due to weather."),  # E
    ("Every seat on the flight was taken.",
     "There were several empty seats on the plane."),  # C
]

logits = model.predict(pairs)  # shape: (5, 3)

for (prem, hyp), row in zip(pairs, logits):
    pred_idx = int(np.argmax(row))
    pred = id2label[pred_idx]
    print(f"[{pred}] Premise: {prem} | Hypothesis: {hyp}")
```
## Acknowledgments
We thank the creators and contributors of tasksource and MoritzLaurer for making their work available.
This model would not be possible without their efforts and open source contributions.
## Citation

```bibtex
@misc{nli-compiled-2025,
  title = {FineCat NLI Dataset},
  author = {Lee Miller},
  year = {2025},
  howpublished = {Refined compilation of 6 major NLI datasets}
}
```