---
language:
- en
tags:
- sentence-transformers
- cross-encoder
- reranker
- generated_from_trainer
- dataset_size:1024986
- loss:CrossEntropyLoss
- modernbert
- mnli
- snli
- anli
base_model: jhu-clsp/ettin-encoder-68m
datasets:
- dleemiller/FineCat-NLI
pipeline_tag: text-classification
library_name: sentence-transformers
metrics:
- f1_macro
- f1_micro
- f1_weighted
model-index:
- name: CrossEncoder based on jhu-clsp/ettin-encoder-68m
  results:
  - task:
      type: cross-encoder-classification
      name: Cross Encoder Classification
    dataset:
      name: FineCat dev
      type: FineCat-dev
    metrics:
    - type: f1_macro
      value: 0.8213
      name: F1 Macro
    - type: f1_micro
      value: 0.8229
      name: F1 Micro
    - type: f1_weighted
      value: 0.8226
      name: F1 Weighted
---
|
|
|
|
|
# FineCat-NLI Small |
|
|
|
|
|
<p align="center"> |
|
|
<img src="/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F65ff92ea467d83751a727538%2FJzq_CZCyRYGrVgbto3eRr.png%26quot%3B%3C%2Fspan%3E style="width: 400px;"> |
|
|
</p> |
|
|
|
|
|
----- |
|
|
|
|
|
# Overview |
|
|
|
|
|
This model is a fine-tune of `jhu-clsp/ettin-encoder-68m`, trained on the
`dleemiller/FineCat-NLI` dataset, a compilation of several high-quality NLI
sources that is screened for quality and has easy samples reduced in the
training split. The training also incorporates logit distillation from
`dleemiller/finecat-nli-l`.
|
|
|
|
|
The distillation loss combines a cross-entropy term on the gold labels with a mean-squared-error term on the teacher logits:
|
|
$$
\mathcal{L} = \alpha \cdot \mathcal{L}_{\text{CE}}(z^{(s)}, y) + \beta \cdot \mathcal{L}_{\text{MSE}}(z^{(s)}, z^{(t)})
$$
|
|
|
|
|
where \\(z^{(s)}\\) and \\(z^{(t)}\\) are the student and teacher logits, \\(y\\) are the ground-truth labels, and \\(\alpha\\) and \\(\beta\\) are equally weighted at 0.5.
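
A minimal PyTorch sketch of this objective (the `distillation_loss` helper and the toy tensors below are illustrative, not the actual training code):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5, beta=0.5):
    """Weighted sum of cross-entropy on gold labels and MSE against teacher logits."""
    ce = F.cross_entropy(student_logits, labels)      # L_CE(z_s, y)
    mse = F.mse_loss(student_logits, teacher_logits)  # L_MSE(z_s, z_t)
    return alpha * ce + beta * mse

# Toy batch: 4 premise-hypothesis pairs, 3 NLI classes
student_logits = torch.randn(4, 3, requires_grad=True)
teacher_logits = torch.randn(4, 3)  # produced by the teacher model in practice
labels = torch.tensor([0, 1, 2, 0])
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```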
|
|
|
|
|
This model and dataset specifically target improving NLI through high-quality sources. The tasksource models
are the best checkpoints to start from, although training from ModernBERT is also competitive.
|
|
|
|
|
----- |
|
|
|
|
|
# NLI Evaluation Results |
|
|
|
|
|
F1-micro scores (equivalent to accuracy) are reported for each dataset.
Throughput and peak GPU memory were measured at batch size 32 on an NVIDIA Blackwell PRO 6000 Max-Q.
|
|
|
|
|
| Model | finecat | mnli | mnli_mismatched | snli | anli_r1 | anli_r2 | anli_r3 | wanli | lingnli | Throughput (samples/s) | Peak GPU Mem (MB) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `dleemiller/finecat-nli-s` | **0.7834** | 0.8725 | 0.8725 | 0.8973 | **0.6400** | **0.4660** | **0.4617** | **0.7284** | <u>0.8072</u> | 2291.87 | 415.65 |
| `tasksource/deberta-small-long-nli` | 0.7492 | 0.8194 | 0.8206 | 0.8613 | <u>0.5670</u> | <u>0.4220</u> | <u>0.4475</u> | <u>0.7034</u> | 0.7605 | 2250.66 | 1351.08 |
| `cross-encoder/nli-deberta-v3-xsmall` | 0.7269 | **0.8781** | <u>0.8777</u> | **0.9164** | 0.3620 | 0.3030 | 0.3183 | 0.6096 | **0.8122** | 2510.05 | 753.91 |
| `dleemiller/EttinX-nli-s` | 0.7251 | <u>0.8765</u> | **0.8798** | 0.9128 | 0.3360 | 0.2790 | 0.3083 | 0.6234 | 0.8012 | 2348.21 | 415.65 |
| `cross-encoder/nli-MiniLM2-L6-H768` | 0.7119 | 0.8660 | 0.8683 | <u>0.9137</u> | 0.3090 | 0.2850 | 0.2867 | 0.5830 | 0.7905 | 2885.72 | 566.64 |
| `cross-encoder/nli-distilroberta-base` | 0.6936 | 0.8365 | 0.8398 | 0.8996 | 0.2660 | 0.2810 | 0.2975 | 0.5516 | 0.7516 | 2838.17 | 566.64 |
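
As a rough sketch of how one of these scores could be reproduced (assuming the `nyu-mll/multi_nli` Hub dataset, whose label order of 0 = entailment, 1 = neutral, 2 = contradiction matches the label map below; this is not the exact benchmark harness used here):

```python
from datasets import load_dataset
from sentence_transformers import CrossEncoder
from sklearn.metrics import f1_score
import numpy as np

model = CrossEncoder("dleemiller/finecat-nli-s")

# Evaluate on the MNLI matched validation split
mnli = load_dataset("nyu-mll/multi_nli", split="validation_matched")
pairs = list(zip(mnli["premise"], mnli["hypothesis"]))

logits = model.predict(pairs, batch_size=32)
preds = np.argmax(logits, axis=1)
print("f1_micro:", f1_score(mnli["label"], preds, average="micro"))
```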
|
|
|
|
|
----- |
|
|
|
|
|
# Usage |
|
|
|
|
|
### Label Map
- `entailment`: 0
- `neutral`: 1
- `contradiction`: 2
|
|
|
|
|
|
|
|
## Direct Usage (Sentence Transformers) |
|
|
|
|
|
First install the Sentence Transformers library: |
|
|
|
|
|
```bash |
|
|
pip install -U sentence-transformers |
|
|
``` |
|
|
|
|
|
Then you can load the model and run inference. |
|
|
|
|
|
```python
from sentence_transformers import CrossEncoder
import numpy as np

model = CrossEncoder("dleemiller/finecat-nli-s")
id2label = model.model.config.id2label  # {0: 'entailment', 1: 'neutral', 2: 'contradiction'}

pairs = [
    ("The glass fell off the counter and shattered on the tile.",
     "The glass broke when it hit the floor."),  # E
    ("The store opens at 9 a.m. every day.",
     "The store opens at 7 a.m. on weekdays."),  # C
    ("A researcher presented results at the conference.",
     "The presentation won the best paper award."),  # N
    ("It started raining heavily, so the match was postponed.",
     "The game was delayed due to weather."),  # E
    ("Every seat on the flight was taken.",
     "There were several empty seats on the plane."),  # C
]

logits = model.predict(pairs)  # shape: (5, 3)

for (prem, hyp), row in zip(pairs, logits):
    pred_idx = int(np.argmax(row))
    pred = id2label[pred_idx]
    print(f"[{pred}] Premise: {prem} | Hypothesis: {hyp}")
```
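
The checkpoint can also be used without sentence-transformers. Below is a sketch with plain `transformers`, assuming the repository exposes a standard sequence-classification head (as CrossEncoder checkpoints do) and a `transformers` version with ModernBERT support:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "dleemiller/finecat-nli-s"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)
model.eval()

# Contradiction example from the pairs above
premise = "Every seat on the flight was taken."
hypothesis = "There were several empty seats on the plane."
inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, 3)

probs = logits.softmax(dim=-1).squeeze(0)
print({model.config.id2label[i]: round(p.item(), 4) for i, p in enumerate(probs)})
```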
|
|
|
|
|
|
|
|
## Acknowledgments |
|
|
|
|
|
We thank the `tasksource` creators and contributors and `MoritzLaurer` for making their work available.
This model would not be possible without their efforts and open-source contributions.
|
|
|
|
|
## Citation |
|
|
|
|
|
```bibtex
@misc{nli-compiled-2025,
  title = {FineCat NLI Dataset},
  author = {Lee Miller},
  year = {2025},
  howpublished = {Refined compilation of 6 major NLI datasets}
}
```