---
title: ParaPLUIE
emoji: ☂️
tags:
- evaluate
- metric
description: >-
  ParaPLUIE is a metric for evaluating the semantic proximity of two sentences.
  ParaPLUIE uses the perplexity of an LLM to compute a confidence score. It has
  shown the highest correlation with human judgement on paraphrase
  classification while keeping the computational cost low, roughly equal to the
  cost of generating a single token.
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
short_description: ParaPLUIE is a metric for evaluating the semantic proximity
---

# Metric Card for ParaPLUIE (Paraphrase Generation Evaluation Powered by an LLM)

## Metric Description

ParaPLUIE is a metric for evaluating the semantic proximity of two sentences. ParaPLUIE uses the perplexity of an LLM to compute a confidence score. It has shown the highest correlation with human judgement on paraphrase classification while keeping the computational cost low, roughly equal to the cost of generating a single token.

## How to Use

This metric requires a source sentence and its hypothetical paraphrase.

```python
import evaluate

ppluie = evaluate.load("qlemesle/parapluie")
ppluie.init(model="mistralai/Mistral-7B-Instruct-v0.2")

S = "Have you ever seen a tsunami ?"
H = "Have you ever seen a tiramisu ?"

results = ppluie.compute(sources=[S], hypotheses=[H])
print(results)
>>> {'scores': [-16.97607421875]}
```

### Inputs

- **sources** (`list` of `string`): Source sentences.
- **hypotheses** (`list` of `string`): Hypothetical paraphrases.

### Output Values

- **score** (`float`): ParaPLUIE score. Minimum possible value is -inf. Maximum possible value is +inf. A score greater than 0 means that the sentences are paraphrases; a score lower than 0 means the opposite.

This metric outputs a dictionary containing the list of scores.

### Examples

Simple example:

```python
import evaluate

ppluie = evaluate.load("qlemesle/parapluie")
ppluie.init(model="mistralai/Mistral-7B-Instruct-v0.2")

S = "Have you ever seen a tsunami ?"
H = "Have you ever seen a tiramisu ?"

results = ppluie.compute(sources=[S], hypotheses=[H])
print(results)
>>> {'scores': [-16.97607421875]}
```

Configure the metric:

```python
ppluie.init(
    model = "mistralai/Mistral-7B-Instruct-v0.2",
    device = "cuda:0",
    template = "FS-DIRECT",
    use_chat_template = True,
    half_mode = True,
    n_right_specials_tokens = 1
)
```

Show the available prompting templates:

```python
ppluie.show_templates()
>>> DIRECT
>>> MEANING
>>> INDIRECT
>>> FS-DIRECT
>>> FS-DIRECT_MAJ
>>> FS-DIRECT_FR
>>> FS-DIRECT_MAJ_FR
>>> FS-DIRECT_FR_MIN
>>> NETWORK
```

Show the LLMs already tested with ParaPLUIE:

```python
ppluie.show_available_models()
>>> HuggingFaceTB/SmolLM2-135M-Instruct
>>> HuggingFaceTB/SmolLM2-360M-Instruct
>>> HuggingFaceTB/SmolLM2-1.7B-Instruct
>>> google/gemma-2-2b-it
>>> state-spaces/mamba-2.8b-hf
>>> internlm/internlm2-chat-1_8b
>>> microsoft/Phi-4-mini-instruct
>>> mistralai/Mistral-7B-Instruct-v0.2
>>> tiiuae/falcon-mamba-7b-instruct
>>> Qwen/Qwen2.5-7B-Instruct
>>> CohereForAI/aya-expanse-8b
>>> google/gemma-2-9b-it
>>> meta-llama/Meta-Llama-3-8B-Instruct
>>> microsoft/phi-4
>>> CohereForAI/aya-expanse-32b
>>> Qwen/QwQ-32B
>>> CohereForAI/c4ai-command-r-08-2024
```

Change the prompting template:

```python
ppluie.setTemplate("DIRECT")
```

Show how the prompt is encoded, to check that the correct number of special tokens is removed and that the Yes / No words each fit in a single token:

```python
ppluie.check_end_tokens_tmpl()
```

## Limitations and Bias

This metric is based on an LLM and is therefore limited by the LLM used.
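As noted under Output Values, only the sign of the score carries the paraphrase decision; the raw value depends on the LLM chosen at `init`. The sketch below (hypothetical usage code, built only from the calls shown above) scores a small batch of sentence pairs and thresholds the scores at 0 to obtain binary paraphrase labels:

```python
import evaluate

# Load and configure the metric exactly as in the examples above.
ppluie = evaluate.load("qlemesle/parapluie")
ppluie.init(model="mistralai/Mistral-7B-Instruct-v0.2")

# A small batch: the first pair is taken from this card, the second is an
# illustrative pair added for this sketch.
sources = [
    "Have you ever seen a tsunami ?",
    "The cat sat on the mat.",
]
hypotheses = [
    "Have you ever seen a tiramisu ?",
    "A cat was sitting on the mat.",
]

results = ppluie.compute(sources=sources, hypotheses=hypotheses)

# A score above 0 indicates a paraphrase, below 0 the opposite (see Output Values).
labels = ["paraphrase" if score > 0 else "not a paraphrase" for score in results["scores"]]
print(list(zip(results["scores"], labels)))
```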
## Source code

[GitLab](https://gitlab.inria.fr/expression/paraphrase-generation-evaluation-powered-by-an-llm-a-semantic-metric-not-a-lexical-one-coling-2025)

## Citation

```bibtex
@inproceedings{lemesle-etal-2025-paraphrase,
    title = "Paraphrase Generation Evaluation Powered by an {LLM}: A Semantic Metric, Not a Lexical One",
    author = "Lemesle, Quentin and Chevelu, Jonathan and Martin, Philippe and Lolive, Damien and Delhay, Arnaud and Barbot, Nelly",
    booktitle = "Proceedings of the 31st International Conference on Computational Linguistics",
    year = "2025",
    url = "https://aclanthology.org/2025.coling-main.538/"
}
```