import json
import os

input_path = "/home/xj_data/jishengpeng/InteractSpeech/Train600/matched_scores.json"
output_path = "/home/xj_data/jishengpeng/InteractSpeech/ms-swift/dataset_new.json"

# Load the matched score records that will be converted into training samples.
with open(input_path, "r", encoding="utf-8") as fin:
    input_data = json.load(fin)

# Placeholder value; overwritten by the assignments below.
www = "hello"

# Earlier single-response evaluation prompt (1/3/5 scale).
# It is overwritten by the dialogue-level prompt that follows.
www = (
    "# Dialogue Response Evaluation\n\n"
    "**IMPORTANT:** Evaluation must include`` rating.\n\n"
    "Listen to the dialogue recording (two sentences, 1-second pause in between). Evaluate the quality of the **second sentence** as a response to the first, focusing on **text relevance** and the **appropriateness** of **Linguistic information (a range of paralinguistic information such as emotion/age/pitch/speed/volume)**.\n"
    "**Note:** Focus on evaluating the appropriateness of the second sentence relative to the first, even if the first sentence itself contains contradictory information.\n\n"
    "## Scoring Criteria\n\n"
    "**1 point**: Text content is irrelevant or incorrect or illogical.(low intelligence)\n"
    "**3 points**: Text is relevant, but paralinguistic information is **inappropriate** for the context.(low emotional quotient)\n"
    "**5 points**: Text is relevant, and paralinguistic information is **appropriate** for the context, resulting in effective communication.(High intelligence and emotional intelligence.)\n\n"
    "## Evaluation Requirements\n\n"
    "Response **MUST** follow this format:\n\n"
    "X (**X is 1, 3, or 5**)\n\n")

# Dialogue-level prompt actually used: scores Response Quality and
# Interaction Quality on a 1 (Poor) / 2 (Excellent) scale.
www = (
    "**Task Goal:**\n"
    "You are provided with a recording of a two-person dialogue (or its transcript, where left and right channels represent different speakers). Your task is to evaluate this dialogue based on two key dimensions: \"Response Quality\" and \"Interaction Quality.\" For each dimension, you must:\n"
    "1. Provide a detailed analysis (your \"thinking process\").\n"
    "2. Assign a score of 1 (Poor) or 2 (Excellent).\n"
    "**Important Definitions:**\n"
    "* **Interruption:** Refers to an instance where one speaker begins talking while the other is still speaking, potentially causing a brief overlap in audio. Your evaluation should consider the reasonableness of the interruption and how both parties react to it. Brief interjections (e.g., \"um,\" \"ah\") do not constitute an interruption for this evaluation unless they significantly disrupt the interaction flow (e.g., interjections cause audio pause for approximately 5 seconds).\n"
    "* **Audio overlap:** Audio overlap refers to portions where the left and right channel audio overlap.\n"
    "**Evaluation Dimensions Explained:**\n"
    "**1. Response Quality (Corresponds to \"\")**\n"
    "* **Core Focus:** The appropriateness, effectiveness, accuracy, and conciseness of the content of the responses.\n"
    "* **Analytical Guidance (to be detailed in ``):**\n"
    "* The response directly and appropriately addresses the other person's statement or question.\n"
    "* When an interruption occurs, the response handles the interruption appropriately.\n"
    "* Contextual relevance is maintained throughout the response.\n"
    "* Information is conveyed concisely, without unnecessary redundancy or verbosity.\n"
    "* There are no factual errors or logical fallacies in the response.\n"
    "**2. Interaction Quality (Corresponds to \"\")**\n"
    "* **Core Focus:** The natural flow, timing, and smoothness of the conversational exchange and turn-taking.\n"
    "* **Analytical Guidance (to be detailed in ``):**\n"
    "* The overall conversational flow is natural and smooth.\n"
    "* Pauses, pace, and rhythm are appropriate and natural.\n"
    "* When interruptions occur, the reactions of both the interrupter and the interrupted party are timely and natural. (e.g., the interrupted person yields appropriately. The interrupter enters at a reasonable moment.)\n"
    "* Turn-taking is smooth, without long silences (the audio is silent more than 5 seconds) or excessive overlapping speech (the audio is overlapped more than 3 seconds).\n"
    "* One speaker does not continue talking for too long after being interrupted, avoiding prolonged audio overlap.\n"
    "**3. Score Explanation: (Corresponds to \"\")**\n"
    "* `1`: Indicates the evaluation for this dimension is Poor, meaning there are significant issues as described above (e.g., inappropriate responses, unnatural interaction flow, etc.).\n"
    "* `2`: Indicates the evaluation for this dimension is Excellent, meaning the responses and/or interaction are highly appropriate, effective, and natural, with no major issues.\n"
    "**Reference Examples for Evaluation (Illustrative of common issues):**\n"
    "* **Poor Response Quality Examples:**\n"
    "* After A asks B a question, B's response completely ignores A's question and continues on a different topic.\n"
    "* B interrupts A with a question, and A's response contains factually incorrect information.\n"
    "* A's response is filled with unnecessary adjectives and filler words, making it overly verbose.\n"
    "* **Poor Interaction Quality Examples:**\n"
    "* After being interrupted, the responder pauses for an unnaturally long time (e.g., the audio is silent more than 5 seconds) before speaking.\n"
    "* After being interrupted, the original speaker continues to speak for a significant duration (e.g., the audio is overlapped more than 3 seconds), causing prolonged overlapping audio.\n"
    "**Output Format Requirements:**\n"
    "Strictly adhere to the following format and order. Within the `<... think>` tags, elaborate on your analysis. After that, please give the overall score.\n"
    "\n"
    "[Provide your detailed analysis of \"Response Quality\" here...]\n"
    "\n"
    "\n"
    "[Provide your detailed analysis of \"Interaction Quality\" here...]\n"
    "\n"
    "X\n"
)
# www = "hello"

# Write one training sample per input record.
with open(output_path, "w", encoding="utf-8") as fout:
    for item in input_data:
        data = {
            "messages": [
                {"role": "user", "content": f"