AraStyleTransfer-21 | 21 Arabic Author Styles. One Model.
First Place Winner at the AraGenEval 2025 Competition
A state-of-the-art Arabic text style transfer model that transforms text into the writing style of 21 different Arabic authors using descriptive author tokens and prompt engineering.
Paper Link (ACL Anthology)
ANLPers at AraGenEval Shared Task: Descriptive Author Tokens for Transparent Arabic Authorship Style Transfer [https://aclanthology.org/2025.arabicnlp-sharedtasks.8.pdf]
Model Architecture
- Base Model: UBC-NLP/AraT5v2-base-1024
- Approach: Descriptive Author Tokens + Prompt Engineering
- Input Format: "اكتب النص التالي بأسلوب <author:name>: [text]" (in English: "Write the following text in the style of <author:name>: [text]")
- Training: Fine-tuned with author-specific tokens
Technical Details
Stylometric Analysis
The model incorporates comprehensive stylometric analysis including:
- Lexical Features: Sentence length, word length, vocabulary richness
- Syntactic Patterns: Definite articles, conjunctions, prepositions
- Author-Specific Vocabulary: TF-IDF based characteristic words
- Style Classification: Formality, complexity, emotional intensity
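The lexical and syntactic features listed above can be approximated with simple surface statistics. The following is a minimal sketch, not the authors' analysis code: the function name `stylometric_features`, the small function-word lists, and the prefix heuristic for the definite article are all assumptions made for illustration.

```python
import re

# Hypothetical, deliberately tiny function-word lists for illustration only.
CONJUNCTIONS = {"ثم", "أو", "لكن", "بل"}
PREPOSITIONS = {"في", "من", "إلى", "على", "عن", "مع"}


def stylometric_features(text: str) -> dict:
    """Compute a few surface-level style statistics for an Arabic text."""
    # Sentences: split on Latin and Arabic sentence-final punctuation.
    sentences = [s for s in re.split(r"[.!?؟]+", text) if s.strip()]
    # Words: runs of Arabic letters and diacritics (U+0621 to U+0652).
    words = re.findall(r"[\u0621-\u0652]+", text)
    if not words or not sentences:
        return {}
    n = len(words)
    return {
        "avg_sentence_length": n / len(sentences),
        "avg_word_length": sum(len(w) for w in words) / n,
        "type_token_ratio": len(set(words)) / n,
        # Crude heuristic: any word longer than two characters starting with "ال".
        "definite_article_rate": sum(
            w.startswith("ال") and len(w) > 2 for w in words
        ) / n,
        # Standalone tokens only; the attached conjunction "و" is not counted.
        "conjunction_rate": sum(w in CONJUNCTIONS for w in words) / n,
        "preposition_rate": sum(w in PREPOSITIONS for w in words) / n,
    }


sample = "ذهب الولد إلى المدرسة. ثم عاد إلى البيت."
for name, value in stylometric_features(sample).items():
    print(f"{name}: {value:.3f}")
```

A real analysis would likely need proper Arabic tokenization and morphological segmentation, since conjunctions and prepositions are often attached to the following word.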
Prompt Engineering
- Format: "اكتب النص التالي بأسلوب <author:يوسف_إدريس>: [original_text]"
- Author Tokens: Descriptive tokens like <author:يوسف_إدريس>
- Target: Generated text in author's style
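The prompt format above can be wrapped in a small helper so that the author token is always built the same way. This is a sketch under assumptions: `author_token` and `build_prompt` are hypothetical names, not part of the released model or of any library. The Arabic prefix reads "Write the following text in the style of".

```python
# Instruction prefix used by the prompt format described above.
PROMPT_PREFIX = "اكتب النص التالي بأسلوب"


def author_token(author_name: str) -> str:
    """Join the parts of an author name with underscores inside <author:...>."""
    return f"<author:{'_'.join(author_name.split())}>"


def build_prompt(text: str, author_name: str) -> str:
    """Assemble prefix, author token, and source text into a single prompt."""
    return f"{PROMPT_PREFIX} {author_token(author_name)}: {text.strip()}"


print(build_prompt("لم أقم مطلقًا بالاحتفال بعيد ميلادي منذ طفولتي.", "يوسف إدريس"))
```

Splitting on whitespace before joining means that extra spaces in the author name do not produce doubled underscores in the token.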
Supported Authors
Input File Format
For batch processing, your input file should have the following format:
Example Snippets from the Dataset
| id | text_in_msa (partial) | text_in_author_style (partial) |
|---|---|---|
| 3835 | "ูู ุฃูู ู ุทูููุง ุจุงูุงุญุชูุงู ุจุนูุฏ ู ููุงุฏู... ูููุช ุฃุชุฌุงุฏู ู ุน ูุงู ู ุงูุดูุงูู..." | "ุนู ุฑู ู ุง ุงุญุชููุช ุจุนูุฏ ู ููุงุฏู... ูุฃุชุดุงุฌุฑ ู ุน ูุงู ู ุงูุดูุงูู ุนูู ุฐูู ุงูุงูุชุฆุงุจ..." |
| 3836 | "ุงูุฒู ู ุงูุนุงู ูู ุงูุนุฏุงุฏ ุงูุฌู ุงุนู ุงูุฐู ูุณุฌู ุงูุณููู... ููุจุฑุฒ ุงูุฒู ู ุงูุฎุงุต..." | "ุงูุฒู ู ุงูุนุงู ูุนุฏู ุงูุณููู ูููุงุณ ูููุง... ุฃู ุง ุนุฏุงุฏู ุงูุฎุงุต ูุฃูุช ูุงุฏุฑูุง ู ุง ุชูุธุฑ ููู..." |
| 3837 | "ู ุตุฑ ุงูุบููุฉ ุงูุฑุงููุฉ... ุงุดุชุฑุงููุฉ ูุฏูู ูุฑุงุทูุฉ ุชุชูุงุนู ู ุนูุง... ุฃุญูุงู ุงูุฎู ุณูู..." | "ู ุตุฑ ุงูู ุตูููุนุฉ... ุงูููู ู ุงุฆุฉ ุฒูุฑุฉ... ูุญูู ุฃุจูุบ ุงูุฎู ุณูู ุฃุจุฏุฃ ุฃุนูุด ูุฃุชุนูู ุงูู ูุณููู..." |
| 3838 | "ุบุฑุงุจุฉ ุงูุชุฌุฑุจุฉ... ุทูููุฉ ุฌุงุฏุฉ ุชู ุงู ูุง ุจูุง ู ุฑุญ... ุงูุทูููุฉ ูุงูุช ุนูุจูุง..." | "ุบุฑูุจุฉ ูู ุงูุฃููุงุฑ... ููุชู ุฑุฌููุง ุฑููุจูุง ูู ุซูุจ ุทูู... ูุงูุทูููุฉ ุชููู ุฉ ูุฎุดู ุงูุงุนุชุฑุงู ุจูุง..." |
| 3839 | "ูุฐุง ููุณ ูุฏู ูุง... ู ูุฌุฉ ุชูููู ููุฉ... ุงููุตุฑ ุงูุญูููู ุฃู ุชุนูุด ูู ุง ุชุฎุชุงุฑ..." | "ููุณ ู ุฑุงุฑุฉ ููุง ูุฏู ูุง... ุฃูุช ุชูุงุถู ู ูุฌุฉ ุฃุนุชู ู ูู... ูุงูุญู ุฃู ุชุญูุง ูู ุง ุงุฎุชุฑุช ุฃูุช..." |
Performance Metrics
- BLEU Score: 24.58
- chrF Score: 59.01
- Competition: First Place in AraGenEval 2025
- Supported Authors: 21 Arabic authors
Official results on the AraGenEval 2025 test set. Our prompt-engineering system ranked first.
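For readers unfamiliar with chrF, it is an F-score over character n-grams shared between a system output and a reference. The sketch below is a simplified sentence-level version for intuition only. It is not the shared task's official scorer, the name `simple_chrf` is hypothetical, and its values should not be compared with the numbers reported above.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing all whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF: average n-gram precision and recall, then F-beta."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one side is too short to have n-grams of this order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # beta = 2 weights recall more heavily than precision.
    return 100 * (1 + beta**2) * p * r / (beta**2 * p + r)


print(simple_chrf("abcd", "abce"))
```

To reproduce reported scores, use an established implementation rather than this sketch.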
Quick Start: Style Transfer Example
from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch
# Load model
model_name = "Omartificial-Intelligence-Space/AraStyleTransfer-21"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
# Input text and target author
text = "لم أقم مطلقًا بالاحتفال بعيد ميلادي منذ طفولتي."
author = "يوسف إدريس"
# Prompt format: spaces in the author name become underscores inside the token
prompt = f"اكتب النص التالي بأسلوب <author:{author.replace(' ', '_')}>: {text}"
# Tokenize
inputs = tokenizer(prompt, return_tensors="pt").to(device)
# Generate
output_ids = model.generate(
**inputs,
max_length=256,
num_beams=5,
early_stopping=True
)
# Decode
generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print("Original:", text)
print("Author:", author)
print("Output:", generated_text)
Use Cases
- Content Creation: Generate text in specific author styles
- Educational Tools: Demonstrate different writing styles
- Research: Study Arabic literary styles and patterns
- Creative Writing: Inspire new content in classic styles
Contributing
This model was developed for the AraGenEval 2025 competition. For questions or contributions, please refer to the competition guidelines.
License
This model is released under the same license as the base AraT5v2 model.
BibTeX Citation
@inproceedings{nacar2025anlpers,
title={ANLPers at AraGenEval Shared Task: Descriptive Author Tokens for Transparent Arabic Authorship Style Transfer},
author={Nacar, Omer and Reda, Mahmoud and Sibaee, Serry and Alhabashi, Yasser and Ammar, Adel and Boulila, Wadii},
booktitle={Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks},
pages={49--53},
year={2025}
}