Farsisham 🇮🇷: Part-of-Speech (POS) Tagging Model for Persian

🤗 Quick Start

Load and use the pre-trained POS tagger from Hugging Face:

```python
from farsisham.pos_tagger import POSTagger

# Load the model
tagger = POSTagger.from_pretrained("AmirHossein1455/farsisham")

# Tag a Persian sentence ("Hello, I am AmirHossein")
text = "سلام من امیرحسین هستم"
tagged_sentence = tagger.tag_sentence(text)

print(tagged_sentence)
```

Alternatively, download the model manually from the Hugging Face Model Hub and load it locally.

🧠 Training a Custom Model

Train your own POS tagger using a custom corpus:

```python
from farsisham.pos_tagger import POSTagger

# Train a tagger on your own corpus file
tagger = POSTagger()
tagger.train("path/to/your/corpus.txt")
```

📊 Model Evaluation

The POS tagger was evaluated on a test set of 71 samples. Key performance metrics:

Overall accuracy: 90%

| Average | Precision | Recall | F1-score |
|---|---|---|---|
| Macro | 0.81 | 0.78 | 0.77 |
| Weighted | 0.93 | 0.90 | 0.91 |
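For reference, the two averages follow the usual definitions (the ones used by scikit-learn's `classification_report`, which we assume produced this report). With $L$ labels, per-label precision $P_\ell$, recall $R_\ell$, support $s_\ell$, and $N = \sum_\ell s_\ell$ test samples, where $M_\ell$ stands for any of $P_\ell$, $R_\ell$, $F_{1,\ell}$:

$$
F_{1,\ell} = \frac{2\,P_\ell R_\ell}{P_\ell + R_\ell}, \qquad
\text{macro} = \frac{1}{L}\sum_{\ell=1}^{L} M_\ell, \qquad
\text{weighted} = \sum_{\ell=1}^{L} \frac{s_\ell}{N}\, M_\ell
$$

The reported macro values are consistent with an unweighted mean over all ten labels listed below, including QUA (support 0, all scores 0.00): the per-label precisions sum to 8.07, and 8.07 / 10 ≈ 0.81. The weighted average gives such rare labels little or no weight, so it sits close to the overall accuracy; in single-label tagging, weighted recall is exactly the accuracy.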

Per-label performance:

| Label | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| ADJ | 0.80 | 0.80 | 0.80 | 5 |
| ADV | 1.00 | 0.80 | 0.89 | 5 |
| CON | 0.50 | 1.00 | 0.67 | 1 |
| DET | 1.00 | 0.50 | 0.67 | 4 |
| N | 0.85 | 1.00 | 0.92 | 17 |
| P | 1.00 | 1.00 | 1.00 | 7 |
| PRO | 1.00 | 0.83 | 0.91 | 6 |
| PUNC | 1.00 | 1.00 | 1.00 | 12 |
| QUA | 0.00 | 0.00 | 0.00 | 0 |
| V | 0.92 | 0.86 | 0.89 | 14 |

Note: “Support” indicates the number of samples per label in the test set.
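To compute the same kind of report for a custom-trained model, here is a minimal pure-Python sketch. It assumes you already have the gold tags and the tagger's predicted tags as two aligned lists of label strings; how to extract those from `tag_sentence` output is not documented here, so that step is left out. `per_label_scores` and `averages` are hypothetical helper names, not part of farsisham.

```python
from collections import Counter


def per_label_scores(gold, pred):
    """Return {label: (precision, recall, f1, support)} for aligned tag lists."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    labels = sorted(set(gold) | set(pred))
    tp = Counter(g for g, p in zip(gold, pred) if g == p)  # correct tags per label
    gold_counts = Counter(gold)  # support: how often each label is the true tag
    pred_counts = Counter(pred)  # how often the tagger predicted each label
    scores = {}
    for label in labels:
        precision = tp[label] / pred_counts[label] if pred_counts[label] else 0.0
        recall = tp[label] / gold_counts[label] if gold_counts[label] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1, gold_counts[label])
    return scores


def averages(scores):
    """Return (macro, weighted) averages, each a (precision, recall, f1) tuple."""
    n_labels = len(scores)
    total_support = sum(s for *_, s in scores.values())
    macro = tuple(sum(v[i] for v in scores.values()) / n_labels for i in range(3))
    weighted = tuple(
        sum(v[i] * v[3] for v in scores.values()) / total_support for i in range(3)
    )
    return macro, weighted


# Toy example: one noun is wrongly tagged as QUA, which never occurs in the gold data.
gold = ["N", "V", "N", "ADJ", "PUNC"]
pred = ["N", "V", "QUA", "ADJ", "PUNC"]
scores = per_label_scores(gold, pred)
macro, weighted = averages(scores)
print(scores["QUA"])                     # (0.0, 0.0, 0.0, 0)
print([round(x, 2) for x in macro])      # [0.8, 0.7, 0.73]
print([round(x, 2) for x in weighted])   # [1.0, 0.8, 0.87]
```

As with the QUA row in the table, a label that is predicted but never occurs in the gold data scores 0.00 everywhere: it counts as a full entry in the macro average but carries no weight in the weighted average because its support is 0. Feeding the per-label rows of the table into `averages` reproduces the reported macro and weighted averages to within rounding.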

🎯 Intended Use

Farsisham is designed for:

- Researchers developing Persian NLP applications.
- Developers building tools such as chatbots, text analyzers, or translation systems.
- Educators and linguists studying the structure of the Persian language.

⚠️ Limitations

- The POS tagger's performance may vary on out-of-domain text or informal Persian.
- The lemmatizer relies on a provided wordlist, which may not cover all vocabulary.
- Labels with little training data (e.g., QUA) are poorly supported; QUA does not occur in the test set and scores 0.00 on every metric.

📄 License

Licensed under the MIT License.
