GiliGold/VAD_binomial_regression_models


GiliGold committed on Feb 4

Commit f630722 · verified · 1 Parent(s): 9834f42

Update README.md

Files changed (1)

README.md +46 -3


@@ -1,3 +1,46 @@
----
-license: cc-by-sa-4.0
----

+---
+license: cc-by-sa-4.0
+---
+# VAD Binomial Regression Models
+This repository contains three binomial regression models designed to predict VAD (Valence, Arousal, Dominance) scores for text inputs.
+Each model is stored as a separate pickle (.pkl) file:
+- `valence_model.pkl`: Predicts the valence score (positivity/negativity).
+- `arousal_model.pkl`: Predicts the arousal score (level of excitement or calm).
+- `dominance_model.pkl`: Predicts the dominance score (sense of control or influence).
+All scores are normalized on a scale from 0 to 1.
+Before making predictions, input text must be converted into embeddings using the [Knesset-multi-e5-large](https://huggingface.co/GiliGold/Knesset-multi-e5-large) model. The embeddings are then fed into the regression models.
+## Training Data
+The models were trained on a combination of three datasets:
+- [Emobank Dataset](https://aclanthology.org/E17-2092/) (Buechel and Hahn, 2017): A dataset of English texts annotated with emotion scores, which we automatically translated to Hebrew using [google/madlad400-3b-mt](https://huggingface.co/google/madlad400-3b-mt).
+- [Hebrew VAD Lexicon](https://huggingface.co/datasets/GiliGold/Hebrew_VAD_lexicon): A lexicon that provides VAD scores for Hebrew words.
+- [Knesset Sentences](https://huggingface.co/datasets/GiliGold/VAD_KnessetCorpus): A manually annotated set of 120 Knesset sentences with VAD scores, serving as an additional benchmark and source of training data.
+Combining these sources exposes the models to emotional language from several text domains, with a focus on Hebrew.
+## Model Details
+- Model Type: Binomial Regression
+- Input: An embedding of the text produced by the Knesset-multi-e5-large model, the same feature extraction step used in training.
+- Output: One score per model (valence, arousal, or dominance) on a continuous scale from 0 to 1.
+Each model is provided as a .pkl file and can be loaded using Python's pickle module.
+## Usage Example
+```python
+import pickle
+
+from sentence_transformers import SentenceTransformer
+
+# Embed the input text with the Knesset-multi-e5-large model
+model = SentenceTransformer('GiliGold/Knesset-multi-e5-large')
+sentence = "YOUR TEXT HERE"  # placeholder: replace with the text to score
+embedding_vector = model.encode(sentence)
+
+# Load the valence model
+with open("valence_model.pkl", "rb") as file:
+    valence_model = pickle.load(file)
+
+# Wrap the single embedding in a list so predict receives a batch of one
+valence_score = valence_model.predict([embedding_vector])
+print(f"Predicted Valence Score: {valence_score[0]}")
+```
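
The README example above loads only the valence model. A minimal sketch of scoring all three dimensions at once might look like the following. It is not part of the commit: the names `VAD_MODEL_FILES`, `load_vad_models`, and `predict_vad` are hypothetical, and it assumes the arousal and dominance models expose the same `predict` call that the README shows for valence.

```python
import pickle
from pathlib import Path

# File names as listed in the README; the dict itself is a hypothetical helper.
VAD_MODEL_FILES = {
    "valence": "valence_model.pkl",
    "arousal": "arousal_model.pkl",
    "dominance": "dominance_model.pkl",
}


def load_vad_models(model_dir="."):
    """Unpickle the three regression models from model_dir, keyed by dimension."""
    models = {}
    for dimension, filename in VAD_MODEL_FILES.items():
        with open(Path(model_dir) / filename, "rb") as file:
            models[dimension] = pickle.load(file)
    return models


def predict_vad(embedding_vector, models):
    """Return one score per dimension for a single embedding.

    Assumes every model accepts a batch of embeddings in predict(),
    as the README shows for the valence model.
    """
    return {
        dimension: float(model.predict([embedding_vector])[0])
        for dimension, model in models.items()
    }
```

With the three `.pkl` files in the working directory, `predict_vad(embedding_vector, load_vad_models())` would return a dictionary with `valence`, `arousal`, and `dominance` keys, where `embedding_vector` comes from `model.encode(sentence)` as in the README example.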