AnchorIndoBERT-Politics for Emotion Detection (Criticism vs Appreciation)

AnchorIndoBERT-Politics is a fine-tuned version of IndoBERT-base-p1 for detecting emotional tone in Indonesian-language comments. It distinguishes between Criticism and Appreciation in public responses to government performance, drawn mainly from YouTube comments.

1. Model Idea

This work adapts the concept proposed in "Improving multilabel text emotion detection with emotion interrelation anchors" (DOI: 10.1016/j.nlp.2025.100170).

The referenced paper introduces AnchorBERT, a model that enhances multilabel emotion detection by representing each emotion as a learned anchor vector derived from samples that exclusively express that emotion. These emotion anchors capture interrelations between emotions through a multi-head attention mechanism, enabling each text representation to interact with emotion-specific features. A final feed-forward layer predicts the presence of each emotion independently.
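The anchor construction can be sketched as follows. This sketch assumes that each anchor is the mean encoder embedding of the training samples that carry only that emotion. The released anchor file is named `mean_anchors.pt`, which suggests mean pooling, but the exact procedure is not documented here. `compute_mean_anchors` is a hypothetical helper, and the toy 2-d vectors stand in for IndoBERT outputs.

```python
import torch


def compute_mean_anchors(embeddings: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: one anchor per label, computed as the mean embedding
    of samples whose only active label is that label (mean pooling is an assumption)."""
    num_labels = labels.shape[1]
    is_single_label = labels.sum(dim=1) == 1
    anchors = []
    for k in range(num_labels):
        # A sample "exclusively" expresses label k if k is active and no other label is.
        exclusive = is_single_label & (labels[:, k] == 1)
        if not exclusive.any():
            raise ValueError(f"No exclusive samples for label {k}")
        anchors.append(embeddings[exclusive].mean(dim=0))
    return torch.stack(anchors)  # shape: (num_labels, hidden_size)


# Toy "embeddings" standing in for encoder outputs.
embeddings = torch.tensor([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [5.0, 5.0]])
labels = torch.tensor([[1, 0], [1, 0], [0, 1], [1, 1]])  # last sample has both labels, so it is skipped
anchors = compute_mean_anchors(embeddings, labels)
print(anchors.tolist())  # [[2.0, 0.0], [0.0, 2.0]]
```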

In this adaptation, AnchorIndoBERT-Politics applies the same concept to Indonesian text:

  • Uses IndoBERT-base as the encoder.
  • Represents Criticism and Appreciation as learned emotion anchors.
  • Employs a multi-label classification head that makes an independent binary prediction for each emotion.
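A shape-level sketch of how one text representation attends over the two emotion anchors. It mirrors the `forward` method of the usage example in section 6. Random tensors replace real IndoBERT outputs, and the small sizes are illustrative assumptions (IndoBERT-base itself uses a hidden size of 768).

```python
import torch

torch.manual_seed(0)
batch_size, num_labels, hidden_size, num_heads = 3, 2, 16, 4  # illustrative sizes

cls = torch.randn(batch_size, hidden_size)       # one pooled vector per text
anchors = torch.randn(num_labels, hidden_size)   # one anchor per emotion (Kritik, Apresiasi)
anchors_batch = anchors.unsqueeze(0).repeat(batch_size, 1, 1)  # (B, num_labels, H)

mha = torch.nn.MultiheadAttention(embed_dim=hidden_size, num_heads=num_heads, batch_first=True)
# The text vector is the single query; the anchors serve as keys and values.
attn_out, attn_weights = mha(query=cls.unsqueeze(1), key=anchors_batch, value=anchors_batch)
attn_out = attn_out.squeeze(1)                   # (B, H)

enriched = torch.cat([cls, attn_out], dim=-1)    # (B, 2H), the classifier input
print(enriched.shape, attn_weights.shape)        # torch.Size([3, 32]) torch.Size([3, 1, 2])
```

The attention weights show, for each text, how strongly it aligns with each emotion anchor; they are averaged over heads and sum to 1 across the two anchors.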

2. Model Details

| Component | Description |
|---|---|
| Base model | indobenchmark/indobert-base-p1 |
| Architecture | IndoBERT + multi-head attention (anchor-based) + feed-forward classifier |
| Number of labels | 2 |
| Labels | Kritik (Criticism), Apresiasi (Appreciation) |
| Task type | Multi-label text classification |
| Language | Indonesian (id) |
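For a sense of scale, the anchor head on top of the encoder can be instantiated by itself and its parameters counted. This sketch copies the layer definitions from the `AnchorBERT` class in section 6 and assumes a hidden size of 768 (the standard BERT-base configuration). It does not download IndoBERT.

```python
import torch

# Assumption: IndoBERT-base hidden size is 768 (standard BERT-base config).
hidden_size, num_labels, attn_heads = 768, 2, 8

# Same head layers as the AnchorBERT class in section 6, without the encoder.
mha = torch.nn.MultiheadAttention(embed_dim=hidden_size, num_heads=attn_heads, batch_first=True)
classifier = torch.nn.Sequential(
    torch.nn.Linear(hidden_size * 2, hidden_size),
    torch.nn.ReLU(),
    torch.nn.Dropout(0.1),
    torch.nn.Linear(hidden_size, num_labels),
)


def count_params(module: torch.nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())


print(count_params(mha))         # 2362368  (4*768*768 + 4*768)
print(count_params(classifier))  # 1181954  (1536*768 + 768 + 768*2 + 2)
```

The head adds roughly 3.5M parameters on top of the encoder.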

3. Dataset

| Attribute | Description |
|---|---|
| Source | YouTube comments on government performance evaluation |
| Size | 250 manually labeled samples |
| Label distribution | Criticism: 150, Appreciation: 100 |
| Labeling criteria | Only comments with strong emotional expressions were included; neutral comments were excluded |

The dataset is small-scale and exploratory in nature. It aims to demonstrate how emotion interrelation anchors can be adapted to Indonesian socio-political discourse data.
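Because the task is framed as multi-label, each comment's label is stored as a two-element target vector for the BCE loss. The 150 + 100 = 250 counts above suggest, however, that every sample carries exactly one label. This minimal sketch assumes the label order used in the usage example (Kritik first, Apresiasi second). `to_multi_hot` is a hypothetical helper, not part of the released code.

```python
# Label order follows the usage example: index 0 = Kritik, index 1 = Apresiasi.
LABELS = ["Kritik", "Apresiasi"]


def to_multi_hot(sample_labels: list[str]) -> list[float]:
    """Hypothetical helper: turn a list of label names into a BCE target vector."""
    unknown = set(sample_labels) - set(LABELS)
    if unknown:
        raise ValueError(f"Unknown labels: {sorted(unknown)}")
    return [1.0 if name in sample_labels else 0.0 for name in LABELS]


print(to_multi_hot(["Kritik"]))               # [1.0, 0.0]
print(to_multi_hot(["Apresiasi"]))            # [0.0, 1.0]
print(to_multi_hot(["Kritik", "Apresiasi"]))  # [1.0, 1.0]
```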

4. Training Configuration

| Parameter | Value |
|---|---|
| Epochs | 3 |
| Batch size | 16 |
| Learning rate | 2e-5 |
| Optimizer | AdamW |
| Loss function | Binary Cross-Entropy with Logits |
| Max sequence length | 128 |
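The configuration above maps onto a standard PyTorch loop roughly as sketched below. To keep the sketch self-contained, a single linear layer over random features stands in for AnchorBERT and the tokenizer is skipped. Only the hyperparameters come from the table; the loop structure is an assumption, not the author's training script.

```python
import torch

torch.manual_seed(0)

# Values from the training configuration table.
EPOCHS, BATCH_SIZE, LR = 3, 16, 2e-5

# Stand-in model (assumption): a linear layer over random features replaces AnchorBERT,
# so the loop runs without downloading IndoBERT. With the real model, texts would be
# tokenized with truncation=True, max_length=128.
hidden_size, num_labels, num_samples = 32, 2, 48
features = torch.randn(num_samples, hidden_size)
targets = torch.randint(0, 2, (num_samples, num_labels)).float()  # multi-hot targets
model = torch.nn.Linear(hidden_size, num_labels)

optimizer = torch.optim.AdamW(model.parameters(), lr=LR)
loss_fn = torch.nn.BCEWithLogitsLoss()  # sigmoid + binary cross-entropy, applied per label

for epoch in range(EPOCHS):
    perm = torch.randperm(num_samples)  # shuffle each epoch
    for start in range(0, num_samples, BATCH_SIZE):
        idx = perm[start:start + BATCH_SIZE]
        logits = model(features[idx])
        loss = loss_fn(logits, targets[idx])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch + 1}: last batch loss = {loss.item():.4f}")
```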

5. Performance

  • Validation micro-F1: ~0.84. Performance may improve with a larger and more balanced dataset.
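Micro-F1 pools true positives (TP), false positives (FP), and false negatives (FN) across both labels before computing F1, i.e. micro-F1 = 2·TP / (2·TP + FP + FN). Below is a small pure-Python sketch on made-up predictions; the toy data is illustrative and not from the validation set. `sklearn.metrics.f1_score(..., average="micro")` should give the same value on multi-hot arrays.

```python
def micro_f1(y_true: list[list[int]], y_pred: list[list[int]]) -> float:
    """Micro-averaged F1 for multi-label data: pool TP/FP/FN over all labels."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy example (not real validation data); columns are [Kritik, Apresiasi].
y_true = [[1, 0], [1, 0], [0, 1], [0, 1]]
y_pred = [[1, 0], [1, 1], [0, 1], [1, 0]]
print(round(micro_f1(y_true, y_pred), 3))  # 0.667  (TP=3, FP=2, FN=1)
```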

6. Example Usage

The custom architecture is not yet registered with the Transformers Auto classes. To run inference, define the model class yourself and download the weights locally.

```python
import torch
from transformers import AutoTokenizer, AutoModel
from huggingface_hub import hf_hub_download

# === Custom model (must match the architecture used for the saved weights) ===
class AnchorBERT(torch.nn.Module):
    def __init__(self, base_model_name="indobenchmark/indobert-base-p1", num_labels=2, hidden_dropout=0.1, attn_heads=8):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(base_model_name)
        hidden_size = self.encoder.config.hidden_size
        # The text representation (query) attends over the emotion anchors (keys/values).
        self.mha = torch.nn.MultiheadAttention(embed_dim=hidden_size, num_heads=attn_heads, batch_first=True)
        self.dropout = torch.nn.Dropout(hidden_dropout)  # not applied in forward()
        # Classifier over [text representation; anchor-attention output].
        self.classifier = torch.nn.Sequential(
            torch.nn.Linear(hidden_size * 2, hidden_size),
            torch.nn.ReLU(),
            torch.nn.Dropout(hidden_dropout),
            torch.nn.Linear(hidden_size, num_labels)
        )

    def forward(self, input_ids, attention_mask, anchors_per_batch):
        # anchors_per_batch: (batch_size, num_labels, hidden_size)
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask, return_dict=True)
        if getattr(outputs, "pooler_output", None) is not None:
            cls = outputs.pooler_output
        else:
            # Fall back to mean pooling over non-padding tokens (mask cast to float).
            last = outputs.last_hidden_state
            mask = attention_mask.unsqueeze(-1).to(last.dtype)
            cls = (last * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

        query = cls.unsqueeze(1)  # (batch_size, 1, hidden_size)
        attn_out, _ = self.mha(query=query, key=anchors_per_batch, value=anchors_per_batch)
        attn_out = attn_out.squeeze(1)

        enriched = torch.cat([cls, attn_out], dim=-1)  # (batch_size, 2 * hidden_size)
        logits = self.classifier(enriched)
        return logits  # one independent logit per label


# === Config ===
REPO_ID = "BillyCemerson/AnchorIndoBERT-Politics"
MODEL_NAME = "indobenchmark/indobert-base-p1"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# === Load the base IndoBERT tokenizer ===
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# === Download the fine-tuned weights and load them into the custom class ===
model_path = hf_hub_download(repo_id=REPO_ID, filename="pytorch_model.bin")

model = AnchorBERT(base_model_name=MODEL_NAME, num_labels=2).to(DEVICE)
state_dict = torch.load(model_path, map_location=DEVICE)
model.load_state_dict(state_dict)
model.eval()

# === Load the emotion anchors ===
anchors_path = hf_hub_download(repo_id=REPO_ID, filename="mean_anchors.pt")
mean_anchors = torch.load(anchors_path, map_location=DEVICE)

# === Helper: repeat the anchors once per batch item ===
def mean_anchors_for_batch(anchors, batch_size):
    # (num_labels, hidden_size) -> (batch_size, num_labels, hidden_size)
    return anchors.unsqueeze(0).repeat(batch_size, 1, 1)

# === Inference ===
text = "Mantap kinerja pak Prabowo, Gibran ganti aja"  # "Great work, Mr. Prabowo; just replace Gibran"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128, padding=True).to(DEVICE)

with torch.no_grad():
    # mean_anchors.pt stores a dictionary; [1] accesses the anchor tensor inside it.
    anchors_batch = mean_anchors_for_batch(mean_anchors[1], inputs["input_ids"].shape[0]).to(DEVICE)
    logits = model(inputs["input_ids"], inputs["attention_mask"], anchors_batch)
    probs = torch.sigmoid(logits).cpu().numpy()  # independent probability per label

print(f"Text: {text}")
print(f"Kritik={probs[0][0]:.3f} | Apresiasi={probs[0][1]:.3f}")  # Expected: Kritik=0.629 | Apresiasi=0.546
```
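Because each label gets its own sigmoid, the two probabilities do not sum to 1. A comment can therefore be tagged with both labels, as with the mixed example above (praise for one official, criticism of another). The following sketch turns probabilities into label names. It assumes a per-label threshold of 0.5, since the card does not state which threshold was used. `decode` is a hypothetical helper; with the model above it can be called as `decode(probs.tolist())`.

```python
LABELS = ["Kritik", "Apresiasi"]  # same order as the logit columns in the usage example


def decode(probs, threshold: float = 0.5) -> list[list[str]]:
    """Hypothetical helper: keep every label whose sigmoid probability
    reaches the threshold (0.5 is an assumed default, not a tuned value)."""
    return [[name for name, p in zip(LABELS, row) if p >= threshold] for row in probs]


# Probabilities from the expected output of the usage example.
print(decode([[0.629, 0.546]]))                 # [['Kritik', 'Apresiasi']]
print(decode([[0.629, 0.546]], threshold=0.6))  # [['Kritik']]
```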