AnchorIndoBERT-Politics for Emotion Detection (Criticism vs Appreciation)

AnchorIndoBERT-Politics is a fine-tuned version of IndoBERT-base-p1 developed for detecting emotional tone in Indonesian-language comments. The model focuses on distinguishing between Criticism and Appreciation based on public responses to government performance, particularly from YouTube comments.

1. Model Idea

This work adapts the concept proposed in "Improving multilabel text emotion detection with emotion interrelation anchors" (DOI: 10.1016/j.nlp.2025.100170).

The referenced paper introduces AnchorBERT, a model that enhances multilabel emotion detection by representing each emotion as a learned anchor vector derived from samples that exclusively express that emotion. These emotion anchors capture interrelations between emotions through a multi-head attention mechanism, enabling each text representation to interact with emotion-specific features. A final feed-forward layer predicts the presence of each emotion independently.

In this adaptation, AnchorIndoBERT-Politics applies the same concept to Indonesian text:

Uses IndoBERT-base as the encoder.
Represents Criticism and Appreciation as learned emotion anchors.
Employs a multi-label classification head that predicts each of the two emotions independently.
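
As a rough illustration of the anchor idea, the sketch below averages the embeddings of samples that carry exactly one label to obtain one anchor per emotion. It assumes an anchor is a plain mean (the file name `mean_anchors.pt` in the usage section hints at this, but the card does not confirm it). Toy 3-dimensional vectors in plain Python lists stand in for real encoder outputs, and `build_mean_anchors` is a hypothetical helper, not part of the released code.

```python
LABELS = ["Kritik", "Apresiasi"]

def build_mean_anchors(embeddings, multi_hot_labels):
    """Return one mean vector per label, using only single-label samples."""
    dim = len(embeddings[0])
    anchors = []
    for label_idx in range(len(LABELS)):
        # Keep samples whose only active label is label_idx.
        selected = [
            emb for emb, y in zip(embeddings, multi_hot_labels)
            if y[label_idx] == 1 and sum(y) == 1
        ]
        if not selected:
            raise ValueError(f"no single-label samples for {LABELS[label_idx]}")
        anchors.append(
            [sum(v[d] for v in selected) / len(selected) for d in range(dim)]
        )
    return anchors

# Toy data (assumption): 3-dimensional "embeddings" instead of real encoder outputs.
toy_embeddings = [
    [1.0, 0.0, 2.0],  # Kritik only
    [3.0, 0.0, 0.0],  # Kritik only
    [0.0, 4.0, 1.0],  # Apresiasi only
    [9.0, 9.0, 9.0],  # both labels: excluded from every anchor
]
toy_labels = [[1, 0], [1, 0], [0, 1], [1, 1]]

anchors = build_mean_anchors(toy_embeddings, toy_labels)
print(anchors)  # [[2.0, 0.0, 1.0], [0.0, 4.0, 1.0]]
```

In the real model these anchors would be tensors that the multi-head attention layer uses as keys and values.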

2. Model Details

Component	Description
Base model	`indobenchmark/indobert-base-p1`
Architecture	IndoBERT + Multi-head Attention (Anchor-based) + Feed-forward classifier
Number of labels	2
Labels	Kritik (Criticism), Apresiasi (Appreciation)
Task type	Multi-label text classification
Language	Indonesian (`id`)
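
Since the task is multi-label, each comment's target can be written as a multi-hot vector with one slot per label. The sketch below shows one way to do that, assuming the order [Kritik, Apresiasi] (the same order the inference example prints its probabilities in). `encode_labels` is a hypothetical helper for illustration.

```python
LABELS = ["Kritik", "Apresiasi"]  # index 0 = Kritik, index 1 = Apresiasi

def encode_labels(active):
    """Map a collection of label names to a multi-hot float vector."""
    unknown = set(active) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1.0 if name in active else 0.0 for name in LABELS]

print(encode_labels({"Kritik"}))               # [1.0, 0.0]
print(encode_labels({"Kritik", "Apresiasi"}))  # [1.0, 1.0]
```

Float targets are used here because binary cross-entropy losses expect floating-point labels.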

3. Dataset

Attribute	Description
Source	YouTube comments on government performance evaluation
Size	250 manually labeled samples
Label distribution	Criticism: 150, Appreciation: 100
Labeling criteria	Only strong emotional expressions were included; neutral comments were excluded

The dataset is small-scale and exploratory in nature. It aims to demonstrate how emotion interrelation anchors can be adapted to Indonesian socio-political discourse data.

4. Training Configuration

Parameter	Value
Epochs	3
Batch size	16
Learning rate	2e-5
Optimizer	AdamW
Loss function	Binary Cross-Entropy with Logits
Max sequence length	128
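
To make the loss function concrete, here is a plain-Python sketch of binary cross-entropy with logits for a single sample with two labels. It uses the numerically stable form max(x, 0) - x*y + log(1 + exp(-|x|)) and averages over labels, which mirrors the default mean reduction of `torch.nn.BCEWithLogitsLoss`. The logits and targets are made-up values, and `bce_with_logits` is a hypothetical helper, not the training code itself.

```python
import math

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, computed directly from logits."""
    total = 0.0
    for x, y in zip(logits, targets):
        # Stable form of -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))].
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)

# Zero logits mean a probability of 0.5 for both labels.
loss = bce_with_logits([0.0, 0.0], [1.0, 0.0])
print(round(loss, 4))  # 0.6931, i.e. log(2)
```

Each label contributes its own term, so Kritik and Apresiasi are scored independently rather than competing as in a softmax.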

5. Performance

Validation micro-F1: ~0.84. Performance may improve with a larger, more balanced dataset.
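
For readers unfamiliar with the metric, micro-F1 pools true positives, false positives, and false negatives over every (sample, label) pair before computing F1. The sketch below is a minimal plain-Python version on made-up predictions; it is not the evaluation script behind the number above.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over every (sample, label) pair."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Made-up labels, columns = [Kritik, Apresiasi].
y_true = [[1, 0], [0, 1], [1, 0], [1, 1]]
y_pred = [[1, 0], [0, 1], [0, 1], [1, 0]]

score = micro_f1(y_true, y_pred)
print(f"{score:.3f}")  # 0.667 (TP=3, FP=1, FN=2)
```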

6. Example Usage

Because the model is not registered yet (I don't know how to do that), for inference you need to define the model class yourself and download the weights locally.

import torch
from transformers import AutoTokenizer, AutoModel
from huggingface_hub import hf_hub_download

# === Custom Model ===
class AnchorBERT(torch.nn.Module):
    def __init__(self, base_model_name="indobenchmark/indobert-base-p1", num_labels=2, hidden_dropout=0.1, attn_heads=8):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(base_model_name)
        hidden_size = self.encoder.config.hidden_size
        self.mha = torch.nn.MultiheadAttention(embed_dim=hidden_size, num_heads=attn_heads, batch_first=True)
        self.dropout = torch.nn.Dropout(hidden_dropout)
        self.classifier = torch.nn.Sequential(
            torch.nn.Linear(hidden_size * 2, hidden_size),
            torch.nn.ReLU(),
            torch.nn.Dropout(hidden_dropout),
            torch.nn.Linear(hidden_size, num_labels)
        )

    def forward(self, input_ids, attention_mask, anchors_per_batch):
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask, return_dict=True)
        if hasattr(outputs, "pooler_output") and outputs.pooler_output is not None:
            cls = outputs.pooler_output
        else:
            last = outputs.last_hidden_state
            mask = attention_mask.unsqueeze(-1).to(last.dtype)  # cast to float for the masked mean
            cls = (last * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

        query = cls.unsqueeze(1)
        attn_out, _ = self.mha(query=query, key=anchors_per_batch, value=anchors_per_batch)
        attn_out = attn_out.squeeze(1)

        enriched = torch.cat([cls, attn_out], dim=-1)
        logits = self.classifier(enriched)
        return logits


# === Config ===
REPO_ID = "BillyCemerson/AnchorIndoBERT-Politics"
MODEL_NAME = "indobenchmark/indobert-base-p1"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# === Load Tokenizer ===
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# === Download model weights ===
model_path = hf_hub_download(repo_id=REPO_ID, filename="pytorch_model.bin")

model = AnchorBERT(base_model_name=MODEL_NAME, num_labels=2).to(DEVICE)
state_dict = torch.load(model_path, map_location=DEVICE)
model.load_state_dict(state_dict)
model.eval()

# === Load anchors ===
anchors_path = hf_hub_download(repo_id=REPO_ID, filename="mean_anchors.pt")
mean_anchors = torch.load(anchors_path, map_location=DEVICE)

# === Dummy helper (for batch anchors) ===
def mean_anchors_for_batch(anchors, batch_size):
    return anchors.unsqueeze(0).repeat(batch_size, 1, 1)

# === Inference ===
text = "Mantap kinerja pak Prabowo, Gibran ganti aja"
# Truncate to the max sequence length used in training (128)
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128).to(DEVICE)

with torch.no_grad():
    # Access the tensor within the dictionary
    anchors_batch = mean_anchors_for_batch(mean_anchors[1], inputs["input_ids"].shape[0]).to(DEVICE)
    logits = model(inputs["input_ids"], inputs["attention_mask"], anchors_batch)
    probs = torch.sigmoid(logits).cpu().numpy()

print(f"Text: {text}")
print(f"Kritik={probs[0][0]:.3f} | Apresiasi={probs[0][1]:.3f}")  # Expected: Kritik=0.629 | Apresiasi=0.546
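
The inference example stops at per-label probabilities. One simple way to turn them into labels is a fixed threshold, sketched below. The 0.5 cut-off and the `decode` helper are assumptions for illustration; the card does not state which threshold was used for evaluation.

```python
LABELS = ["Kritik", "Apresiasi"]

def decode(probabilities, threshold=0.5):
    """Return every label whose probability reaches the threshold."""
    return [name for name, p in zip(LABELS, probabilities) if p >= threshold]

# Probabilities quoted in the expected-output comment of the inference example.
print(decode([0.629, 0.546]))       # ['Kritik', 'Apresiasi']
print(decode([0.629, 0.546], 0.6))  # ['Kritik']
```

Because the labels are predicted independently, a comment can receive both labels, one, or none, depending on the threshold.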
