Indonesian Text Sentiment Analysis 🚀

📌 Overview

This project fine-tunes a transformer-based model to classify the sentiment of Indonesian text as positive, negative, or neutral.

📥 Data Collection

The dataset used for fine-tuning was sourced from the IndoNLU datasets, specifically the SmSA (IndoNLU) dataset.

🔄 Data Preparation

  • Tokenization:
    • Used the IndoBERT tokenizer to convert text into model inputs.
  • Train-Test Split:
    • The dataset is already split into train, validation, and test sets.
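
To make the tokenization step more concrete, here is a minimal sketch of what padding and truncating to a fixed length does to a sequence of token ids, using the max_length = 128 listed under Training Parameters. It is a simplified illustration in plain Python, not the tokenizer's actual implementation; the function name `pad_or_truncate` and the special-token ids are hypothetical placeholders.

```python
from typing import Dict, List


def pad_or_truncate(
    token_ids: List[int],
    max_length: int = 128,
    cls_id: int = 2,  # hypothetical [CLS] id, for illustration only
    sep_id: int = 3,  # hypothetical [SEP] id, for illustration only
    pad_id: int = 0,  # hypothetical [PAD] id, for illustration only
) -> Dict[str, List[int]]:
    """Wrap token ids in [CLS]/[SEP], then truncate or pad to max_length."""
    # Reserve two positions for the special tokens before truncating.
    body = token_ids[: max_length - 2]
    input_ids = [cls_id] + body + [sep_id]

    # Real tokens get mask 1; padding positions get mask 0.
    attention_mask = [1] * len(input_ids)
    n_pad = max_length - len(input_ids)
    input_ids += [pad_id] * n_pad
    attention_mask += [0] * n_pad
    return {"input_ids": input_ids, "attention_mask": attention_mask}


short = pad_or_truncate([101, 102, 103], max_length=8)
print(short["input_ids"])       # [2, 101, 102, 103, 3, 0, 0, 0]
print(short["attention_mask"])  # [1, 1, 1, 1, 1, 0, 0, 0]
```

Every example ends up with the same length, so a batch can be stacked into one tensor, and the attention mask tells the model which positions are padding.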

πŸ‹οΈ Fine-Tuning & Results

The model was fine-tuned using Hugging Face Transformers with TensorFlow.

📊 Evaluation Metrics

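| Epoch | Train Loss | Train Accuracy | Eval Loss | Eval Accuracy | Training Time | Validation Time |
|-------|------------|----------------|-----------|---------------|---------------|-----------------|
| 1     | 0.2471     | 88.15%         | 0.2107    | 91.31%        | 7:55 min      | 10 sec          |
| 2     | 0.1844     | 90.41%         | 0.2107    | 92.39%        | 7:50 min      | 10 sec          |
| 3     | 0.1502     | 91.66%         | 0.2135    | 93.14%        | 7:51 min      | 9 sec           |
| 4     | 0.1285     | 92.50%         | 0.2192    | 93.69%        | 7:50 min      | 10 sec          |
| 5     | 0.1101     | 93.13%         | 0.2367    | 94.14%        | 7:48 min      | 9 sec           |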
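
As a quick way to read these numbers, the sketch below copies the rows from the table above and picks the best epoch by evaluation loss and by evaluation accuracy, then sums the training time. The variable names are illustrative; only the values come from the table.

```python
# (epoch, train_loss, train_acc_pct, eval_loss, eval_acc_pct, train_time "m:ss")
history = [
    (1, 0.2471, 88.15, 0.2107, 91.31, "7:55"),
    (2, 0.1844, 90.41, 0.2107, 92.39, "7:50"),
    (3, 0.1502, 91.66, 0.2135, 93.14, "7:51"),
    (4, 0.1285, 92.50, 0.2192, 93.69, "7:50"),
    (5, 0.1101, 93.13, 0.2367, 94.14, "7:48"),
]

# min() returns the first row on ties; epochs 1 and 2 share the lowest eval loss.
best_by_loss = min(history, key=lambda row: row[3])
best_by_acc = max(history, key=lambda row: row[4])


def to_seconds(mm_ss: str) -> int:
    """Convert an 'm:ss' string to seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


total_seconds = sum(to_seconds(row[5]) for row in history)

print(best_by_loss[0])  # 1
print(best_by_acc[0])   # 5
print(f"{total_seconds // 60}:{total_seconds % 60:02d}")  # 39:14
```

Evaluation accuracy keeps improving through epoch 5 while evaluation loss is lowest in the first two epochs, so which checkpoint counts as "best" depends on the metric chosen.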

βš™οΈ Training Parameters

epochs = 5

learning_rate = 5e-5

seed_val = 42

max_length = 128

batch_size = 32

eval_batch_size = 32
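
One way to keep these values together is a small configuration object. The sketch below is an illustration, not the training script used for this model: the class name `TrainConfig` and its helper methods are hypothetical, and the dataset size passed in the example is a placeholder to be replaced with the real size of the train split.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters listed above, gathered in one immutable object."""
    epochs: int = 5
    learning_rate: float = 5e-5
    seed_val: int = 42
    max_length: int = 128
    batch_size: int = 32
    eval_batch_size: int = 32

    def steps_per_epoch(self, num_train_examples: int) -> int:
        # The last batch may be smaller, so round up.
        return math.ceil(num_train_examples / self.batch_size)

    def total_steps(self, num_train_examples: int) -> int:
        return self.epochs * self.steps_per_epoch(num_train_examples)


config = TrainConfig()

# 10_000 is a placeholder, not the actual number of training examples.
print(config.steps_per_epoch(10_000))  # 313
print(config.total_steps(10_000))      # 1565
```

The total step count is what a learning-rate schedule with warmup or decay would typically need, if one is used.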

🤖 How to use

import tensorflow as tf
from transformers import TFAutoModelForSequenceClassification, AutoTokenizer

# Load the model and tokenizer
model_name = "feverlash/Indonesian-SentimentAnalysis-Model"  # Replace with the path to your saved model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = TFAutoModelForSequenceClassification.from_pretrained(model_name)

# Predict the sentiment of a single text
def predict(text):
    sentiment_mapping = {
        1: "positive",
        0: "negative",
        2: "neutral"
    }

    # Tokenize the text
    inputs = tokenizer(
        text,
        return_tensors="tf",
        truncation=True,
        padding="max_length",
        max_length=128
    )

    # Run the model
    outputs = model(inputs)
    logits = outputs.logits

    # Convert logits to probabilities
    probabilities = tf.nn.softmax(logits).numpy()

    # Pick the label with the highest probability
    predicted_index = int(tf.argmax(probabilities, axis=1).numpy()[0])
    predicted_label = sentiment_mapping.get(predicted_index, "unknown")

    # Confidence of the prediction
    confidence = probabilities[0][predicted_index]

    print(f"Text: {text}")
    print(f"Predicted label: {predicted_label} (Confidence: {confidence:.2f})")

# Example usage
text = "aku sedang jalan-jalan di Yogyakarta"
predict(text)
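
The post-processing inside `predict` (softmax, argmax, label lookup) can be illustrated without loading the model. The sketch below reimplements those three steps in plain Python using the same label mapping; the logits are made-up values for illustration, and the helper names `softmax` and `decode` are hypothetical.

```python
import math
from typing import List, Tuple

# Same index-to-label mapping as in predict().
SENTIMENT_MAPPING = {0: "negative", 1: "positive", 2: "neutral"}


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)  # subtract the max to avoid overflow in exp()
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits: List[float]) -> Tuple[str, float]:
    """Return the predicted label and its probability."""
    probs = softmax(logits)
    index = max(range(len(probs)), key=probs.__getitem__)
    return SENTIMENT_MAPPING.get(index, "unknown"), probs[index]


# Hypothetical logits for one input, ordered as [negative, positive, neutral].
label, confidence = decode([0.2, 2.9, 1.1])
print(f"{label} ({confidence:.2f})")  # positive (0.81)
```

The reported confidence is simply the softmax probability of the winning class, so it reflects how far the top logit is from the others rather than a calibrated certainty.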