TABLET Split Model - Table Structure Recognition

This repository contains the Split Model implementation from the paper TABLET: Learning From Instructions For Tabular Data, trained to detect row and column splits in table images.

Model Description

The Split Model is a deep learning architecture designed to detect horizontal and vertical splits in table images, enabling accurate table structure recognition. The model processes a table image and predicts the positions of row and column boundaries.

Architecture

The model consists of three main components (a code sketch follows the list):

  1. Modified ResNet-18 Backbone

    • Removed max pooling layer for better spatial resolution
    • Halved channel dimensions for efficiency (32→256 instead of ResNet-18's usual 64→512)
    • Outputs features at 1/16 resolution (60×60 for 960×960 input)
  2. Feature Pyramid Network (FPN)

    • Upsamples backbone features to 1/2 resolution (480×480)
    • Reduces channels to 128 dimensions
  3. Dual Transformer Branches

    • Horizontal Branch: Detects row splits with a 1D transformer
    • Vertical Branch: Detects column splits with a 1D transformer
    • Each branch combines:
      • Global features: Learnable weighted averaging
      • Local features: Spatial pooling with 1×1 convolution
      • Positional embeddings: 1D learned embeddings
    • 3-layer transformer encoder with 8 attention heads
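
For orientation, here is a minimal PyTorch sketch of the pipeline described above. It is illustrative only: the strided-conv stand-in for the width-halved ResNet-18, the softmax pooling weights, and all module names are assumptions; the actual layer layout lives in split_model.py.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Backbone(nn.Module):
    # Stand-in for the width-halved, maxpool-free ResNet-18: four stride-2
    # stages (32 -> 64 -> 128 -> 256 channels), overall stride 16.
    def __init__(self):
        super().__init__()
        chans = [3, 32, 64, 128, 256]
        layers = []
        for cin, cout in zip(chans[:-1], chans[1:]):
            layers += [nn.Conv2d(cin, cout, 3, stride=2, padding=1),
                       nn.BatchNorm2d(cout), nn.ReLU(inplace=True)]
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return self.body(x)  # [B, 256, 60, 60] for a 960x960 input

class Neck(nn.Module):
    # FPN-style head: reduce to 128 channels, upsample 1/16 -> 1/2 resolution.
    def __init__(self):
        super().__init__()
        self.reduce = nn.Conv2d(256, 128, 1)

    def forward(self, f):
        f = self.reduce(f)
        return F.interpolate(f, scale_factor=8, mode='bilinear',
                             align_corners=False)  # [B, 128, 480, 480]

class SplitBranch(nn.Module):
    # 1D branch: pool the feature map along one axis (global learnable
    # weighted average + local 1x1-conv average), add learned positional
    # embeddings, run a 3-layer / 8-head transformer, score each position.
    def __init__(self, d_model=128, length=480, nhead=8, num_layers=3):
        super().__init__()
        self.local = nn.Conv2d(d_model, d_model, 1)
        self.pool_logits = nn.Parameter(torch.zeros(length))
        self.pos = nn.Parameter(torch.zeros(1, length, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, f):                    # f: [B, C, L, P], pooled over P
        local = self.local(f).mean(dim=3)    # local features: [B, C, L]
        w = torch.softmax(self.pool_logits, dim=0)
        glob = (f * w.view(1, 1, 1, -1)).sum(dim=3)  # global features
        seq = (local + glob).permute(0, 2, 1) + self.pos  # [B, L, C]
        return torch.sigmoid(self.head(self.encoder(seq))).squeeze(-1)

class SplitModelSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone, self.neck = Backbone(), Neck()
        self.h_branch = SplitBranch()  # row splits: sequence over height
        self.v_branch = SplitBranch()  # column splits: sequence over width

    def forward(self, x):
        f = self.neck(self.backbone(x))
        return self.h_branch(f), self.v_branch(f.transpose(2, 3))

# h, v = SplitModelSketch()(torch.randn(1, 3, 960, 960))  # each [1, 480]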

Training Details

  • Dataset: Combination of FinTabNet and PubTabNet (OTSL format)
  • Input Size: 960×960 pixels
  • Batch Size: 32
  • Epochs: 16
  • Optimizer: AdamW (lr=3e-4, weight_decay=5e-4)
  • Loss Function: Focal Loss (α=1.0, γ=2.0); a minimal sketch follows this list
  • Ground Truth: Dynamic gap-based split detection from OTSL annotations
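
The loss follows the standard focal-loss formulation FL(p_t) = -α(1 - p_t)^γ log(p_t). A minimal sketch, assuming per-position binary targets and sigmoid probabilities (the exact reduction in train_split_fixed.py may differ):

import torch

def focal_loss(pred, target, alpha=1.0, gamma=2.0, eps=1e-6):
    # pred: [B, L] sigmoid probabilities; target: [B, L] binary split mask.
    # Assumes the model outputs probabilities rather than logits.
    pred = pred.clamp(eps, 1 - eps)
    p_t = torch.where(target > 0.5, pred, 1 - pred)  # prob. of the true class
    return (-alpha * (1 - p_t) ** gamma * torch.log(p_t)).mean()

# loss = focal_loss(h_pred, h_target) + focal_loss(v_pred, v_target)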

Installation

pip install torch torchvision pillow numpy

Usage

Basic Inference

import torch
from PIL import Image
import torchvision.transforms as transforms
from split_model import SplitModel

# Load model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = SplitModel().to(device)

# Load checkpoint
checkpoint = torch.load('split_model.pth', map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Prepare image
transform = transforms.Compose([
    transforms.Resize((960, 960)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

image = Image.open('table_image.png').convert('RGB')
image_tensor = transform(image).unsqueeze(0).to(device)

# Predict
with torch.no_grad():
    h_pred, v_pred = model(image_tensor)  # Returns [1, 480] predictions

    # Upsample to 960 for visualization
    h_pred = h_pred.repeat_interleave(2, dim=1)  # [1, 960]
    v_pred = v_pred.repeat_interleave(2, dim=1)  # [1, 960]

    # Apply threshold
    h_splits = (h_pred > 0.5).float()
    v_splits = (v_pred > 0.5).float()

    # Count rows and columns: each contiguous band of split pixels is one
    # separator, so count rising edges instead of summing pixels
    num_rows = int((h_splits[:, 1:] > h_splits[:, :-1]).sum().item()) + 1
    num_cols = int((v_splits[:, 1:] > v_splits[:, :-1]).sum().item()) + 1

    print(f"Detected {num_rows} rows and {num_cols} columns")
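
To turn the thresholded masks into an actual grid, read off the content regions between consecutive split bands. A minimal post-processing sketch; bands_to_regions is a hypothetical helper, not part of this repository:

def bands_to_regions(mask):
    # Return (start, end) pixel ranges of content regions between split bands.
    regions, start = [], None
    for i, v in enumerate(mask):
        if v == 0 and start is None:
            start = i
        elif v == 1 and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(mask)))
    return regions

rows = bands_to_regions(h_splits[0].int().tolist())  # (y0, y1) per row
cols = bands_to_regions(v_splits[0].int().tolist())  # (x0, x1) per column
cells = [(x0, y0, x1, y1) for (y0, y1) in rows for (x0, x1) in cols]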

Visualize Predictions

Use the included visualization script to test on your images:

python test_split_by_images_folder.py \
    --image-folder /path/to/images \
    --output-folder predictions_output \
    --model-path split_model.pth \
    --threshold 0.5

Model Performance

The model was trained on the combined FinTabNet and PubTabNet datasets:

  • Training samples: ~250K table images
  • Validation F1 scores are typically above 0.90 for both horizontal and vertical splits; a sketch of the metric follows this list
  • Robust to various table styles, merged cells, and complex layouts
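
The evaluation protocol is not spelled out here; assuming per-position F1 on the thresholded 1D masks, the metric would look like this sketch:

def split_f1(pred, target, thr=0.5):
    # Per-position F1 over 1D split masks (pred, target: [B, L] tensors).
    p, t = pred > thr, target > 0.5
    tp = (p & t).sum().item()
    fp = (p & ~t).sum().item()
    fn = (~p & t).sum().item()
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    return 2 * precision * recall / max(precision + recall, 1e-9)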

Files in this Repository

  • split_model.py - Model architecture and dataset classes
  • train_split_fixed.py - Training script
  • test_split_by_images_folder.py - Inference and visualization script
  • split_model.pth - Trained model weights

Key Features

  • Dynamic Gap Detection: Automatically handles varying gap widths between cells; the idea is sketched after this list
  • Overlap Handling: Correctly processes tables with overlapping cell boundaries
  • Focal Loss Training: Addresses class imbalance between split and non-split pixels
  • Transformer-based: Captures long-range dependencies for complex table structures
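
To make the gap-detection idea concrete, one plausible construction of the ground truth from cell bounding boxes is sketched below (the actual procedure in the training code may differ):

import numpy as np

def gap_based_row_splits(cell_boxes, size=960):
    # Mark every y-pixel not covered by any cell box as a horizontal split;
    # gaps of any width between rows become split bands of matching width.
    # cell_boxes: iterable of (x0, y0, x1, y1) in pixel coordinates.
    covered = np.zeros(size, dtype=bool)
    for _, y0, _, y1 in cell_boxes:
        covered[max(int(y0), 0):min(int(y1), size)] = True
    return (~covered).astype(np.float32)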

Citation

If you use this model, please cite the original TABLET paper:

@article{tablet2025,
  title={TABLET: Learning From Instructions For Tabular Data},
  author={[Authors from paper]},
  journal={arXiv preprint arXiv:2506.07015},
  year={2025}
}

Paper Reference

This implementation is based on the Split Model described in Section 3.2 of: TABLET: Learning From Instructions For Tabular Data

License

This model is released for research purposes. Please refer to the original paper for more details.

Acknowledgments

  • Original paper authors for the TABLET framework
  • FinTabNet and PubTabNet datasets for training data
  • PyTorch team for the deep learning framework