TABLET Split Model - Table Structure Recognition

This repository contains the Split Model implementation from the paper TABLET: Learning From Instructions For Tabular Data, trained for detecting row and column splits in table images.

Model Description

The Split Model is a deep learning architecture designed to detect horizontal and vertical splits in table images, enabling accurate table structure recognition. The model processes a table image and predicts the positions of row and column boundaries.

Architecture

The model consists of three main components:

- Modified ResNet-18 Backbone
  - Removed max pooling layer for better spatial resolution
  - Halved channel dimensions for efficiency (32→256 channels)
  - Outputs features at 1/16 resolution (60×60 for 960×960 input)
- Feature Pyramid Network (FPN)
  - Upsamples backbone features to 1/2 resolution (480×480)
  - Reduces channels to 128 dimensions
- Dual Transformer Branches
  - Horizontal Branch: Detects row splits using 1D transformer
  - Vertical Branch: Detects column splits using 1D transformer
  - Each branch combines:
    - Global features: Learnable weighted averaging
    - Local features: Spatial pooling with 1×1 convolution
    - Positional embeddings: 1D learned embeddings
  - 3-layer transformer encoder with 8 attention heads
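
The 1/16 resolution and the 32→256 channel range follow from the standard ResNet-18 layout. The sketch below traces them stage by stage. It assumes the modified backbone keeps the usual ResNet-18 strides and stage widths apart from the two changes listed above; the stage table and `trace_backbone` are illustrative names, not code from `split_model.py`.

```python
# Standard ResNet-18 stages: (name, stride, output channels).
STAGES = [
    ("conv1", 2, 64),
    ("maxpool", 2, 64),
    ("layer1", 1, 64),
    ("layer2", 2, 128),
    ("layer3", 2, 256),
    ("layer4", 2, 512),
]


def trace_backbone(input_size, remove_maxpool=True, channel_scale=0.5):
    """Return (stage, spatial size, channels) after each remaining stage."""
    size = input_size
    trace = []
    for name, stride, channels in STAGES:
        if remove_maxpool and name == "maxpool":
            continue  # dropping max pooling removes one factor-of-2 downsample
        size //= stride
        trace.append((name, size, int(channels * channel_scale)))
    return trace


trace = trace_backbone(960)
for name, size, channels in trace:
    print(f"{name}: {size}x{size}, {channels} channels")
# conv1: 480x480, 32 channels
# layer1: 480x480, 32 channels
# layer2: 240x240, 64 channels
# layer3: 120x120, 128 channels
# layer4: 60x60, 256 channels
```

With max pooling kept, the same trace would end at 30×30 (1/32 resolution), which is why removing it doubles the spatial resolution of the backbone output.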

Training Details

Dataset: Combination of FinTabNet and PubTabNet (OTSL format)
Input Size: 960×960 pixels
Batch Size: 32
Epochs: 16
Optimizer: AdamW (lr=3e-4, weight_decay=5e-4)
Loss Function: Focal Loss (α=1.0, γ=2.0)
Ground Truth: Dynamic gap-based split detection from OTSL annotations
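
For reference, a minimal plain-Python sketch of binary focal loss with the α and γ values listed above. It assumes α acts as a uniform scale factor on every position; some implementations instead use α as a class-balancing weight, and the training script may differ. `binary_focal_loss` is an illustrative name, not a function from this repository.

```python
import math


def binary_focal_loss(p, target, alpha=1.0, gamma=2.0, eps=1e-7):
    """Focal loss for one predicted probability p and a 0/1 target."""
    p = min(max(p, eps), 1.0 - eps)          # avoid log(0)
    p_t = p if target == 1 else 1.0 - p      # probability assigned to the true class
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(0.95, 1)   # confident and correct
hard = binary_focal_loss(0.05, 1)   # confident and wrong
bce_easy = -math.log(0.95)          # plain cross-entropy for the easy case

print(f"easy: {easy:.6f}, hard: {hard:.4f}, BCE on easy: {bce_easy:.4f}")
```

The `(1 - p_t) ** gamma` factor shrinks the loss on well-classified positions, so the many non-split pixels contribute far less than the comparatively rare split pixels.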

Installation

pip install torch torchvision pillow numpy

Usage

Basic Inference

import torch
from PIL import Image
import torchvision.transforms as transforms
from split_model import SplitModel

# Load model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = SplitModel().to(device)

# Load checkpoint
checkpoint = torch.load('split_model.pth', map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Prepare image
transform = transforms.Compose([
    transforms.Resize((960, 960)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

image = Image.open('table_image.png').convert('RGB')
image_tensor = transform(image).unsqueeze(0).to(device)

# Predict
with torch.no_grad():
    h_pred, v_pred = model(image_tensor)  # Returns [1, 480] predictions

    # Upsample to 960 for visualization
    h_pred = h_pred.repeat_interleave(2, dim=1)  # [1, 960]
    v_pred = v_pred.repeat_interleave(2, dim=1)  # [1, 960]

    # Apply threshold
    h_splits = (h_pred > 0.5).float()
    v_splits = (v_pred > 0.5).float()

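    # Count rows and columns: a split spans several adjacent pixels, so count
    # contiguous runs (0→1 transitions) rather than summing split pixels
    def count_runs(mask):
        flat = mask.flatten()
        rising = (flat[1:] > flat[:-1]).sum().item()
        return int(rising) + int(flat[0].item())  # include a run starting at index 0

    num_rows = count_runs(h_splits) + 1
    num_cols = count_runs(v_splits) + 1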

    print(f"Detected {num_rows} rows and {num_cols} columns")
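
A thresholded prediction is a per-pixel mask, so one boundary usually covers several adjacent positions. The sketch below groups a 0/1 mask into contiguous runs and reports one center per run. It works on a plain Python list (for a tensor, something like `h_splits[0].int().tolist()` would produce one); `split_runs` is an illustrative helper, not part of `split_model.py`, and whether a run touching the image border should count as a separator is left open here.

```python
def split_runs(mask):
    """Return (start, end) index pairs, end exclusive, for each run of 1s."""
    runs = []
    start = None
    for i, value in enumerate(mask):
        if value and start is None:
            start = i                      # a run begins
        elif not value and start is not None:
            runs.append((start, i))        # a run ends just before i
            start = None
    if start is not None:
        runs.append((start, len(mask)))    # run extends to the end of the mask
    return runs


mask = [0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0]
runs = split_runs(mask)
centers = [(s + e - 1) / 2 for s, e in runs]
num_rows = len(runs) + 1

print(runs)      # [(2, 5), (8, 10)]
print(centers)   # [3.0, 8.5]
print(num_rows)  # 3
```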

Visualize Predictions

Use the included visualization script to test on your images:

python test_split_by_images_folder.py \
    --image-folder /path/to/images \
    --output-folder predictions_output \
    --model-path split_model.pth \
    --threshold 0.5

Model Performance

The model was trained on combined FinTabNet and PubTabNet datasets:

Training samples: ~250K table images
Validation F1: typically above 0.90 for both horizontal and vertical splits
Robust to various table styles, merged cells, and complex layouts

Files in this Repository

split_model.py - Model architecture and dataset classes
train_split_fixed.py - Training script
test_split_by_images_folder.py - Inference and visualization script
split_model.pth - Trained model weights

Key Features

Dynamic Gap Detection: Automatically handles varying gap widths between cells
Overlap Handling: Correctly processes tables with overlapping cell boundaries
Focal Loss Training: Addresses class imbalance between split and non-split pixels
Transformer-based: Captures long-range dependencies for complex table structures
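
The exact gap-detection rule is not spelled out here, so the following is only one plausible reading, with hypothetical names throughout. It assumes each row is given as a `(top, bottom)` pixel extent (bottom exclusive), sorted top to bottom. The whole gap between consecutive rows is marked as a split region, and rows that touch or overlap fall back to a single split pixel at the midpoint.

```python
def row_split_mask(row_extents, height):
    """Build a 0/1 mask of length `height` marking split pixels between rows.

    row_extents: list of (top, bottom) extents, one per row, sorted by top.
    This is an illustrative rule, not the repository's actual ground-truth code.
    """
    mask = [0] * height
    for (_, prev_bottom), (next_top, _) in zip(row_extents, row_extents[1:]):
        if next_top > prev_bottom:
            # Normal case: the whole gap between the two rows is a split region.
            start, end = prev_bottom, next_top
        else:
            # Touching or overlapping rows: a single pixel at the midpoint.
            start = (prev_bottom + next_top) // 2
            end = start + 1
        for y in range(max(start, 0), min(end, height)):
            mask[y] = 1
    return mask


mask = row_split_mask([(0, 4), (6, 10), (9, 14)], height=16)
print([i for i, v in enumerate(mask) if v])  # [4, 5, 9]
```

Column splits would follow the same pattern along the horizontal axis, using `(left, right)` extents.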

Citation

If you use this model, please cite the original TABLET paper:

@article{tablet2025,
  title={TABLET: Learning From Instructions For Tabular Data},
  author={[Authors from paper]},
  journal={arXiv preprint arXiv:2506.07015},
  year={2025}
}

Paper Reference

This implementation is based on the Split Model described in Section 3.2 of: TABLET: Learning From Instructions For Tabular Data

License

This model is released for research purposes. Please refer to the original paper for more details.

Acknowledgments

Original paper authors for the TABLET framework
FinTabNet and PubTabNet datasets for training data
PyTorch team for the deep learning framework

