TABLET Split Model - Table Structure Recognition
This repository contains the Split Model implementation from the paper TABLET: Learning From Instructions For Tabular Data, trained for detecting row and column splits in table images.
Model Description
The Split Model is a deep learning architecture designed to detect horizontal and vertical splits in table images, enabling accurate table structure recognition. The model processes a table image and predicts the positions of row and column boundaries.
Architecture
The model consists of three main components:
Modified ResNet-18 Backbone
- Removed max pooling layer for better spatial resolution
- Halved channel dimensions for efficiency (32→256 channels)
- Outputs features at 1/16 resolution (60×60 for 960×960 input)
Feature Pyramid Network (FPN)
- Upsamples backbone features to 1/2 resolution (480×480)
- Reduces channels to 128 dimensions
Dual Transformer Branches
- Horizontal Branch: Detects row splits using 1D transformer
- Vertical Branch: Detects column splits using 1D transformer
- Each branch combines:
- Global features: Learnable weighted averaging
- Local features: Spatial pooling with 1×1 convolution
- Positional embeddings: 1D learned embeddings
- 3-layer transformer encoder with 8 attention heads
Training Details
- Dataset: Combination of FinTabNet and PubTabNet (OTSL format)
- Input Size: 960×960 pixels
- Batch Size: 32
- Epochs: 16
- Optimizer: AdamW (lr=3e-4, weight_decay=5e-4)
- Loss Function: Focal Loss (α=1.0, γ=2.0)
- Ground Truth: Dynamic gap-based split detection from OTSL annotations
Installation
pip install torch torchvision pillow numpy
Usage
Basic Inference
import torch
from PIL import Image
import torchvision.transforms as transforms
from split_model import SplitModel
# Load model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = SplitModel().to(device)
# Load checkpoint
checkpoint = torch.load('split_model.pth', map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()
# Prepare image
transform = transforms.Compose([
transforms.Resize((960, 960)),
transforms.ToTensor(),
transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
image = Image.open('table_image.png').convert('RGB')
image_tensor = transform(image).unsqueeze(0).to(device)
# Predict
with torch.no_grad():
h_pred, v_pred = model(image_tensor) # Returns [1, 480] predictions
# Upsample to 960 for visualization
h_pred = h_pred.repeat_interleave(2, dim=1) # [1, 960]
v_pred = v_pred.repeat_interleave(2, dim=1) # [1, 960]
# Apply threshold
h_splits = (h_pred > 0.5).float()
v_splits = (v_pred > 0.5).float()
# Count rows and columns
num_rows = h_splits.sum().item() + 1
num_cols = v_splits.sum().item() + 1
print(f"Detected {num_rows} rows and {num_cols} columns")
Visualize Predictions
Use the included visualization script to test on your images:
python test_split_by_images_folder.py \
--image-folder /path/to/images \
--output-folder predictions_output \
--model-path split_model.pth \
--threshold 0.5
Model Performance
The model was trained on combined FinTabNet and PubTabNet datasets:
- Training samples: ~250K table images
- Validation F1 scores typically achieve >0.90 for both horizontal and vertical splits
- Robust to various table styles, merged cells, and complex layouts
Files in this Repository
split_model.py- Model architecture and dataset classestrain_split_fixed.py- Training scripttest_split_by_images_folder.py- Inference and visualization scriptsplit_model.pth- Trained model weights
Key Features
- Dynamic Gap Detection: Automatically handles varying gap widths between cells
- Overlap Handling: Correctly processes tables with overlapping cell boundaries
- Focal Loss Training: Addresses class imbalance between split and non-split pixels
- Transformer-based: Captures long-range dependencies for complex table structures
Citation
If you use this model, please cite the original TABLET paper:
@article{tablet2025,
title={TABLET: Learning From Instructions For Tabular Data},
author={[Authors from paper]},
journal={arXiv preprint arXiv:2506.07015},
year={2025}
}
Paper Reference
This implementation is based on the Split Model described in Section 3.2 of: TABLET: Learning From Instructions For Tabular Data
License
This model is released for research purposes. Please refer to the original paper for more details.
Acknowledgments
- Original paper authors for the TABLET framework
- FinTabNet and PubTabNet datasets for training data
- PyTorch team for the deep learning framework