Sai Rajeswar

rajeswarsai

https://sairajeswar.com/

authored 13 papers about 1 month ago

BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks

Paper • 2412.04626 • Published Dec 5, 2024 • 14

AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding

Paper • 2502.01341 • Published Feb 3, 2025 • 39

InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation

Paper • 2407.06423 • Published Jul 8, 2024

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction

Paper • 2503.15661 • Published Mar 19, 2025 • 2

StarFlow: Generating Structured Workflow Outputs From Sketch Images

Paper • 2503.21889 • Published Mar 27, 2025 • 2

Augmenting LLM Reasoning with Dynamic Notes Writing for Complex QA

Paper • 2505.16293 • Published May 22, 2025 • 2

Rendering-Aware Reinforcement Learning for Vector Graphics Generation

Paper • 2505.20793 • Published May 27, 2025 • 12

The Promise of RL for Autoregressive Image Editing

Paper • 2508.01119 • Published Aug 1, 2025 • 11

Apriel-Nemotron-15B-Thinker

Paper • 2508.10948 • Published Aug 13, 2025 • 5

AU-Harness: An Open-Source Toolkit for Holistic Evaluation of Audio LLMs

Paper • 2509.08031 • Published Sep 9, 2025 • 21

WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation

Paper • 2508.16763 • Published Aug 22, 2025 • 2

Apriel-1.5-15b-Thinker

Paper • 2510.01141 • Published Oct 1, 2025 • 114

Optimizing What Matters: AUC-Driven Learning for Robust Neural Retrieval

Paper • 2510.00137 • Published Sep 30, 2025 • 2

authored 7 papers over 1 year ago

Multimodal foundation world models for generalist embodied agents

Paper • 2406.18043 • Published Jun 26, 2024 • 1

RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content

Paper • 2406.11811 • Published Jun 17, 2024 • 16

Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels

Paper • 2209.12016 • Published Sep 24, 2022

Efficient Dynamics Modeling in Interactive Environments with Koopman Theory

Paper • 2306.11941 • Published Jun 20, 2023

Capture the Flag: Uncovering Data Insights with Large Language Models

Paper • 2312.13876 • Published Dec 21, 2023 • 1

VCR: Visual Caption Restoration

Paper • 2406.06462 • Published Jun 10, 2024 • 13

Choreographer: Learning and Adapting Skills in Imagination

Paper • 2211.13350 • Published Nov 23, 2022