Sound2Vision: Generating Diverse Visuals from Audio through Cross-Modal Latent Alignment Paper • 2412.06209 • Published Dec 9, 2024
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model Paper • 2409.01704 • Published Sep 3, 2024
Sheet Music Transformer ++: End-to-End Full-Page Optical Music Recognition for Pianoform Sheet Music Paper • 2405.12105 • Published May 20, 2024
Predicting performance difficulty from piano sheet music images Paper • 2309.16287 • Published Sep 28, 2023
Sheet Music Transformer: End-To-End Optical Music Recognition Beyond Monophonic Transcription Paper • 2402.07596 • Published Feb 12, 2024