Validate OCR results by comparing them with ground-truth text
Validate LLM endpoint responses
Evaluate the text accuracy of PDF transcriptions
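One common way to compare OCR output or a PDF transcription against ground truth is to compute the character error rate (CER) and word error rate (WER). Both are the Levenshtein edit distance divided by the length of the reference. The sketch below is illustrative and not taken from any particular tool. It assumes plain-string inputs and collapses whitespace first, because PDF extraction often introduces stray line breaks. The function names are hypothetical. Returning 1.0 for an empty reference with a non-empty hypothesis is an arbitrary convention chosen here.

```python
def normalize(text: str) -> str:
    """Collapse runs of whitespace (e.g. PDF line breaks) into single spaces."""
    return " ".join(text.split())


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (characters or word lists)."""
    # prev[j] holds the distance between the previous prefix of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        # Assumption: empty reference scores 0.0 if hypothesis is also empty, else 1.0
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


def word_error_rate(reference: str, hypothesis: str) -> float:
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if not ref_words:
        return 0.0 if not hyp_words else 1.0
    return edit_distance(ref_words, hyp_words) / len(ref_words)


if __name__ == "__main__":
    truth = "the quick brown fox"
    ocr = "the quick brwn fox"
    print(f"CER: {character_error_rate(truth, ocr):.3f}")
    print(f"WER: {word_error_rate(truth, ocr):.3f}")
```

A page could then be flagged for review when its CER exceeds some threshold. The right cutoff depends on the documents and is not specified here.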
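To validate LLM endpoint responses, a minimal first check is that the body parses as JSON, contains the expected fields with the expected types, and does not return blank text. The response shape below (`id` and `text`, both strings) is a hypothetical placeholder. Substitute the schema your endpoint actually returns. The sketch works on a raw body string, so it stays independent of any HTTP client.

```python
import json
from typing import Any

# Hypothetical expected schema: field name -> required Python type.
REQUIRED_FIELDS: dict[str, type] = {"id": str, "text": str}


def validate_llm_response(raw_body: str,
                          required: dict[str, type] = REQUIRED_FIELDS) -> list[str]:
    """Return a list of problems found in a raw response body (empty list means valid)."""
    try:
        payload: Any = json.loads(raw_body)
    except json.JSONDecodeError as exc:
        return [f"body is not valid JSON: {exc.msg}"]
    if not isinstance(payload, dict):
        return ["top-level JSON value is not an object"]

    errors = []
    for field, expected_type in required.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            errors.append(
                f"field {field!r} should be {expected_type.__name__}, "
                f"got {type(payload[field]).__name__}"
            )
        elif expected_type is str and not payload[field].strip():
            # A syntactically valid but blank string usually signals a failed generation
            errors.append(f"field {field!r} is empty")
    return errors


if __name__ == "__main__":
    print(validate_llm_response('{"id": "r1", "text": "Hello"}'))
    print(validate_llm_response('{"id": 5, "text": "  "}'))
```

Collecting all problems in a list, rather than stopping at the first one, makes it easier to log every defect per response. For nested or richer schemas, a dedicated validator such as the `jsonschema` package may be a better fit than hand-written checks.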