Institutional Books 1.0: A 242B token dataset from Harvard Library's collections, refined for accuracy and usability Paper • 2506.08300 • Published Jun 10 • 8