Papers
arXiv:2507.03255

ForgeHLS: A Large-Scale, Open-Source Dataset for High-Level Synthesis

Published on Jul 4
Abstract

High-Level Synthesis (HLS) plays a crucial role in modern hardware design by transforming high-level code into optimized hardware implementations. However, progress in applying machine learning (ML) to HLS optimization has been hindered by a shortage of sufficiently large and diverse datasets. To bridge this gap, we introduce ForgeHLS, a large-scale, open-source dataset explicitly designed for ML-driven HLS research. ForgeHLS comprises over 400k diverse designs generated from 846 kernels covering a broad range of application domains, and its construction consumed over 200k CPU hours. For each kernel, pragmas (loop unrolling, pipelining, array partitioning) are inserted systematically and automatically, combined with extensive design space exploration using Bayesian optimization. Compared to existing datasets, ForgeHLS significantly enhances scale, diversity, and design coverage. We further define and evaluate representative downstream tasks in Quality of Result (QoR) prediction and automated pragma exploration, clearly demonstrating the utility of ForgeHLS for developing and improving ML-based HLS optimization methodologies. The dataset and code are publicly available at https://github.com/zedong-peng/ForgeHLS.
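
To illustrate the kind of pragma-annotated designs the abstract describes, the sketch below shows a small C++ kernel with the three pragma families mentioned (loop unrolling, pipelining, array partitioning), written in recent Vitis-HLS-style syntax. The kernel, array size, and factor values are hypothetical choices for illustration only; they are not taken from the ForgeHLS dataset, and older Vivado HLS releases spell the partition pragma slightly differently (without `type=`).

// Minimal sketch of an HLS kernel annotated with the three pragma types
// that ForgeHLS varies. All names and factors here are illustrative.
#define N 128

void vec_scale_add(const int a[N], const int b[N], int out[N]) {
    // Split each array into 4 cyclic banks so parallel iterations can
    // access different elements in the same cycle.
#pragma HLS ARRAY_PARTITION variable=a type=cyclic factor=4
#pragma HLS ARRAY_PARTITION variable=b type=cyclic factor=4
#pragma HLS ARRAY_PARTITION variable=out type=cyclic factor=4

    for (int i = 0; i < N; ++i) {
        // Start a new loop iteration every clock cycle (initiation interval 1).
#pragma HLS PIPELINE II=1
        // Partially unroll the loop body 4x to process four elements per iteration.
#pragma HLS UNROLL factor=4
        out[i] = 2 * a[i] + b[i];
    }
}

Each combination of such pragma settings (which loops to unroll and by how much, which loops to pipeline, how to partition which arrays) yields a distinct design point with its own QoR; the abstract's Bayesian-optimization-driven design space exploration searches over exactly this kind of configuration space.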
