BuildBid Bench
Real plan-set benchmarks, published honestly.
BuildBid Bench is our open log of what the platform actually does when pointed at real construction plan sets. Each benchmark lists the plan type, the time to first draft, how many items the model extracted, how many the estimator corrected, and the scope gaps we caught (or missed). No synthetic results, no cherry-picked demos.
How a benchmark ships
- A real plan set is run through production extraction, with no prompt tuning for the benchmark.
- An estimator reviews the draft and logs every correction, missed scope, and false positive.
- The corrected CSV and three screenshots (raw, flagged, approved) are archived.
- The benchmark row lands here with one of three verdicts: bid-ready, needs-review, or not-ready.
We will never publish a benchmark without its corrected CSV. If the page is empty, it means we have not archived three complete runs yet.
No benchmarks yet
BuildBid Bench publishes results from real plan-set runs.
The first three benchmarks ship after we archive corrected CSVs and screenshots from production runs. We will not fabricate numbers to fill this page. Subscribe to the log if you want to know when the first one lands.