CompareHub Documentation
CompareHub helps you pick the right LLM for your task by running blind, reproducible comparisons with multiple judges. Learn how to get started and make the most of the platform.
- Getting Started - Learn the basics and run your first comparison
- How Scoring Works - Understand judge ensembles and agreement metrics
- Privacy & Keys - Use your own API keys and control data storage
- API Reference - Integrate CompareHub into your workflow
What is CompareHub?
CompareHub is a platform for running blind, reproducible LLM comparisons. Instead of relying on vibes or brand names, you run the same task across multiple models and let an ensemble of judges evaluate the outputs based on clear criteria.
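To make the ensemble-judging idea concrete, here is a minimal sketch of how several judge scores for one anonymised output can be combined into a headline score plus a simple agreement measure. The judge names and scores below are made up for illustration; they are not CompareHub's actual API or scoring formula.

```python
from statistics import mean, stdev

# Hypothetical judge scores (1-10) for one anonymised model output.
# In CompareHub the judges are themselves models; these values are illustrative.
judge_scores = {"judge_a": 8, "judge_b": 7, "judge_c": 9}

scores = list(judge_scores.values())
ensemble_score = mean(scores)       # headline score for this output
agreement_spread = stdev(scores)    # lower spread = judges agree more

print(f"ensemble score: {ensemble_score:.1f}")
print(f"judge spread (std dev): {agreement_spread:.2f}")
```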
Key Features
- Blind mode - Model names hidden until you reveal them to prevent brand bias
- Multiple judges - Ensemble scoring with visible agreement metrics and rationales
- Reproducible - Every run pins dataset, prompt, and model versions in a shareable permalink
- Cost & speed tracking - See real-time cost (€) and latency (ms) for every model
- Export & embed - Download CSV/JSON or embed reports in your docs (see the sample export record after this list)
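As an illustration of what one exported row might contain, the sketch below shows the kind of fields a JSON export could carry. The field names and values are assumptions for illustration only, not CompareHub's documented export schema.

```python
# Hypothetical shape of one row in a JSON export; field names are
# illustrative and may not match CompareHub's real schema.
export_row = {
    "run_permalink": "https://comparehub.example/r/abc123",  # placeholder URL
    "model": "model-A",              # blind label until you reveal names
    "ensemble_score": 8.0,           # mean of judge scores
    "judge_agreement": 0.82,         # example agreement metric (0-1)
    "cost_eur": 0.0042,              # cost in euros
    "latency_ms": 1870,              # end-to-end latency in milliseconds
}
```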
How It Works
1. Compose your prompt - Pick a task template or paste your own prompt with variables
2. Select models - Choose multiple models to compare (names hidden by default)
3. Run & judge - Models respond, judges evaluate, you get scores + permalink
4. Share & export - Share the report, export data, or re-run under the same conditions (a scripted version of this flow is sketched below)
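If you prefer to drive this flow from code, the API Reference covers the real endpoints. The sketch below only illustrates what such an integration could look like; the base URL, routes, payload fields, and the COMPAREHUB_API_KEY variable are assumptions, not the documented API.

```python
import os
import time
import requests

# Hypothetical endpoints and payloads; consult the API Reference for the
# real routes, parameters, and response shapes.
BASE_URL = "https://api.comparehub.example"  # placeholder base URL
HEADERS = {"Authorization": f"Bearer {os.environ['COMPAREHUB_API_KEY']}"}

# 1. Compose the prompt and select models (names stay hidden in blind mode).
run = requests.post(
    f"{BASE_URL}/v1/comparisons",
    headers=HEADERS,
    json={
        "prompt": "Summarise the ticket in two sentences: {{ticket}}",
        "variables": {"ticket": "Customer reports a login loop on mobile."},
        "models": ["model-A", "model-B", "model-C"],
        "blind": True,
    },
    timeout=30,
).json()

# 2. Poll until the models have responded and the judges have scored them.
while True:
    report = requests.get(
        f"{BASE_URL}/v1/comparisons/{run['id']}", headers=HEADERS, timeout=30
    ).json()
    if report["status"] == "complete":
        break
    time.sleep(2)

# 3. Read scores and share the reproducible permalink.
for row in report["results"]:
    print(row["model"], row["ensemble_score"], row["cost_eur"], row["latency_ms"])
print("permalink:", report["permalink"])
```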
New to CompareHub? Start with the Getting Started guide to run your first comparison in under 5 minutes.