Browse and submit language model benchmarks
Compare and rank LLMs using benchmark scores
View and submit LLM benchmark evaluations
Convert Stable Diffusion checkpoint to Diffusers and open a PR
Evaluate open LLMs in the languages of LATAM and Spain
Benchmark AI models by comparison
Explore and visualize diverse models
Convert Hugging Face models to OpenVINO format
Run benchmarks on prediction models
Explore and manage STM32 ML models with the STM32AI Model Zoo dashboard
Display benchmark results
Display model benchmark results
Generate and view leaderboard for LLM evaluations
The HHEM Leaderboard is a model-benchmarking platform where users can browse and submit language model evaluations. It acts as a centralized hub for comparing how different language models perform across tasks and datasets, and it gives the community a transparent, standardized way to track progress in language model capabilities.
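As a rough illustration of what a leaderboard like this does, the sketch below aggregates per-task benchmark scores into a single ranking. The model names, task names, scores, and the simple mean-score aggregation are all assumptions for illustration, not the leaderboard's actual data or scoring method.

```python
from statistics import mean

# Hypothetical per-task benchmark scores (higher is better).
# These names and numbers are illustrative, not real leaderboard data.
scores = {
    "model-a": {"summarization": 0.91, "qa": 0.84, "reasoning": 0.77},
    "model-b": {"summarization": 0.88, "qa": 0.90, "reasoning": 0.81},
    "model-c": {"summarization": 0.79, "qa": 0.72, "reasoning": 0.70},
}

# Aggregate each model's task scores into one number (a simple mean here),
# then sort models from best to worst to produce a leaderboard view.
ranking = sorted(
    ((name, mean(task_scores.values())) for name, task_scores in scores.items()),
    key=lambda item: item[1],
    reverse=True,
)

for rank, (name, avg) in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {avg:.3f}")
```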
What does HHEM stand for?
HHEM stands for Hughes Hallucination Evaluation Model, Vectara's model for scoring how factually consistent a language model's output is with its source text; the leaderboard uses it to rank language models by hallucination rate.
Can I submit my own language model benchmarks?
Yes, the HHEM Leaderboard allows users to submit benchmarks for their own language models, provided the submissions follow its guidelines and criteria.
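Before submitting, it can help to sanity-check a candidate model locally. The sketch below is one way to do that with the Hugging Face transformers text-generation pipeline; the model name and prompts are placeholders, and the actual submission format is defined by the leaderboard's own guidelines.

```python
from transformers import pipeline

# Placeholder checkpoint; swap in the model you intend to submit.
generator = pipeline("text-generation", model="gpt2")

# A couple of illustrative prompts to eyeball the model's behavior
# before going through the leaderboard's formal submission process.
prompts = [
    "Summarize: The leaderboard compares language models on shared benchmarks.",
    "Question: What does a benchmark score measure? Answer:",
]

for prompt in prompts:
    output = generator(prompt, max_new_tokens=40, do_sample=False)
    print(output[0]["generated_text"])
    print("-" * 40)
```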
How often are the benchmarks updated?
The benchmarks are updated regularly as new models are submitted or as existing models are re-evaluated with updated metrics.