Browse and submit language model benchmarks
The HHEM Leaderboard is a model-benchmarking platform that lets users browse and submit language model benchmarks. It serves as a centralized hub for comparing the performance of language models across tasks and datasets, and it provides a transparent, standardized way to track advances in language model capabilities.
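For programmatic browsing, leaderboard results can usually be pulled straight from the Hub and ranked locally. The sketch below is a minimal illustration only: the repo id, results filename, and column names are assumptions rather than the leaderboard's documented layout, so check the Space's files for the actual location and schema of its published results.

```python
# Minimal sketch: download a leaderboard results file from its Space repo
# and rank models locally. The repo id, filename, and column names below
# are assumptions -- inspect the Space's files for the real layout.
import pandas as pd
from huggingface_hub import hf_hub_download

results_path = hf_hub_download(
    repo_id="vectara/leaderboard",   # assumed Space repo id
    filename="results.csv",          # assumed results file
    repo_type="space",
)

df = pd.read_csv(results_path)

# Sort by whichever metric the leaderboard reports (column name assumed here).
ranked = df.sort_values("hallucination_rate", ascending=True)
print(ranked.head(10))
```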
What does HHEM stand for?
HHEM stands for Hughes Hallucination Evaluation Model, a model developed by Vectara that scores how factually consistent generated text is with its source material; the leaderboard uses these scores to rank language models by how often they hallucinate.
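To make the scoring idea concrete, the sketch below runs an HHEM-style factual-consistency check using a general-purpose NLI cross-encoder as a stand-in; it is not the HHEM checkpoint itself, whose loading instructions live on the vectara/hallucination_evaluation_model model card.

```python
# Minimal sketch of HHEM-style consistency scoring, using a generic NLI
# cross-encoder as a stand-in for the actual HHEM checkpoint (see the
# vectara/hallucination_evaluation_model card for that model's own usage).
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/nli-deberta-v3-base")

pairs = [
    # (source text, generated claim)
    ("The capital of France is Paris.", "Paris is the capital of France."),
    ("The capital of France is Paris.", "The capital of France is Berlin."),
]

# The model outputs one logit per NLI class; the label order documented for
# this checkpoint is: contradiction, entailment, neutral.
scores = model.predict(pairs)
labels = ["contradiction", "entailment", "neutral"]
for (source, claim), row in zip(pairs, scores):
    print(f"{claim!r} -> {labels[row.argmax()]}")
```

In hallucination evaluation of this kind, a generated claim that is contradicted by (or unsupported by) its source is the one counted as a hallucination.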
Can I submit my own language model benchmarks?
Yes. The HHEM Leaderboard allows users to submit benchmarks for their own language models, provided they follow the submission guidelines and criteria.
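As a practical pre-flight step (not the leaderboard's own submission API), the sketch below checks that a candidate model repo is publicly reachable on the Hub before you fill in the Space's submission form; the model id is a placeholder.

```python
# Minimal sketch: verify a model repo exists and is publicly visible before
# submitting it through the leaderboard's form. This is a generic Hub check,
# not the leaderboard's submission API.
from huggingface_hub import HfApi
from huggingface_hub.utils import RepositoryNotFoundError

def is_submittable(model_id: str) -> bool:
    """Return True if the model repo is visible on the Hugging Face Hub."""
    try:
        info = HfApi().model_info(model_id)
    except RepositoryNotFoundError:
        print(f"{model_id}: not found (or private)")
        return False
    print(f"{model_id}: visible, last modified {info.last_modified}")
    return True

is_submittable("your-org/your-model")  # placeholder model id
```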
How often are the benchmarks updated?
The benchmarks are updated regularly as new models are submitted or as existing models are re-evaluated with updated metrics.