Benchmark LLMs on accuracy and translation across languages
The European Leaderboard is a benchmarking tool designed to evaluate and compare large language models (LLMs) in terms of accuracy and translation capabilities across multiple languages. It provides a comprehensive platform to assess model performance, enabling users to identify top-performing models for specific tasks and languages.
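The leaderboard's exact metrics are not listed above; purely as an illustration of the kind of scoring a translation benchmark performs, a corpus-level BLEU score can be computed with the sacrebleu library (the sentences below are made-up examples, not leaderboard data):

```python
# Illustrative sketch only: scoring translation quality with BLEU via sacrebleu.
# The European Leaderboard's actual metrics and datasets are not specified here.
import sacrebleu

hypotheses = ["Das Wetter ist heute schön."]          # model translations (hypothetical)
references = [["Das Wetter ist heute sehr schön."]]   # reference translations (hypothetical)

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU: {bleu.score:.2f}")
```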
What is the main purpose of the European Leaderboard?
The primary purpose is to provide a standardized way to benchmark and compare LLMs across various European languages and tasks.
Which languages are supported by the European Leaderboard?
The tool supports a wide range of European languages, including English, French, German, Spanish, Italian, and many others. The exact list is updated regularly.
Can I benchmark my own model using the European Leaderboard?
Yes, the platform allows users to submit and benchmark their own models, provided they meet the specified requirements.
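The specific submission requirements are not detailed above. A common prerequisite for leaderboards of this kind (an assumption here, not something the leaderboard states) is that the model is hosted in a public Hugging Face Hub repository so the evaluation backend can load it. A minimal sketch using the huggingface_hub client, with hypothetical repository and folder names:

```python
# Minimal sketch: publish a model to the Hugging Face Hub so a leaderboard
# backend can load it. The repo ID and local folder below are hypothetical.
from huggingface_hub import HfApi

api = HfApi()  # assumes you are already authenticated (e.g. via `huggingface-cli login`)

repo_id = "your-username/your-llm"      # hypothetical repository name
local_dir = "./checkpoints/your-llm"    # hypothetical local folder with weights, tokenizer, config

# Create the repository if it does not exist yet.
api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)

# Upload the model files so they can be loaded by name from the Hub.
api.upload_folder(folder_path=local_dir, repo_id=repo_id, repo_type="model")
```

Once the weights are uploaded, the repository ID is typically what gets entered into a leaderboard's submission form.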