Generate and view leaderboard for LLM evaluations
Evaluate reward models for math reasoning
SolidityBench Leaderboard
Leaderboard of information retrieval models in French
Evaluate RAG systems with visual analytics
Persian Text Embedding Benchmark
Retrain models on new data at edge devices
View LLM Performance Leaderboard
Measure over-refusal in LLMs using OR-Bench
Display genomic embedding leaderboard
Convert Hugging Face models to OpenVINO format (see the conversion sketch after this list)
Measure BERT model performance using WASM and WebGPU
Display benchmark results
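For the OpenVINO conversion entry above, here is a minimal sketch of how such a conversion is commonly done with the Optimum Intel integration. This is an assumption about how the tool works internally; the model ID and output directory are illustrative placeholders, not the Space's actual defaults.

```python
# Hedged sketch: convert a Hugging Face model to OpenVINO IR via optimum-intel.
# The model ID and output directory are placeholders chosen for illustration.
from optimum.intel import OVModelForSequenceClassification
from transformers import AutoTokenizer

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # placeholder model

# export=True converts the original PyTorch checkpoint to OpenVINO IR on load
model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Persist the converted model (openvino_model.xml / .bin) and tokenizer files
model.save_pretrained("ov_model")
tokenizer.save_pretrained("ov_model")

# Quick sanity check that the converted model still runs
inputs = tokenizer("OpenVINO export works.", return_tensors="pt")
print(model(**inputs).logits)
```

The same export is also available from the command line via `optimum-cli export openvino --model <model_id> <output_dir>`.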
The Arabic MMMLU Leaderboard is a platform designed to evaluate and compare the performance of large language models (LLMs) specifically for the Arabic language. It provides a comprehensive leaderboard that ranks models based on their performance across various tasks and metrics, offering insights into their capabilities and limitations.
What is the purpose of the Arabic MMMLU Leaderboard?
The platform aims to provide a standardized way to evaluate and compare Arabic language models, helping researchers and developers identify top-performing models for specific tasks.
How are models ranked on the leaderboard?
Models are ranked based on their performance across a variety of tasks and datasets. Rankings are updated regularly as new evaluations are conducted.
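As a rough illustration only (the leaderboard's actual aggregation and task weighting are not specified here), ranking by the mean of per-task scores could look like the sketch below; the model names and numbers are invented.

```python
# Toy example: rank models by their mean score across evaluation tasks.
# All names and numbers are invented purely for illustration.
scores = {
    "model-a": {"stem": 0.71, "humanities": 0.64, "social-science": 0.68},
    "model-b": {"stem": 0.66, "humanities": 0.70, "social-science": 0.69},
}

def mean_score(task_scores: dict) -> float:
    """Average a model's scores over all tasks."""
    return sum(task_scores.values()) / len(task_scores)

# Sort models from highest to lowest mean score
ranking = sorted(scores.items(), key=lambda item: mean_score(item[1]), reverse=True)
for rank, (name, task_scores) in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {mean_score(task_scores):.3f}")
```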
Can I submit my own model for evaluation?
Yes, the platform allows submissions from researchers and developers. Check the submission guidelines for requirements and instructions.