M-RewardBench Leaderboard
M-RewardBench is a data visualization tool designed to display a leaderboard for multilingual reward models. It helps users compare and evaluate the performance of different models across various languages and tasks.
• Real-Time Updates: Provides up-to-the-minute leaderboard rankings for multilingual reward models.
• Customizable Sorting: Users can sort models by performance metrics such as accuracy, F1-score, or other predefined criteria (see the sketch after this list).
• Multi-Language Support: Displays results for models trained on multiple languages, enabling cross-lingual performance comparison.
• Interactive Visualizations: Offers charts and graphs to visually represent model performance trends.
• Benchmark Comparisons: Includes predefined benchmarks for quick evaluation of model performance.
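To make the sorting behavior concrete, here is a minimal Python sketch of ranking models by a user-chosen metric. The table layout, column names, model names, and scores below are hypothetical placeholders, not the actual M-RewardBench data schema.

```python
import pandas as pd

# Hypothetical per-model results table; real M-RewardBench data
# may use different columns, languages, and metrics.
results = pd.DataFrame(
    {
        "model": ["reward-model-a", "reward-model-b", "reward-model-c"],
        "language": ["en", "es", "zh"],
        "accuracy": [0.91, 0.87, 0.89],
        "f1": [0.90, 0.85, 0.88],
    }
)

def rank_models(df: pd.DataFrame, metric: str = "accuracy") -> pd.DataFrame:
    """Return the models sorted by the chosen metric, best first."""
    return df.sort_values(metric, ascending=False).reset_index(drop=True)

# Re-rank the same table by F1-score instead of accuracy.
print(rank_models(results, metric="f1"))
```

Swapping the `metric` argument is all it takes to re-sort the table, which is the same idea behind the leaderboard's customizable sorting controls.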
What is the purpose of M-RewardBench?
M-RewardBench is designed to help users compare and evaluate the performance of multilingual reward models across different languages and tasks.
Which languages does M-RewardBench support?
M-RewardBench supports a wide range of languages, including English, Spanish, French, German, Chinese, and many others.
Can I customize the performance metrics used in the leaderboard?
Yes, users can customize the performance metrics used for evaluation, such as accuracy, F1-score, or other predefined criteria, to suit their specific needs.