M-RewardBench Leaderboard
Browse and filter AI model evaluation results
M-RewardBench is a data visualization tool designed to display a leaderboard for multilingual reward models. It helps users compare and evaluate the performance of different models across various languages and tasks.
• Real-Time Updates: Provides up-to-the-minute leaderboard rankings for multilingual reward models.
• Customizable Sorting: Users can sort models by performance metrics such as accuracy, F1-score, or other predefined criteria (see the sketch after this list).
• Multi-Language Support: Displays results for models trained on multiple languages, enabling cross-lingual performance comparison.
• Interactive Visualizations: Offers charts and graphs to visually represent model performance trends.
• Benchmark Comparisons: Includes predefined benchmarks for quick evaluation of model performance.
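As a rough illustration of the sorting and filtering described above, the sketch below ranks models for one language by a single metric. The file name and column names (results.csv, model, language, accuracy) are assumptions made for the example, not the tool's actual schema.

```python
# Minimal sketch of leaderboard-style sorting with pandas.
# "results.csv" and the column names are illustrative assumptions,
# not M-RewardBench's actual export format.
import pandas as pd

df = pd.read_csv("results.csv")  # hypothetical per-language results table

# Filter to a single language, then rank models by the chosen metric.
subset = df[df["language"] == "Spanish"]
top = subset.sort_values("accuracy", ascending=False).head(10)
print(top[["model", "accuracy"]])
```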
What is the purpose of M-RewardBench?
M-RewardBench is designed to help users compare and evaluate the performance of multilingual reward models across different languages and tasks.
Which languages does M-RewardBench support?
M-RewardBench supports a wide range of languages, including English, Spanish, French, German, and Chinese, among many others.
Can I customize the performance metrics used in the leaderboard?
Yes, users can customize the performance metrics used for evaluation, such as accuracy, F1-score, or other predefined criteria, to suit their specific needs.
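For readers unfamiliar with the metrics named above, the snippet below shows how accuracy and F1-score are conventionally computed with scikit-learn. The labels are invented for illustration and do not come from M-RewardBench.

```python
# Illustrative only: conventional computation of the metrics mentioned above.
# The labels are made up; they are not M-RewardBench data.
from sklearn.metrics import accuracy_score, f1_score

y_true = [1, 0, 1, 1, 0, 1]  # gold preference labels (hypothetical)
y_pred = [1, 0, 0, 1, 0, 1]  # reward model predictions (hypothetical)

print("accuracy:", accuracy_score(y_true, y_pred))  # 0.833...
print("F1-score:", f1_score(y_true, y_pred))        # 0.857...
```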