M-RewardBench Leaderboard
M-RewardBench is a data visualization tool designed to display a leaderboard for multilingual reward models. It helps users compare and evaluate the performance of different models across various languages and tasks.
• Real-Time Updates: Provides up-to-the-minute leaderboard rankings for multilingual reward models.
• Customizable Sorting: Users can sort models by performance metrics such as accuracy, F1-score, or other predefined criteria (see the sketch after this list).
• Multi-Language Support: Displays results for models evaluated across multiple languages, enabling cross-lingual performance comparison.
• Interactive Visualizations: Offers charts and graphs to visually represent model performance trends.
• Benchmark Comparisons: Includes predefined benchmarks for quick evaluation of model performance.
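As a rough illustration of how such a leaderboard might aggregate and rank results, here is a minimal sketch in Python using pandas. The model names, language codes, score column, and values are all hypothetical; this is not the Space's actual implementation.

```python
import pandas as pd

# Hypothetical per-language results for a few reward models
results = pd.DataFrame({
    "model": ["rm-a", "rm-a", "rm-b", "rm-b"],
    "language": ["en", "de", "en", "de"],
    "accuracy": [0.82, 0.74, 0.79, 0.77],
})

# Average each model's score across languages, then sort descending
leaderboard = (
    results.groupby("model")["accuracy"]
    .mean()
    .sort_values(ascending=False)
    .reset_index()
)
print(leaderboard)
```

Averaging across languages before ranking means a model that excels in only one language cannot dominate the overall leaderboard.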
What is the purpose of M-RewardBench?
M-RewardBench is designed to help users compare and evaluate the performance of multilingual reward models across different languages and tasks.
Which languages does M-RewardBench support?
M-RewardBench supports a wide range of languages, including English, Spanish, French, German, and Chinese, among others.
Can I customize the performance metrics used in the leaderboard?
Yes, users can customize the performance metrics used for evaluation, such as accuracy, F1-score, or other predefined criteria, to suit their specific needs.
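To make metric-based re-ranking concrete, here is a small sketch of how a leaderboard could be re-sorted by whichever metric the user selects. The metric columns (accuracy, f1) and scores are invented for illustration and are not drawn from the real leaderboard.

```python
import pandas as pd

# Hypothetical leaderboard with two metrics per model
scores = pd.DataFrame({
    "model": ["rm-a", "rm-b", "rm-c"],
    "accuracy": [0.81, 0.78, 0.84],
    "f1": [0.79, 0.80, 0.76],
})

def rank_by(metric: str) -> pd.DataFrame:
    """Re-rank the leaderboard by the user-selected metric."""
    return scores.sort_values(metric, ascending=False).reset_index(drop=True)

# Ranking by F1 yields a different ordering than ranking by accuracy
print(rank_by("f1"))
```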