Explore and submit models using the LLM Leaderboard
OPEN-MOE-LLM-LEADERBOARD is a platform for exploring and submitting large language models (LLMs). It serves as a centralized hub where users can compare and evaluate LLMs across a range of benchmarks and metrics, offering transparent, comprehensive insight into model performance so that researchers and developers can make informed decisions.
• Model Benchmarking: Compare LLMs across multiple tasks and datasets to understand their strengths and weaknesses (a short results-exploration sketch follows this list).
• Model Submission: Submit your own LLM for evaluation and inclusion in the leaderboard.
• Interactive Visualization: Explore detailed performance metrics and visualizations to gain deeper insights.
• Community-Driven: Open for contributions and feedback from the AI research community.
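To make the comparison workflow concrete, here is a minimal sketch of exploring leaderboard-style results programmatically. It assumes a hypothetical CSV export named `leaderboard_results.csv` with per-benchmark score columns; the file name and column layout are placeholders, and the leaderboard itself presents these results through its web interface.

```python
# Minimal sketch: ranking models from a hypothetical leaderboard results export.
# The file name "leaderboard_results.csv" and its columns are assumptions for
# illustration only; adapt them to however the leaderboard exposes its data.
import pandas as pd

results = pd.read_csv("leaderboard_results.csv")

# Average the per-benchmark score columns into a single aggregate score.
benchmark_cols = [c for c in results.columns if c.startswith("bench_")]
results["avg_score"] = results[benchmark_cols].mean(axis=1)

# Show the top models by aggregate score.
top_models = results.sort_values("avg_score", ascending=False)
print(top_models[["model_name", "avg_score"]].head(10))
```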
1. What is the purpose of OPEN-MOE-LLM-LEADERBOARD?
The leaderboard aims to provide a transparent and standardized platform for comparing and evaluating large language models. It helps users identify the best models for their specific needs.
2. How do I submit my own model to the leaderboard?
To submit your model, follow the submission guidelines provided on the platform. This typically involves providing model weights, configuration details, and benchmarking results.
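The platform's submission form defines the exact flow, but making your model publicly available on the Hugging Face Hub is usually the first step, since leaderboards typically pull model weights from the Hub for evaluation. The sketch below shows one way to publish a locally trained checkpoint with the transformers library; the local path and the repository id `your-username/your-moe-model` are placeholders.

```python
# Minimal sketch: publishing a model to the Hugging Face Hub so a leaderboard
# can pull and evaluate it. Paths and the repo id are placeholders; follow the
# leaderboard's own submission guidelines for the actual process.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-username/your-moe-model"  # placeholder repository id

# Load your trained model and tokenizer from a local checkpoint directory.
model = AutoModelForCausalLM.from_pretrained("./my_local_checkpoint")
tokenizer = AutoTokenizer.from_pretrained("./my_local_checkpoint")

# Push both to the Hub (requires authenticating first, e.g. `huggingface-cli login`).
model.push_to_hub(repo_id)
tokenizer.push_to_hub(repo_id)
```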
3. What metrics are used to evaluate models on the leaderboard?
Models are evaluated based on a variety of metrics, including accuracy, inference speed, parameter efficiency, and performance on specific benchmarks. The exact metrics may vary depending on the task or dataset.
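As a rough illustration of one such metric, the sketch below measures raw generation speed (tokens per second) for a causal LM using the transformers library. The tiny placeholder model `gpt2` and the single prompt are assumptions for demonstration; the leaderboard's own evaluation harness and metric definitions may differ.

```python
# Minimal sketch: measuring generation throughput (tokens/sec) for a causal LM.
# "gpt2" is only a small placeholder model for demonstration purposes.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

start = time.perf_counter()
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
elapsed = time.perf_counter() - start

new_tokens = output.shape[1] - inputs["input_ids"].shape[1]
print(f"Generated {new_tokens} tokens in {elapsed:.2f}s "
      f"({new_tokens / elapsed:.1f} tokens/sec)")
```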